Hi,

Just to continue this discussion: I think that right now Lucene's retrieval algorithm is based purely on the Vector Space Model, which is simple and efficient.
However, there may be cases where folks like me want to use a completely different set of ranking algorithms, ones that do not even use tf/idf. For example, I am thinking about adding the Cover Density ranking algorithm to Lucene, which, for now, is based purely on proximity information and does not require any global ranking variables. But looking into the Lucene code, it does not seem easy to hack that in, at least for me, a novice Lucene user.

I read on the Lucene 2.0 whiteboard that Lucene will become more accommodating in terms of what can be indexed and so on. That move might be good for implementing other or ad hoc ranking algorithms.

Cheers,

Jian

On Wed, 26 Jan 2005 10:25:15 -0500, Ian Soboroff <[EMAIL PROTECTED]> wrote:
> Erik Hatcher <[EMAIL PROTECTED]> writes:
>
> > By all means, if you have other suggestions for our site, let us know
> > at [EMAIL PROTECTED]
>
> One of the things I would like to see, but which isn't either in the
> Lucene site, documentation, or "Lucene in Action", is a complete
> description of how the retrieval algorithm works. That is, how the
> HitCollector, Scorers, Similarity, etc all fit together.
>
> I'm involved in a project which to some degree is looking at poking
> deeply into this part of the Lucene code. We have a nice (non-Lucene)
> framework for working with more different kinds of similarity
> functions (beyond tf-idf) which should also be expandable to include
> query expansion, relevance feedback, and the like.
>
> I used to think that integrating it would be as simple as hacking in
> Similarity, but I'm beginning to think it might need broader changes.
> I could obviously hook in our whole retrieval setup by just diving for
> an IndexReader and doing it all by hand, but then I would have to redo
> the incremental search and possibly the rich query structure, which
> would be a lose.
>
> So anyway, I got LIA hoping for a good explanation (not a good
> Explanation) on this bit, but it wasn't there. There are some hints
> on the Lucene site, but nothing complete. If I muddle it out before
> anything gets contributed, I'll try to write something up, but don't
> expect anything too soon...
>
> Ian
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
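
As a rough illustration of the Cover Density idea mentioned at the top of this message, here is a small sketch in plain Java. It is not Lucene code and does not use any Lucene API. It assumes the document is already a list of tokens, finds every shortest span (a "cover") that contains all query terms, and gives each cover a score of 1 if it is at most k tokens long and k/length otherwise. The class name CoverDensitySketch, the method score, and the cutoff parameter k are all made up for illustration, and this is a simplified reading of the technique, not a faithful implementation of the published algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified cover density scoring over one tokenized document (illustrative only). */
final class CoverDensitySketch {

    /**
     * Scores a document using only the positions of the query terms.
     * k is the cover length (in tokens) above which a cover is discounted;
     * which value to use is an assumption left to the caller.
     */
    static double score(List<String> docTokens, Set<String> queryTerms, int k) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        // Most recent position at which each query term was seen.
        Map<String, Integer> lastSeen = new HashMap<>();
        int lastCoverStart = -1;
        double total = 0.0;

        for (int q = 0; q < docTokens.size(); q++) {
            String token = docTokens.get(q);
            if (!queryTerms.contains(token)) {
                continue;
            }
            lastSeen.put(token, q);
            if (lastSeen.size() < queryTerms.size()) {
                continue; // at least one query term has not appeared yet
            }

            // Shortest span ending at q that contains every query term.
            int p = q;
            for (int pos : lastSeen.values()) {
                p = Math.min(p, pos);
            }

            // If the start did not advance, an earlier, shorter cover is
            // nested inside [p, q], so this span is not a minimal cover.
            if (p > lastCoverStart) {
                int length = q - p + 1;
                total += (length > k) ? (double) k / length : 1.0;
                lastCoverStart = p;
            }
        }
        return total;
    }
}
```

For example, CoverDensitySketch.score(Arrays.asList("a", "b", "a", "b"), new HashSet<>(Arrays.asList("a", "b")), 16) finds three covers of length 2 and returns 3.0. Note that the score depends only on term positions within the one document; no idf or other collection-wide statistic is needed.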