Hi,

Just to continue this discussion: I think that right now Lucene's
retrieval algorithm is based purely on the Vector Space Model, which is
simple and efficient.

However, there may be cases where folks like me want to use a
completely different set of ranking algorithms, ones which do not even
use tf/idf.

For example, I am thinking about adding the Cover Density ranking
algorithm to Lucene. For now it is based purely on proximity
information and does not require any global ranking variables. But
looking into the Lucene code, it does not seem easy to hack that in,
at least for a novice Lucene user like me.
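
To make the idea concrete, here is a minimal standalone sketch of the
kind of scoring I mean. It is plain Python, not Lucene code, and it
assumes one simplified form of cover density scoring: a "cover" is a
shortest span of token positions containing every query term, and each
cover contributes 1 if it is at most k tokens long, or k/length
otherwise. The names find_covers and cover_density_score and the
default k=16 are my own assumptions for illustration. Note that only
per-document term positions are needed, with no idf or other
collection-wide statistics.

```python
def find_covers(positions):
    """Return minimal covers as (start, end) pairs.

    positions: dict mapping each query term to a sorted list of token
    positions within ONE document. Assumes distinct terms never share
    a position. A cover is minimal if no shorter cover lies inside it.
    """
    # No cover exists if any query term is missing from the document.
    if not positions or any(not plist for plist in positions.values()):
        return []

    # Merge all occurrences into one stream ordered by position.
    events = sorted(
        (pos, term) for term, plist in positions.items() for pos in plist
    )

    last_seen = {}   # most recent position of each term so far
    covers = []
    prev_start = -1  # start of the last cover emitted
    for pos, term in events:
        last_seen[term] = pos
        if len(last_seen) < len(positions):
            continue  # not every term has been seen yet
        # Shortest span ending at pos that contains all terms.
        start = min(last_seen.values())
        # If the start did not advance, an earlier and shorter cover
        # with the same start already exists, so skip this one.
        if start > prev_start:
            covers.append((start, pos))
            prev_start = start
    return covers


def cover_density_score(positions, k=16):
    """Sum a saturating proximity score over all minimal covers."""
    score = 0.0
    for start, end in find_covers(positions):
        length = end - start + 1
        score += 1.0 if length <= k else k / length
    return score


if __name__ == "__main__":
    doc = {"lucene": [0, 10], "ranking": [2, 50]}
    print(find_covers(doc))
    print(cover_density_score(doc))
```

A document with many short covers scores high, while one where the
terms occur only far apart scores low. In Lucene the positions would
have to come from the index's term position data, which is the part
that looks hard to hook in.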

I read on the Lucene whiteboard 2.0 that Lucene will accommodate more
in terms of what can be indexed and so on. That move might be good for
implementing other or ad hoc ranking algorithms.

Cheers,

Jian


On Wed, 26 Jan 2005 10:25:15 -0500, Ian Soboroff <[EMAIL PROTECTED]> wrote:
> Erik Hatcher <[EMAIL PROTECTED]> writes:
> 
> > By all means, if you have other suggestions for our site, let us know
> > at [EMAIL PROTECTED]
> 
> One of the things I would like to see, but which isn't on the Lucene
> site, in the documentation, or in "Lucene in Action", is a complete
> description of how the retrieval algorithm works.  That is, how the
> HitCollector, Scorers, Similarity, etc. all fit together.
> 
> I'm involved in a project which to some degree is looking at poking
> deeply into this part of the Lucene code.  We have a nice (non-Lucene)
> framework for working with more different kinds of similarity
> functions (beyond tf-idf) which should also be expandable to include
> query expansion, relevance feedback, and the like.
> 
> I used to think that integrating it would be as simple as hacking in
> Similarity, but I'm beginning to think it might need broader changes.
> I could obviously hook in our whole retrieval setup by just diving for
> an IndexReader and doing it all by hand, but then I would have to redo
> the incremental search and possibly the rich query structure, which
> would be a lose.
> 
> So anyway, I got LIA hoping for a good explanation (not a good
> Explanation) on this bit, but it wasn't there.  There are some hints
> on the Lucene site, but nothing complete.  If I muddle it out before
> anything gets contributed, I'll try to write something up, but don't
> expect anything too soon...
> 
> Ian
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
