On Wed, Feb 17, 2010 at 10:31:19AM -0500, Robert Muir wrote:
> yet if we don't do the hard work up front to make it easy to plug in things
> like BM25, then no one will implement additional scoring formulas for
> Lucene, we currently make it terribly difficult to do this.
FWIW... Similarity and posting format spec are so closely tied that I'm
considering linking them in Lucy.

    Schema schema = new Schema();

    // Full-text fields scored with BM25.
    FullTextType bm25Type = new FullTextType(new BM25Similarity());
    schema.specField("content", bm25Type);
    schema.specField("title", bm25Type);

    // Exact-match field: scoring only cares whether the term matched.
    StringType matchType = new StringType(new MatchSimilarity());
    schema.specField("category", matchType);

That way, custom scoring implementations can guarantee that they always have
the posting information they need available to make their similarity
judgments. Similarity also becomes a more generalized notion, with the
TF/IDF-specific functionality moving into a subclass.
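
To make the "guarantee" part concrete, here is a rough sketch of one way the linkage could work: each Similarity declares the posting data it needs, and the field type derives its posting format from that. Every name below (PostingData, requiredPostingData, FieldType) is made up for illustration and is not existing Lucy or Lucene API. The particular data each similarity asks for is also just an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in Lucy or Lucene.
enum PostingData { DOC_ID, TERM_FREQ, FIELD_LENGTH, POSITIONS }

// The generalized notion: a Similarity states what it needs from postings.
abstract class Similarity {
    abstract Set<PostingData> requiredPostingData();
}

// Match-only scoring needs nothing beyond the doc id.
class MatchSimilarity extends Similarity {
    @Override
    Set<PostingData> requiredPostingData() {
        return EnumSet.of(PostingData.DOC_ID);
    }
}

// BM25 needs term frequency and field length for its normalization
// (assumed here; a real implementation might store these differently).
class BM25Similarity extends Similarity {
    @Override
    Set<PostingData> requiredPostingData() {
        return EnumSet.of(PostingData.DOC_ID,
                          PostingData.TERM_FREQ,
                          PostingData.FIELD_LENGTH);
    }
}

// The field type takes its posting format from the Similarity, so the
// index always stores whatever the scorer will later ask for.
class FieldType {
    final Similarity similarity;
    final Set<PostingData> postingFormat;

    FieldType(Similarity similarity) {
        this.similarity = similarity;
        this.postingFormat = similarity.requiredPostingData();
    }
}
```

With something like this, new FieldType(new BM25Similarity()) would end up with a posting format that includes term frequencies, while a match-only field would skip them.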
Maybe something similar could be made to work in Lucene. Dunno how McCandless
has things set up for spec'ing codecs on the flex branch.
Marvin Humphrey