I would like to search a collection of "keywords" with Lucene. A document has one or more keywords, and each keyword appears only once in a document (tf = 1). For example:

Document_1 : ( "aa" "bb" "cc" )
Document_2 : ( "bb" "cc" )
Document_3 : ( "cc" "dd" )
Document_4 : ( "aa" "cc" "dd" )
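To make the example concrete, here is a minimal plain-Java sketch (no Lucene dependency; the class `KeywordDocs` and its methods are hypothetical) that stores each document as a set of keywords, counts document frequencies, and computes the per-document length norm as the classic `DefaultSimilarity` defines it, 1/sqrt(numTerms). The `storedNorm` method is an assumption: it approximates Lucene's lossy one-byte norm encoding by keeping only three significant bits of the float, which maps 2 keywords to 0.625 and 3 keywords to 0.5.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java model of the example index (not a Lucene API).
class KeywordDocs {
    // Each document is a set of keywords, so a keyword can occur at most once (tf = 1).
    static final Map<String, Set<String>> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put("Document_1", Set.of("aa", "bb", "cc"));
        DOCS.put("Document_2", Set.of("bb", "cc"));
        DOCS.put("Document_3", Set.of("cc", "dd"));
        DOCS.put("Document_4", Set.of("aa", "cc", "dd"));
    }

    // Number of documents that contain the keyword (what Lucene calls docFreq).
    static int docFreq(String keyword) {
        int df = 0;
        for (Set<String> keywords : DOCS.values()) {
            if (keywords.contains(keyword)) {
                df++;
            }
        }
        return df;
    }

    // Classic DefaultSimilarity lengthNorm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(String doc) {
        return (float) (1.0 / Math.sqrt(DOCS.get(doc).size()));
    }

    // Assumed approximation of the one-byte norm encoding: keep only the two
    // highest explicit mantissa bits of the float (0.7071 -> 0.625, 0.5774 -> 0.5).
    static float storedNorm(String doc) {
        int bits = Float.floatToIntBits(lengthNorm(doc));
        return Float.intBitsToFloat(bits & ~((1 << 21) - 1));
    }

    public static void main(String[] args) {
        for (String doc : DOCS.keySet()) {
            System.out.printf("%s: %d keywords, lengthNorm=%.4f, stored norm=%.4f%n",
                    doc, DOCS.get(doc).size(), lengthNorm(doc), storedNorm(doc));
        }
        for (String kw : new String[] {"aa", "bb", "cc", "dd", "xx"}) {
            System.out.printf("docFreq(%s)=%d%n", kw, docFreq(kw));
        }
    }
}
```

Even though every keyword has tf = 1, a document with three keywords (Document_1) ends up with a smaller stored norm than a document with two (Document_2).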
I have a query made of several terms with different boosts. coord(int overlap, int maxOverlap) is turned off, i.e. it always returns 1.0.

query = "aa^0.1 bb^0.9 xx^0.1 yy^0.1 zz^0.1"

The query may contain many terms that do not appear in the documents, here "xx", "yy" and "zz". I got 3 hits:

Document_2 : ( "bb" "cc" ) : score : 0.75391763
Document_1 : ( "aa" "bb" "cc" ) : score : 0.67014897
Document_4 : ( "aa" "cc" "dd" ) : score : 0.0670149

[Question] Why does Document_2 score higher than Document_1? Document_1 matches two terms, "aa" and "bb". I want to emphasize the matches and care less about the mismatches. How should I modify Similarity to achieve that (Document_1 should get the higher score)? Is there any suggestion or example for implementing this kind of "keyword collection" search?

For the query above I actually use a BooleanQuery of TermQuery clauses. What else should I take care of?

/* ************************************************** */
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.TermQuery;

BooleanQuery q = new BooleanQuery(true); // disable coord
TermQuery tq;
{
    tq = new TermQuery(new Term(field, "aa"));
    tq.setBoost(.1f);
    q.add(tq, BooleanClause.Occur.SHOULD);
}
{
    tq = new TermQuery(new Term(field, "bb"));
    tq.setBoost(.9f);
    q.add(tq, BooleanClause.Occur.SHOULD);
}
{
    tq = new TermQuery(new Term(field, "xx"));
    tq.setBoost(.1f);
    q.add(tq, BooleanClause.Occur.SHOULD);
}
....
Hits hits = isearcher.search(q);
/* ************************************************** */

Thanks
Lin
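The reported scores can be reproduced by hand. Below is a hypothetical plain-Java re-implementation (the class `ScoreBreakdown` is not a Lucene API) of the classic scoring formula, assuming `DefaultSimilarity`, coord fixed at 1, tf = 1, an index that holds exactly these four documents, and the same three-significant-bit approximation of the stored norm. With norms it gives roughly 0.7539, 0.6701 and 0.0670 for Document_2, Document_1 and Document_4. Since "aa" and "bb" have the same docFreq (hence the same idf), Document_1 and Document_2 differ only in matched boost times field norm: 1.0 × 0.5 for Document_1 versus 0.9 × 0.625 for Document_2, so the length norm is what puts Document_2 first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical re-implementation of classic Lucene scoring for this example only.
// Assumes DefaultSimilarity formulas, coord = 1, tf = 1, and exactly these 4 documents.
class ScoreBreakdown {
    static final Map<String, Set<String>> DOCS = new LinkedHashMap<>();
    static final Map<String, Double> QUERY = new LinkedHashMap<>(); // term -> boost
    static {
        DOCS.put("Document_1", Set.of("aa", "bb", "cc"));
        DOCS.put("Document_2", Set.of("bb", "cc"));
        DOCS.put("Document_3", Set.of("cc", "dd"));
        DOCS.put("Document_4", Set.of("aa", "cc", "dd"));
        // query = "aa^0.1 bb^0.9 xx^0.1 yy^0.1 zz^0.1"
        QUERY.put("aa", 0.1);
        QUERY.put("bb", 0.9);
        QUERY.put("xx", 0.1);
        QUERY.put("yy", 0.1);
        QUERY.put("zz", 0.1);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(String term) {
        int df = 0;
        for (Set<String> keywords : DOCS.values()) {
            if (keywords.contains(term)) {
                df++;
            }
        }
        return 1.0 + Math.log(DOCS.size() / (double) (df + 1));
    }

    // queryNorm = 1 / sqrt(sum of (idf * boost)^2) over ALL query terms, matched or not.
    // It is the same constant for every document, so it cannot change the ranking.
    static double queryNorm() {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : QUERY.entrySet()) {
            double w = idf(e.getKey()) * e.getValue();
            sum += w * w;
        }
        return 1.0 / Math.sqrt(sum);
    }

    // Field norm 1/sqrt(numTerms), truncated to three significant bits (assumed
    // approximation of the one-byte encoding), or 1.0 when norms are disabled.
    static double fieldNorm(String doc, boolean useNorms) {
        if (!useNorms) {
            return 1.0;
        }
        float raw = (float) (1.0 / Math.sqrt(DOCS.get(doc).size()));
        return Float.intBitsToFloat(Float.floatToIntBits(raw) & ~((1 << 21) - 1));
    }

    // score = fieldNorm * sum over matched terms of idf^2 * boost * queryNorm (tf = 1, coord = 1)
    static double score(String doc, boolean useNorms) {
        double qn = queryNorm();
        double sum = 0.0;
        for (Map.Entry<String, Double> e : QUERY.entrySet()) {
            if (DOCS.get(doc).contains(e.getKey())) {
                double idf = idf(e.getKey());
                sum += idf * idf * e.getValue() * qn;
            }
        }
        return sum * fieldNorm(doc, useNorms);
    }

    public static void main(String[] args) {
        for (String doc : DOCS.keySet()) {
            System.out.printf("%s  with norms: %.8f  without norms: %.8f%n",
                    doc, score(doc, true), score(doc, false));
        }
    }
}
```

With `useNorms = false` (every field norm treated as 1.0), Document_1 ranks above Document_2. In Lucene 2.x the corresponding settings are, as far as I know, `Field.setOmitNorms(true)` on the keyword field, or a `DefaultSimilarity` subclass whose `lengthNorm(String, int)` returns 1.0f, installed with `IndexWriter.setSimilarity` before indexing, since norms are computed at index time; check both against the Lucene version in use. The unmatched terms "xx", "yy" and "zz" only enter queryNorm, so they do not affect the ranking.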