Chuck Williams wrote:
The issue is this.  Imagine you have two fields, title and description,
both of which you want to search with simple queries like:  albino
elephant.  There are two general approaches, either a) create a combined
field that concatenates the two individual fields, or b) expand the
simple query into a BooleanQuery that searches for each term in both
fields.

With approach a), you lose the flexibility to set separate boost factors
on the individual fields.  I wanted title to be much more important than
description for ranking results, and wanted to control this explicitly,
as length norm was not always doing the right thing; e.g., descriptions
are not always long.
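For reference, DefaultSimilarity computes a field's length norm as 1/sqrt(numTerms). The small sketch below (class name hypothetical) evaluates that formula for a few field lengths. It shows why the norm alone cannot be relied on to rank title matches above description matches: a short description gets nearly the same norm as a short title. The sketch ignores the lossy one-byte encoding Lucene applies when it stores norms in the index.

```java
import java.util.Locale;

// Evaluates the length-norm formula used by Lucene's DefaultSimilarity,
// 1/sqrt(numTerms). Lucene stores norms in a single byte, so the values
// actually kept in the index are coarser than the ones printed here.
final class LengthNormSketch {
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "title, 4 terms:          %.3f%n", lengthNorm(4));   // 0.500
        System.out.printf(Locale.ROOT, "description, 5 terms:    %.3f%n", lengthNorm(5));   // 0.447
        System.out.printf(Locale.ROOT, "description, 100 terms:  %.3f%n", lengthNorm(100)); // 0.100
    }
}
```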

With approach b) you run into another problem.  Suppose the example
query is expanded into (title:albino description:albino title:elephant
description:elephant).  Then, assuming tf/idf doesn't affect ranking, a
document with albino in both title and description will score the same
as a document with albino in title and elephant in description.  The
latter document for most applications is much better since it matches
both query terms.  If albino is the more important term according to
idf, then the less desirable documents (albino in both fields) will rank
consistently ahead of the albino elephants (which is what was happening
to me, yielding horrible results).
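A toy calculation makes the tie concrete. This is a simplified model, not Lucene's actual scorer: each matching clause contributes a per-term weight standing in for idf (the numbers are made up), the sum is multiplied by coord = overlap / maxOverlap as in DefaultSimilarity, and tf, length norms and queryNorm are left out. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the flat expansion
//   (title:albino description:albino title:elephant description:elephant)
// score = coord * (sum of weights of matching clauses), where
// coord = overlap / maxOverlap. tf, length norms and queryNorm are omitted;
// the per-term weights are made-up stand-ins for idf.
final class FlatExpansionSketch {
    // Each clause is "field:term"; a document is the set of "field:term" pairs it contains.
    static final List<String> CLAUSES = List.of(
            "title:albino", "description:albino", "title:elephant", "description:elephant");

    static double score(Set<String> doc, Map<String, Double> termWeight) {
        double sum = 0.0;
        int overlap = 0;
        for (String clause : CLAUSES) {
            if (doc.contains(clause)) {
                String term = clause.substring(clause.indexOf(':') + 1);
                sum += termWeight.get(term);
                overlap++;
            }
        }
        return sum * overlap / CLAUSES.size(); // multiply by coord(overlap, 4)
    }

    public static void main(String[] args) {
        Set<String> albinoTwice = Set.of("title:albino", "description:albino");
        Set<String> albinoElephant = Set.of("title:albino", "description:elephant");

        // Equal weights: both documents match 2 of 4 clauses and tie.
        Map<String, Double> equal = Map.of("albino", 1.0, "elephant", 1.0);
        System.out.println(score(albinoTwice, equal) + " vs " + score(albinoElephant, equal)); // 1.0 vs 1.0

        // albino rarer (higher idf): the document without "elephant" wins.
        Map<String, Double> albinoRarer = Map.of("albino", 2.0, "elephant", 1.0);
        System.out.println(score(albinoTwice, albinoRarer) + " vs "
                + score(albinoElephant, albinoRarer)); // 2.0 vs 1.5
    }
}
```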

Another way to handle this would be to generate a query like:

  title:(albino elephant) description:(albino elephant)

In this case the coord factor of each inner query boosts a title or description that contains both terms. You may or may not want to disable the coord factor for the outer query, which can be done with:

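import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.search.TermQuery;

// title:(albino elephant); Lucene 1.x add(query, required, prohibited)
BooleanQuery title = new BooleanQuery();
title.add(new TermQuery(new Term("title", "albino")), false, false);
title.add(new TermQuery(new Term("title", "elephant")), false, false);

// description:(albino elephant)
BooleanQuery desc = new BooleanQuery();
desc.add(new TermQuery(new Term("description", "albino")), false, false);
desc.add(new TermQuery(new Term("description", "elephant")), false, false);

// Outer query whose Similarity returns 1.0 from coord(), so a document
// matching only one of the two field groups is not scaled down.
BooleanQuery outer = new BooleanQuery() {
  public Similarity getSimilarity(Searcher searcher) {
    return new DefaultSimilarity() {
      public float coord(int overlap, int maxOverlap) { return 1.0f; }
    };
  }
};
outer.add(title, false, false);
outer.add(desc, false, false);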

In general, doesn't coord() handle this situation?
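To see how coord() plays out for the grouped query, here is a toy model along simplified lines. The class and method names are hypothetical, every term weight is 1.0, and tf, idf, norms and queryNorm are ignored. Each inner group applies coord = overlap / maxOverlap to its own matches. The outer query either applies coord over its two groups or, as in the anonymous subclass above, uses 1.0.

```java
import java.util.List;
import java.util.Set;

// Toy model of  title:(albino elephant) description:(albino elephant)
// Each inner group scores (number of matching terms) * coord(overlap, 2),
// with every term weight fixed at 1.0. The outer query either applies
// coord over its two groups (default) or uses 1.0 (coord disabled).
final class GroupedQuerySketch {
    static final List<String> TERMS = List.of("albino", "elephant");
    static final List<String> FIELDS = List.of("title", "description");

    static double groupScore(Set<String> doc, String field) {
        int overlap = 0;
        for (String term : TERMS) {
            if (doc.contains(field + ":" + term)) overlap++;
        }
        return overlap * ((double) overlap / TERMS.size());
    }

    static double score(Set<String> doc, boolean outerCoord) {
        double sum = 0.0;
        int matchedGroups = 0;
        for (String field : FIELDS) {
            double s = groupScore(doc, field);
            if (s > 0) {
                sum += s;
                matchedGroups++;
            }
        }
        double coord = outerCoord ? (double) matchedGroups / FIELDS.size() : 1.0;
        return sum * coord;
    }

    static void show(String name, Set<String> doc) {
        System.out.println(name + ": coord on " + score(doc, true) + ", coord off " + score(doc, false));
    }

    public static void main(String[] args) {
        // coord on 1.0, coord off 1.0
        show("albino in both fields", Set.of("title:albino", "description:albino"));
        // coord on 1.0, coord off 1.0
        show("albino title, elephant desc", Set.of("title:albino", "description:elephant"));
        // coord on 1.0, coord off 2.0
        show("albino elephant in title", Set.of("title:albino", "title:elephant"));
    }
}
```

In this model, the document with both terms in its title comes out ahead only once the outer coord is disabled; with the outer coord left on, its single matching group is halved and it ties the other two. The document with albino in the title and elephant in the description still ties the one with albino in both fields, because each of its groups matches only one term.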

Also, you can separately boost title and desc here, if you like:

  title:(albino elephant)^4.0 description:(albino elephant)

or

title.setBoost(4.0f);
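The effect of the field boost can be illustrated with the same kind of toy model. This sketch uses hypothetical names and all term weights of 1.0, with the outer coord disabled. It leaves out queryNorm, which depends only on the query and is therefore the same for every document. It simply multiplies the title group's score by the boost, as a stand-in for how a boost on an enclosing BooleanQuery scales the clauses inside it relative to the other field.

```java
import java.util.Set;

// Toy model of  title:(albino elephant)^4.0 description:(albino elephant)
// with the outer coord disabled. The title group's score is multiplied by
// the boost; term weights are all 1.0 and tf, idf, norms are ignored.
final class FieldBoostSketch {
    static double groupScore(Set<String> doc, String field) {
        int overlap = 0;
        for (String term : new String[] {"albino", "elephant"}) {
            if (doc.contains(field + ":" + term)) overlap++;
        }
        return overlap * (overlap / 2.0); // matching weights * coord(overlap, 2)
    }

    static double score(Set<String> doc, double titleBoost) {
        return titleBoost * groupScore(doc, "title") + groupScore(doc, "description");
    }

    public static void main(String[] args) {
        Set<String> titleOnly = Set.of("title:albino", "title:elephant");
        Set<String> descOnly = Set.of("description:albino", "description:elephant");
        System.out.println(score(titleOnly, 1.0) + " vs " + score(descOnly, 1.0)); // 2.0 vs 2.0
        System.out.println(score(titleOnly, 4.0) + " vs " + score(descOnly, 4.0)); // 8.0 vs 2.0
    }
}
```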



Doug
