Re: Field boosting Was: Indexing multiple instances of the same field for each document

Erik Hatcher Fri, 27 Feb 2004 19:47:59 -0800

On Feb 27, 2004, at 6:26 PM, Stephane James Vaucher wrote:

Slightly off topic to this thread, but how would adding different fields with the same name deal with boosts? I've looked at the javadoc and FAQ, but I don't think this is a common use of the feature. Any insight?

There is only one boost per field name. Interestingly, though, the effective boost is the product of the boosts of all the fields added under that name. So, in your example below, the boost of "fieldName" is 2.
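
A minimal sketch of that arithmetic in plain Java, not Lucene's own code. The class BoostSketch and the helper combineBoosts are hypothetical names chosen for illustration; the only behavior modeled is that boosts of same-named fields are multiplied together.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoostSketch {
    // Hypothetical helper: multiplies the boosts of all field
    // instances that share a name, keeping one value per name.
    static Map<String, Float> combineBoosts(String[] names, float[] boosts) {
        Map<String, Float> combined = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            combined.merge(names[i], boosts[i], (a, b) -> a * b);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two fields named "fieldName" with boosts 1 and 2, as in the quoted example.
        Map<String, Float> combined = combineBoosts(
                new String[] {"fieldName", "fieldName"},
                new float[] {1.0f, 2.0f});
        System.out.println(combined.get("fieldName")); // 1.0 * 2.0
    }
}
```

Under this model a third "fieldName" field with boost 3 would bring the combined value to 6, so leaving the boost at its default of 1 on the extra instances keeps the overall boost unchanged.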

Erik


E.G.
Document doc = new Document();
Field f1 = Field.Keyword("fieldName", "foo");
f1.setBoost(1);
doc.add(f1);

Field f2 = Field.Keyword("fieldName", "bar");
f2.setBoost(2);
doc.add(f2);

Cheers,
sv

On Fri, 27 Feb 2004, Doug Cutting wrote:

I think it's document.add().  Fields are pushed onto the front, rather
than added to the end.
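
To illustrate why pushing fields onto the front would produce the symptom Roy describes, here is a toy model in plain Java. It is only a sketch of the suspected behavior, not Lucene's implementation: FieldOrderSketch, addToFront and containsPhrase are hypothetical names, and analysis (tokenizing, stop words) is ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class FieldOrderSketch {
    // Models a document whose add() pushes each new value onto the
    // front of the list instead of appending it to the end.
    static List<String> addToFront(String... words) {
        LinkedList<String> fields = new LinkedList<>();
        for (String word : words) {
            fields.addFirst(word);
        }
        return fields;
    }

    // True if the phrase occurs as consecutive entries in the token list.
    static boolean containsPhrase(List<String> tokens, String... phrase) {
        return Collections.indexOfSubList(tokens, Arrays.asList(phrase)) >= 0;
    }

    public static void main(String[] args) {
        List<String> stored = addToFront("the", "quick", "brown", "fox");
        System.out.println(stored);                                   // [fox, brown, quick, the]
        System.out.println(containsPhrase(stored, "quick", "brown")); // false
        System.out.println(containsPhrase(stored, "brown", "quick")); // true
    }
}
```

In this model the words end up in reverse order, so the phrase "quick brown" no longer matches while the reversed phrase does.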

Doug

Roy Klein wrote:

I think it's got something to do with Document.invertDocument().

When I reverse the words in the phrase, the other document matches the phrase query.

Roy

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, February 27, 2004 4:34 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document

On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:

Hi Erik,

While you might be right in this example (using Field.Keyword), I can see how this would still be a problem in other cases. For instance, if I were adding more than one word at a time in the example I attached.

I concur that it appears to be a bug. It is unlikely folks use Lucene like this too much though - there probably are not too many scenarios where combining things into a single String or Reader is a burden.
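
A small sketch of that workaround, assuming the words are available as separate strings: join them into one String first and add the result as a single field. JoinWordsSketch and joinWords are hypothetical names; the Lucene call appears only as a comment, mirroring the one used for document 2 in the test program.

```java
final class JoinWordsSketch {
    // Combines the individual words into one space-separated string.
    static String joinWords(String... words) {
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String contents = joinWords(
                "the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dogs");
        // The joined text would then be added once instead of word by word:
        //   doc.add(Field.Text("contents", contents));
        System.out.println(contents);
    }
}
```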

I'm interested to know where in the code this oddity occurs so I can
understand it more.  I did a brief bit of troubleshooting but haven't
figured it out yet.  Something in DocumentWriter I presume.

Erik

Roy


-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, February 27, 2004 2:12 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document

Roy,

On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:

       Document doc = new Document();
       doc.add(Field.Text("contents", "the"));

Changing these to Field.Keyword gets it to work. I'm delving a little bit to understand why, but it seems if you are adding words individually anyway you'd want them to be untokenized, right?

Erik

       doc.add(Field.Text("contents", "quick"));
       doc.add(Field.Text("contents", "brown"));
       doc.add(Field.Text("contents", "fox"));
       doc.add(Field.Text("contents", "jumped"));
       doc.add(Field.Text("contents", "over"));
       doc.add(Field.Text("contents", "the"));
       doc.add(Field.Text("contents", "lazy"));
       doc.add(Field.Text("contents", "dogs"));
       doc.add(Field.Keyword("docnumber", "1"));
       writer.addDocument(doc);
       doc = new Document();
       doc.add(Field.Text("contents",
               "the quick brown fox jumped over the lazy dogs"));
       doc.add(Field.Keyword("docnumber", "2"));
       writer.addDocument(doc);
       writer.close();
   }

   public static void query(File indexDir) throws IOException
   {
       Query query = null;
       PhraseQuery pquery = new PhraseQuery();
       Hits hits = null;

       try {
           query = QueryParser.parse("quick brown", "contents",
                   new StandardAnalyzer());
       } catch (Exception qe) {System.out.println(qe.toString());}
       if (query == null) return;
       System.out.println("Query: " + query.toString());
       IndexReader reader = IndexReader.open(indexDir);
       IndexSearcher searcher = new IndexSearcher(reader);

       hits = searcher.search(query);
       System.out.println("Hits: " + hits.length());

       for (int i = 0; i < hits.length(); i++)
       {
           System.out.println( hits.doc(i).get("docnumber") + " ");
       }


       pquery.add(new Term("contents", "quick"));
       pquery.add(new Term("contents", "brown"));
       System.out.println("PQuery: " + pquery.toString());
       hits = searcher.search(pquery);
       System.out.println("Phrase Hits: " + hits.length());
       for (int i = 0; i < hits.length(); i++)
       {
           System.out.println( hits.doc(i).get("docnumber") + " ");
       }

       searcher.close();
       reader.close();

   }
   public static void main(String[] args) throws Exception {
       if (args.length != 1) {
           throw new Exception("Usage: " + test.class.getName() + " <index dir>");
       }
       File indexDir = new File(args[0]);
       test(indexDir);
       query(indexDir);
   }
}

My results:
Query: contents:quick contents:brown
Hits: 2
1
2
PQuery: contents:"quick brown"
Phrase Hits: 1
2

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
