Can you run a phrase query with high slop factor? like "united states of america"~999999
This will match documents with all the terms anywhere in the document. But it will also give higher scores when the terms are closer together, using edit distance I believe. -chris On 7/18/05, Rajesh Munavalli <[EMAIL PROTECTED]> wrote: > Intution behind adding n-grams is to boost naturally occurring larger > phrases versus using phrase queries. For example, if I am searching for > "united states of america", I want the search results to return the > documents ordered as follows > > Rank 1 - Documents containing all the words occurring together > Rank 2 - Documents containing maximum number of words in the same > sentence > Rank 3 - Documents containing all the words but some might appear in the > same sentence some may not > Rank 4 - Documents containig atleast one or two words > > If we have a n-gram index, most probably document talking about "united > states" gets preference over document containing "united" and "states" > seperately. If I am correct, this can be achieved without using phrase > queries. I am not sure if there is a better way to achieve the same > effect. > > Thanks, > > Rajesh > > > -----Original Message----- > From: Andy Roberts [mailto:[EMAIL PROTECTED] > Sent: Monday, July 18, 2005 5:56 PM > To: java-user@lucene.apache.org > Subject: Re: n-gram indexing > > On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote: > > At what point do I add n-grams? Does the order in which I add n-grams > > affect exact phrase queries later? My questions are > > > > (1) Should I add all the 1-grams followed by 2-grams followed by > > 3-grams..etc sentence by sentence OR > > > > (2) Add all the 1 grams of entire document first before starting > > 2-grams for the entire document? > > > > What is the general accepted notion of adding n-grams of a document? > > > > thanks, > > > > Rajesh > > I can't see any real advantage of storing n-grams explicitly. Just index > the document and use phrase queries. Order is significant with phrase > queries if I recall correctly, although you can use SpanNearQueries to > look for unordered ngrams, although I don't know why you would want to! > > Perhaps if you explain a little more about what you are trying to > achieve more generally, we can confirm that you don't need to mess with > explicit indexing of indexing. > > Andy > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]