On Wednesday 11 January 2006 00:09, Eric Jain wrote:
> Is there an efficient way to determine if two or more terms frequently 
> appear next to each other sequence? For a query like:
> 
> a b c
> 
> one or more of the following suggestions could be generated:
> 
> "a b c"
> "a b" c
> a "b c"
> 
> I could of course just run a search with all possible combinations, but 
> perhaps there is a better way?

One way that might be better is to provide your own Scorer
that works on the term positions of the three or more terms.
This would be better for performance because it only uses one
term positions object per query term (a, b, and c here).

For two terms, Nutch has something very similar that works
over multiple fields. Have a look in the archives for the thread
"Lucene performance bottlenecks" that started on 2 Dec 2005.
I don't know how Nutch handles more than two query terms.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to