On Wednesday 03 September 2008 18:06:57, Matt Ronge wrote:
> On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote:
> > On Saturday 30 August 2008 18:19:09, Matt Ronge wrote:
> >> On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote:
> >>> Can you tell us a bit more about what your custom query does?
> >>> Perhaps you can build the "candidate filter" and reuse it over
> >>> and over again?
> >>
> >> I cannot reuse it. The candidate filter would be constructed by
> >> first running a boolean query with a number of SHOULD clauses. So
> >> then I know what docs at least contain the terms I'm looking for.
> >> Once I have this set, I will look at the ordering of the matches
> >> (it's a bit more sophisticated than just a phrase query) and find
> >> the final matches.
> >
> > Sounds like you may want to take a look at SpanNearQuery.
>
> I'm going to take a second look at SpanNearQuery. I need it to
> support optional tokens, so I'm guessing I'll need to create a
> subclass to do that.

SpanNearQuery was not designed for optional tokens. This can be
tricky, so make sure your specs are good. The only article I know of
on optional tokens and proximity is:

Kunihiko Sadakane and Hiroshi Imai.
Fast algorithms for k-word proximity search.
IEICE Trans. Fundamentals, E84-A(9), September 2001.

>
> >> Since my boolean clauses are different for each query I can't
> >> reuse the filter.
> >
> > With (a variation of) SpanNearQuery you may end up not needing
> > any filtering at all, because it already uses skipTo() where
> > possible.
> >
> > In case you are looking for documents that contain partial phrases
> > from an input query that has more than 2 words, have a look at
> > Nutch.
>
> I poked around in the Nutch docs and Javadocs, what should I look at
> in Nutch? What does it do exactly, is it the trick that Doug Cutting
> mentioned where you concat neighboring terms together like "Hello
> world" becomes the token hello.world?

That is an optimization for combinations of high-frequency terms,
which is built into Nutch, iirc. But I don't know the details, so
please ask on a Nutch list.

Regards,
Paul Elschot
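
P.S. For anyone following the thread who has not seen the
term-concatenation trick mentioned above, here is a minimal sketch of
the general idea in plain Java. It is not Nutch's actual code, and it
does not use the Lucene API. The class name CommonGramsSketch, the
method tokensWithGrams, the "." separator and the rule "join a pair
when either term is high-frequency" are all assumptions made for
illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a hypothetical helper, not taken from Nutch or Lucene.
final class CommonGramsSketch {

    // Emits every input token, plus a joined "a.b" token for each adjacent
    // pair in which at least one of the two terms is in the given
    // high-frequency set (assumed rule, see the note above).
    static List<String> tokensWithGrams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String term = tokens.get(i);
            out.add(term);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (common.contains(term) || common.contains(next)) {
                    out.add(term + "." + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        List<String> tokens = Arrays.asList("the", "history", "of", "search");
        // prints [the, the.history, history, history.of, of, of.search, search]
        System.out.println(tokensWithGrams(tokens, common));
    }
}
```

The point of the joined tokens is that a phrase made of very common
words can be looked up as one rarer term instead of by intersecting
long position lists. The same joining has to be applied at index time
and at query time for the lookup to match.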