Eric also gave me the idea of using a SpanNear with maximum slop as a boolean AND to connect spans. Using this together with SpanOr seems to make the time I spent on distributing proximity clauses look a little foolish :) Is that true? Is there any disadvantage to the max-slop SpanNear plus SpanOr solution? Any advantage to distributing the 'and's?

- Mark

On 9/1/06, Mark Miller <[EMAIL PROTECTED]> wrote:
Thanks for the tip, Paul. It is embarrassing, but I only realized how SpanOr queries work a day or two ago, based on a tip from Eric. My earlier assumption about how it would create the spans was simply wrong, and I had never researched it further. Now I see that it would be a nice optimization for what I have, but I have not yet looked into how easy it will be to integrate into my distribution algorithm. I do already use it for multiphrase queries, based on Eric's tip. Hopefully it will be pretty simple to apply to my distribution, but I have not had time to check.

I plowed this thing out pretty quickly and am hoping I can go back and clean up a lot of things. I need a short break first to pump out some other things, though. As I learn more about Lucene and JavaCC I will incorporate new methods into the parser.

- Mark

On 9/1/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Friday 01 September 2006 12:54, Mark Miller wrote:
> > Hi Paul,
> >
> > I also have to treat things differently depending on if I am in a
> > proximity clause or boolean clause. A wildcard in a boolean is mapped
> > to a wildcard query. A wildcard in a proximity is mapped to a regex span
> > that has been modified to only deal with * and ?. When I run into a
> > proximity, I collect a small tree of each clause and distribute them
> > against each other... (old | map) ~3 big gets distributed to
> > old ~3 big | map ~3 big. This distribution method appears to handle all
>
> There is no need to repeat "big". SpanQueries can be nested,
> so when mapping like this:
>   SpanNear(SpanOr(old, map), big)
> the query structure will only grow for truncations and fuzzy stuff.
>
> > boolean/proximity nesting/mixing cases for me, including: great ! "big
> > old phrase search" ~5 (holy ~4 (big black bear)). The distribution
> > maintains order of operations, but also obviously can create some
> > pretty large queries.
>
> Regards,
> Paul Elschot
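
To make the comparison in this thread concrete, here is a small sketch of why the nested form SpanNear(SpanOr(old, map), big) and the distributed form old ~3 big | map ~3 big can find the same matches. It is a toy model written for illustration, not Lucene code: the class ToySpans and its methods term, or and near are hypothetical names, a span is just a [start, end) pair of token positions, and proximity is assumed to be unordered. Lucene's real span matching has more cases than this model covers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names): a span is a [start, end) pair of
// token positions. This is NOT Lucene's SpanQuery implementation.
final class ToySpans {
    // Spans for a single term: one unit-length span per occurrence.
    static Set<List<Integer>> term(String[] doc, String t) {
        Set<List<Integer>> out = new HashSet<>();
        for (int i = 0; i < doc.length; i++) {
            if (doc[i].equals(t)) {
                out.add(Arrays.asList(i, i + 1));
            }
        }
        return out;
    }

    // Rough analogue of SpanOr: the union of both span sets.
    static Set<List<Integer>> or(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    // Rough analogue of an unordered two-clause SpanNear: pair up
    // non-overlapping spans separated by at most `slop` positions and
    // emit the span that covers both.
    static Set<List<Integer>> near(Set<List<Integer>> a, Set<List<Integer>> b, int slop) {
        Set<List<Integer>> out = new HashSet<>();
        for (List<Integer> x : a) {
            for (List<Integer> y : b) {
                // Number of positions between the two spans; negative if they overlap.
                int gap = Math.max(x.get(0), y.get(0)) - Math.min(x.get(1), y.get(1));
                if (gap >= 0 && gap <= slop) {
                    out.add(Arrays.asList(Math.min(x.get(0), y.get(0)),
                                          Math.max(x.get(1), y.get(1))));
                }
            }
        }
        return out;
    }
}
```

In this model, near(or(term(doc, "old"), term(doc, "map")), term(doc, "big"), 3) returns the same set of spans as or(near(term(doc, "old"), term(doc, "big"), 3), near(term(doc, "map"), term(doc, "big"), 3)), but the nested form mentions "big" only once. Passing Integer.MAX_VALUE as the slop makes near succeed whenever both clauses occur anywhere in the document, which is the sense in which a max-slop SpanNear acts like a boolean AND.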