You are not kidding. I keep finding this tougher and tougher. Originally my method worked for most simple queries, but not all. I improved it to cover some more ground, allowing some modestly complex queries, but now even that improvement seems woefully inadequate for solving the general case. It handled some pretty complex queries, but it lost order of operations in a proximity query if things got too exciting, among other small bugs.

The parse tree handles the boolean stuff fine on its own, but the proximity seems to require a distributed attack (think algebra) that still maintains order of operations. It is a bit nasty.

Consider:
cop | fowl & (fowl | priest & man) ! helicopter ~8 (hillary | tom)

This must distribute to (roughly):
cop | ( (fowl & ( (fowl ~8 hillary | (priest ~8 hillary & man ~8 hillary) ) ! helicopter ~8 hillary) | (fowl ~8 tom | (priest ~8 tom & man ~8 tom) ) ! helicopter ~8 tom))

I have gotten close, but instead of (fowl | (priest & man)) I might get ((fowl | priest) & man). A naive distribution ignores order of ops and simply groups left to right.
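
One way to keep the grouping while distributing is to recurse over the parse tree and re-emit each boolean operator exactly where it was found. The following is only a minimal sketch under assumptions: QueryNode, term, binary and distribute are hypothetical names (not from the parser discussed here), "!" is not handled, and distributing over a right-hand "&" the same way as "|" is a guess at the intended semantics.

```java
/** Hypothetical parse-tree node: either a term leaf or a binary operator. */
final class QueryNode {
    final String op;            // "&", "|", "~8", ...; null for a term leaf
    final String term;          // set only on leaves
    final QueryNode left, right;

    private QueryNode(String op, String term, QueryNode left, QueryNode right) {
        this.op = op;
        this.term = term;
        this.left = left;
        this.right = right;
    }

    static QueryNode term(String t) {
        return new QueryNode(null, t, null, null);
    }

    static QueryNode binary(String op, QueryNode l, QueryNode r) {
        return new QueryNode(op, null, l, r);
    }

    boolean isLeaf() {
        return op == null;
    }

    /**
     * Rewrites (left near right) so that the proximity operator only ever
     * joins two terms. Each recursive step reuses the boolean operator of the
     * subtree it descends into, so the original grouping survives.
     */
    static QueryNode distribute(QueryNode left, QueryNode right, String near) {
        if (!right.isLeaf()) {
            // Split on the right-hand side first, keeping its operator.
            return binary(right.op,
                    distribute(left, right.left, near),
                    distribute(left, right.right, near));
        }
        if (!left.isLeaf()) {
            // Then push the single right-hand term down into the left subtree.
            return binary(left.op,
                    distribute(left.left, right, near),
                    distribute(left.right, right, near));
        }
        return binary(near, left, right);
    }

    @Override
    public String toString() {
        return isLeaf() ? term : "(" + left + " " + op + " " + right + ")";
    }
}
```

Distributing "~8" with hillary over the tree for (fowl | (priest & man)) this way prints as ((fowl ~8 hillary) | ((priest ~8 hillary) & (man ~8 hillary))), so the "&" stays grouped under the "|" instead of being regrouped left to right.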

my order of ops:
& "and"
| "or"
~ "within"
! "butnot"
<space between words>
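
For illustration, an operator list like this can drive a small precedence-climbing parser. This is a rough sketch, not the parser described in this message. It assumes the list runs from tightest-binding to loosest (which matches wanting fowl | priest & man to group as fowl | (priest & man)), it treats all operators as left-associative, and it leaves out the implicit space operator. PrecedenceSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrecedenceSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrecedenceSketch(String query) {
        // Tokens: "~N", single-character operators and parentheses, or words.
        // Anything else (including whitespace) is silently skipped.
        Matcher m = Pattern.compile("~\\d+|[&|!()]|\\w+").matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** Higher number binds tighter (assumed order: & then | then ~N then !). */
    private static int precedence(String tok) {
        if (tok.equals("&")) return 4;
        if (tok.equals("|")) return 3;
        if (tok.startsWith("~")) return 2;
        if (tok.equals("!")) return 1;
        return -1; // not a binary operator
    }

    /** Returns the query with every grouping made explicit by parentheses. */
    static String parse(String query) {
        PrecedenceSketch p = new PrecedenceSketch(query);
        String tree = p.parseExpr(1);
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return tree;
    }

    private String parseExpr(int minPrec) {
        String left = parsePrimary();
        while (pos < tokens.size() && precedence(tokens.get(pos)) >= minPrec) {
            String op = tokens.get(pos++);
            // The right operand may only contain tighter-binding operators.
            String right = parseExpr(precedence(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String parsePrimary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of query");
        }
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            String inner = parseExpr(1);
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("missing )");
            }
            pos++;
            return inner;
        }
        if (tok.equals(")") || precedence(tok) > 0) {
            throw new IllegalArgumentException("expected a term but found: " + tok);
        }
        return tok;
    }
}
```

Under those assumptions, PrecedenceSketch.parse("fowl | priest & man") returns (fowl | (priest & man)), while explicit parentheses in the input still override the default grouping.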

I have a new plan of attack that I have begun, but who knows where it will lead. I thought I was so close, but apparently just a tease...that method could only take me so far. I hate to put so much work into this since I doubt anyone will even use such complex queries (the queries I monitor are always so basic) but I may give it a go just to see if my new idea will solve the general case.

We will see if this parser actually has any life in it. Maybe I am no closer than you were -- I am very new at this.

- Mark
Mark --

Yes please! I'm very interested in the mixing of boolean and proximity
operators. I have also worked on a parser (using JavaCC) but haven't
managed to crack queries such as:

    ((a OR b) AND c) NEAR (d NOT e)

I can get the parse tree okay, but haven't figured out how to translate
that into a valid Lucene Query object. Simple queries such as:

    (a OR b) NEXT (c OR d)   // note the use of OR exclusively!

are okay, but nothing more complex. So: bring it on!

-- Robert
rwatkins at foo-bar.org

On Mon, 21 Aug 2006, Mark Miller wrote:

Is anyone interested in helping me test out a new query parser (i.e., is
anyone interested in using this, thereby helping me test it) ?

The parser uses an intermediate parse tree representation, unlike Lucene's
Query Filter.

[ snipped ]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



