Ah, this is a fun one... lots of fiddly issues with how queries and QueryParser work. I'll take a stab at some of these inline below.

On Monday, September 22, 2003, at 08:26 PM, Dan Quaroni wrote:
I have a simple command line interface for testing.

Interesting interface. If made generic enough, it looks like something that would be handy to have, at least in the sandbox.


  I'm getting some odd
results, though, with certain logic of wildcard searches.

Not all of your queries are truly WildcardQuery instances, though. Look at the class that was constructed for each one to get a better idea of what is happening.


It seems like
depending on what order I put the fields of the query in alters the results
drastically when I AND them together.

Not quite the right explanation of what is happening. More below....


***************************
This one makes sense

Query> name:amb*
State> california
name:amb*
[EMAIL PROTECTED]
amb*
2819 total matching documents

Right. QueryParser does a little optimization here: anything with a simple trailing * turns into a PrefixQuery, meaning it matches every document whose name field contains a term beginning with "amb".
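
To make the rule concrete, here is a minimal sketch of how a single unquoted term could be sorted into a query type. It is a toy model in plain Java, not Lucene's QueryParser: the class TermKindSketch and its classify method are made up for illustration, and the returned strings are only the names of the Lucene classes discussed in this thread.

```java
// Toy model of the rule described above; this is NOT Lucene's QueryParser.
class TermKindSketch {

    /** Names the query class a single unquoted term would roughly map to. */
    static String classify(String term) {
        boolean hasStar = term.indexOf('*') >= 0;
        boolean hasQuestionMark = term.indexOf('?') >= 0;
        if (!hasStar && !hasQuestionMark) {
            return "TermQuery";
        }
        // The "little optimization": exactly one '*', sitting at the very end.
        boolean onlyTrailingStar = !hasQuestionMark
                && term.length() > 1
                && term.indexOf('*') == term.length() - 1;
        return onlyTrailingStar ? "PrefixQuery" : "WildcardQuery";
    }

    public static void main(String[] args) {
        String[] terms = {"amb", "amb*", "a*b", "am?*"};
        for (String term : terms) {
            System.out.println(term + " -> " + classify(term));
        }
    }
}
```

Only "amb*" comes out as a PrefixQuery in this sketch; "a*b" and "am?*" stay general wildcard queries, and "amb" is a plain term.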


***************************
This is the REALLY confusing one.  We know there's a company named AMB
Property Corporation.  Why do I get NO hits?

Query> name:"amb prop*"
State> california
name:"amb prop*"
[EMAIL PROTECTED]
"amb prop"
0 total matching documents

Notice you're now in PhraseQuery land. Wildcards don't work the way you seem to expect here. What is really happening is a query for documents that have the terms "amb" and "prop" side by side, in that order; the asterisk got axed by the analyzer. If you said "name:amb name:prop*" you'd get some hits, I believe, as it would turn into a boolean query with a term query and a prefix query either OR'd or AND'd together. PhraseQuery does not support wildcards. A custom subclass of QueryParser could do some interesting things here and expand wildcard-like terms in a phrase into a PhrasePrefixQuery, but that is probably overkill (although maybe not). Look at the test case for PhrasePrefixQuery for some hints.
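
A rough sketch of why the phrase finds nothing, in plain Java rather than Lucene. The analyze method is a stand-in I made up (lower-case, strip punctuation, split on whitespace); your real analyzer may behave differently. The point is only that the phrase is compared against whole terms after the asterisk is gone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of analysis and phrase matching; this is NOT Lucene code.
class PhraseSketch {

    /** Stand-in analyzer: lower-cases, keeps only letters and digits, splits on whitespace. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            String term = raw.replaceAll("[^a-z0-9]", "");
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** True if the phrase terms occur in the document side by side, in order, as whole terms. */
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = analyze("AMB Property Corporation");
        System.out.println(analyze("amb prop*"));                        // [amb, prop]
        System.out.println(phraseMatches(doc, analyze("amb prop*")));    // false
        System.out.println(phraseMatches(doc, analyze("amb property"))); // true
    }
}
```

"prop" is not a term in the document, only "property" is, so the phrase "amb prop" has nothing to line up against.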


Ok, so I get some results with this (I know the * isn't necessary at the
end of property, but bear with me for the next example where it goes all
screwy)


Query> name:amb property*
State> california
name:amb property*
[EMAIL PROTECTED]
amb name:amb property*:property*
56 total matching documents

Is your default field for QueryParser "property*"? That's an odd field name, or is the output fishy? I'm a bit confused by the "property*:" there. I'm assuming you're outputting Query.toString here.


See above for a different way to phrase the query.

***************************
south san francisco is an exact match to the city. Why does this find 0
results??!


Query> name:amb property* AND city:south san francisco
State> california
name:amb property* AND city:south san francisco
[EMAIL PROTECTED]
amb +name:amb property* AND city:south san francisco:property* +city:south
name:
amb property* AND city:south san francisco:san name:amb property* AND
city:south
san francisco:francisco
0 total matching documents

With all the ANDs going on, this makes sense, because "san" and "francisco" end up as separate term queries rather than part of the city clause. You'd have to say city:"south san francisco" to turn it into a PhraseQuery.
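
Here is a small sketch of that field binding, again as a toy in plain Java and not the real QueryParser. It assumes a field prefix applies only to the term it is attached to, and that everything else falls into the default field; the default field name "contents" is just a placeholder I picked. Quoted phrases are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of field binding; this is NOT Lucene's QueryParser.
class FieldBindingSketch {

    /** Splits on whitespace and labels each term with the field it would be searched in. */
    static List<String> bind(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Only the upper-case spellings are treated as operators in this sketch.
            if (token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                clauses.add("<" + token + ">");
                continue;
            }
            if (token.indexOf(':') > 0) {
                // An explicit field prefix covers this one term only.
                clauses.add(token);
            } else {
                clauses.add(defaultField + ":" + token);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // "contents" is a placeholder default field name.
        System.out.println(bind("name:amb property* AND city:south san francisco", "contents"));
        // [name:amb, contents:property*, <AND>, city:south, contents:san, contents:francisco]
    }
}
```

Only "south" is searched in city; "san" and "francisco" go to whatever the default field is.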


****************************
Do this and suddenly I get matches

Query> name:amb propert* and city:"south san fran*"
State> california
name:amb propert* and city:"south san fran*"
[EMAIL PROTECTED]
amb name:amb propert* and city:"south san fran*":propert* city:"south san
fran"
56 total matching documents

You're getting hits on the wildcard match at least, and probably on "amb" in the name field as well. Again, phrase queries don't support wildcards the way you've used them here with "south san fran*", so that clause isn't matching anything.
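
A sketch of why a dead clause doesn't kill the whole query, assuming the clauses end up optional (OR'd), which is QueryParser's default as far as I know. This is a toy in plain Java, not Lucene's BooleanQuery; prohibited clauses and scoring are left out, and each boolean just says whether that clause matched the document.

```java
// Toy model of boolean clause handling; this is NOT Lucene's BooleanQuery.
class BooleanClauseSketch {

    /**
     * A document matches if every required clause hit; when there are no
     * required clauses, at least one optional clause must hit.
     */
    static boolean matches(boolean[] required, boolean[] optional) {
        for (boolean hit : required) {
            if (!hit) {
                return false;
            }
        }
        if (required.length > 0) {
            return true;
        }
        for (boolean hit : optional) {
            if (hit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[] none = {};
        // name:amb hits, propert* hits, the city phrase does not.
        boolean[] clauseHits = {true, true, false};
        System.out.println(matches(none, clauseHits)); // true  (all optional)
        System.out.println(matches(clauseHits, none)); // false (all required)
    }
}
```

With everything optional, the failing phrase clause just contributes nothing; make the clauses required and the same document drops out.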


*****************************
And look, this gets matches too:

Query> name:"amb propert*" and city:"south san*"
State> california
name:"amb propert*" and city:"south san*"
[EMAIL PROTECTED]
"amb propert" city:"south san"
10732 total matching documents

My guess here is that you're getting hits on "south san" as a phrase query. Are there that many in that area?


*****************************
Yet do this and we're back to 0 results:

Query> name:"amb propert*" and city:"south san fran*"
State> california
name:"amb propert*" and city:"south san fran*"
[EMAIL PROTECTED]
"amb propert" city:"south san fran"
0 total matching documents

You're getting zero hits from "amb propert*" since the * is getting stripped by the analyzer and there is no "amb propert" phrase match, and with the AND (which should be all uppercase, right?) you're definitely not getting hits.


******************************
Now flip the query around and it works:

Query> city:"south san fran*" and name:amb propert*
State> california
city:"south san fran*" and name:amb propert*
[EMAIL PROTECTED]
city:"south san fran" amb city:"south san fran*" and name:amb
propert*:propert*
56 total matching documents

You didn't quite flip it around; you took off some quotes too, which removed a PhraseQuery, and you're getting your hits from name:amb here as well as probably the wildcard propert*. I'm still confused by the "propert*:" in the output. Are you using the CVS version of Lucene? The toString looks OK there; maybe there was a bug in that method in earlier code?


*******************************
Finally, using the prefix of the metaphone name with quotes around it
produces no results:

Query> metaph_name:"ambprp*"
State> california
metaph_name:"ambprp*"
[EMAIL PROTECTED]
metaph_name:ambprp
0 total matching documents

Notice this is a TermQuery; that's the clue. The asterisk isn't treated as a wildcard there (the output shows it was dropped), so the query looks for the exact term "ambprp" and finds no matches.


*******************************
But take away the quotes and it works:

Query> metaph_name:ambprp*
State> california
metaph_name:ambprp*
[EMAIL PROTECTED]
metaph_name:ambprp*
6 total matching documents

Now you've kicked it into an optimized wildcard query, which turns into a prefix query, hence the matches.


********************************
But quotes don't seem to matter in this complex wildcard:

Query> metaph_name:ambprp* and city:"sou* or san or fra*"
State> california
metaph_name:ambprp* and city:"sou* or san or fra*"
[EMAIL PROTECTED]
metaph_name:ambprp* city:"sou san fra"
6 total matching documents

Your clue here is that the toString output has the asterisks removed, so your analyzer stripped them. Again, quotes mean phrase query, and phrase queries don't support wildcards.


So... Can someone help me nail down the logic for these things so we can
construct some good queries?

I hope the analysis above helps. I may not be perfectly right on everything, but it should be relatively close at identifying the issues. Fixing it depends more on how you want to deal with it. Perhaps a custom QueryParser is more what you're after.


Erik

