Re: Negative OR in fq field not working as expected
The Solr 'lucene' query parser (the one being used there, in an fq) sometimes has trouble with pure negative clauses in an OR. Even though it can handle pure negative queries like -type:foo, it has trouble with a pure negative inside an OR like you are doing. At least in 1.4.1; I don't know if it's been improved in 3.1. I _think_ you may have a case it has trouble with.

This is what I do instead, to rewrite the query so it means the same thing without giving the Lucene query parser trouble:

fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as just -type:foo, but it gets around the Lucene query parser's troubles. So that might work for you.

Dismax has even WORSE problems with pure negatives, with no easy way to get around them, so switching to dismax probably won't help there.

On 4/25/2011 4:27 PM, Simon Wistow wrote:

I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'.

What I want is a filter query which says either it's not a 'foo' or, if it is, then it has the restriction '1'. I expect two matches: one of type 'bar' and one of type 'foo'.

Neither of these

fq=(-type:foo OR restriction_id:1)
fq={!dismax q.op=OR}-type:foo restriction_id:1

produces any results.

fq=restriction_id:1 gets the 'foo' typed result.
fq=type:bar gets the 'bar' typed result.

Either of these

fq=type:[* TO *] OR (type:foo AND restriction_id:1)
fq=type:(bar OR quux OR fleeg) OR restriction_id:1

does work, but they are very, very slow to the point of unusability (our indexes are pretty large).

Searching around, it seems like other people have experienced similar issues and the answer has been "Lucene just doesn't work like that":

When dealing with Lucene people are strongly encouraged to think in terms of MUST, MUST_NOT and SHOULD (which are represented in the query parser as the prefixes +, - and the default) instead of in terms of AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's QueryParser) is not a strict Boolean Logic system, so it's best not to try and think of it like one.

http://wiki.apache.org/lucene-java/BooleanQuerySyntax

Am I just out of luck? Might edismax help here?

Simon
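To see why the first fq comes back empty, it can help to model how Lucene's BooleanQuery combines MUST, SHOULD and MUST_NOT clauses. The Python sketch below is a toy set-based model, not Lucene code: it ignores scoring and minimumShouldMatch, and the parse trees assume the standard 'lucene' parser with the default OR operator, where -type:foo OR restriction_id:1 becomes one flat query with a MUST_NOT clause and a SHOULD clause. The three sample documents and their ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

# Clause occurrence flags, mirroring the query-parser prefixes +, - and (none).
MUST, SHOULD, MUST_NOT = "+", "", "-"


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class MatchAll:
    """Stands in for *:* (matches every document)."""


@dataclass(frozen=True)
class Bool:
    clauses: Tuple[Tuple[str, "Query"], ...]


Query = Union[Term, MatchAll, Bool]
Docs = Dict[int, Dict[str, str]]


def evaluate(q: Query, docs: Docs) -> Set[int]:
    """Return the ids of documents matched by q under this toy model."""
    if isinstance(q, MatchAll):
        return set(docs)
    if isinstance(q, Term):
        return {i for i, d in docs.items() if d.get(q.field) == q.value}
    must = [evaluate(s, docs) for occ, s in q.clauses if occ == MUST]
    should = [evaluate(s, docs) for occ, s in q.clauses if occ == SHOULD]
    must_not = [evaluate(s, docs) for occ, s in q.clauses if occ == MUST_NOT]
    if must:
        # With required clauses present, SHOULD clauses only affect scoring.
        hits = set.intersection(*must)
    elif should:
        # No required clauses: at least one SHOULD clause must match.
        hits = set.union(*should)
    else:
        # Only MUST_NOT clauses: there is nothing positive to subtract from.
        hits = set()
    for excluded in must_not:
        hits -= excluded
    return hits


DOCS: Docs = {
    1: {"type": "bar"},
    2: {"type": "foo", "restriction_id": "1"},
    3: {"type": "foo", "restriction_id": "2"},
}

# fq=(-type:foo OR restriction_id:1): one flat BooleanQuery.
naive = Bool(((MUST_NOT, Term("type", "foo")), (SHOULD, Term("restriction_id", "1"))))

# fq=( (*:* AND -type:foo) OR restriction_id:1): the negation is nested
# under a positive *:* clause.
rewritten = Bool((
    (SHOULD, Bool(((MUST, MatchAll()), (MUST_NOT, Term("type", "foo"))))),
    (SHOULD, Term("restriction_id", "1")),
))

print(sorted(evaluate(naive, DOCS)))      # []
print(sorted(evaluate(rewritten, DOCS)))  # [1, 2]
```

In this model the naive filter keeps only the restriction_id:1 document and then removes every type:foo document, which is consistent with the empty result described above; nesting the negation under *:* gives it something to subtract from, so the 'bar' and the restricted 'foo' documents both match.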
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:

This is what I do instead, to rewrite the query so it means the same thing without giving the Lucene query parser trouble:

fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as just -type:foo, but it gets around the Lucene query parser's troubles. So that might work for you.

Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works, it's also unbelievably slow (~30s query time).

Would writing my own Query Parser help here?

Simon
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow si...@thegestalt.org wrote:

On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:

This is what I do instead, to rewrite the query so it means the same thing without giving the Lucene query parser trouble:

fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as just -type:foo, but it gets around the Lucene query parser's troubles. So that might work for you.

Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works, it's also unbelievably slow (~30s query time).

It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo?

Would writing my own Query Parser help here?

Nope. That's just syntax.

If filters of the form ( (*:* AND -type:foo) OR restriction_id:1) are much slower (to the point where it causes you problems) and filters of the form (type:foo OR restriction_id:1) are fast, then you could index the negation of the type field as well (if you know all the types). For instance, in a doc, index two type fields:

type:bar
type_not:foo

Or if type is multi-valued, you could index both foo and NOT_foo in the same field. Then you could express the filter as

type_not:foo OR restriction_id:1

or

type:NOT_foo OR restriction_id:1

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
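Yonik's suggestion moves the negation from query time to index time. Here is a minimal Python sketch of preparing documents that way, assuming the complete set of types is known up front. The ALL_TYPES set and the add_negated_types helper are hypothetical (the type values are the ones mentioned in the thread), and in a Solr schema both type_not and a type field holding NOT_ values would need to be declared multiValued, since a document gets one entry per type it is not.

```python
from typing import Any, Dict, Set

# Assumption: every possible value of 'type' is known at index time.
ALL_TYPES: Set[str] = {"foo", "bar", "quux", "fleeg"}


def add_negated_types(doc: Dict[str, Any], all_types: Set[str] = ALL_TYPES,
                      same_field: bool = False) -> Dict[str, Any]:
    """Return a copy of doc that also records the types it is NOT.

    same_field=False: add a separate multi-valued 'type_not' field.
    same_field=True:  turn 'type' into a multi-valued field holding the
                      real type plus one 'NOT_<type>' value per other type.
    """
    own = doc["type"]
    others = sorted(all_types - {own})
    out = dict(doc)
    if same_field:
        out["type"] = [own] + [f"NOT_{t}" for t in others]
    else:
        out["type_not"] = others
    return out


bar_doc = {"id": "1", "type": "bar"}
print(add_negated_types(bar_doc))
# {'id': '1', 'type': 'bar', 'type_not': ['fleeg', 'foo', 'quux']}
print(add_negated_types(bar_doc, same_field=True))
# {'id': '1', 'type': ['bar', 'NOT_fleeg', 'NOT_foo', 'NOT_quux']}

# The filter then needs no negative clause at all:
FQ_SEPARATE_FIELD = "type_not:foo OR restriction_id:1"
FQ_SAME_FIELD = "type:NOT_foo OR restriction_id:1"
```

The trade-off follows from the "if you know all the types" condition: introducing a new type later means reindexing existing documents so they pick up the new negated value.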
Re: Negative OR in fq field not working as expected
Yeah, I do the (*:* AND -type:foo) OR something:else thing on my own pretty big index, and it's not slow at all. At least no slower than doing any other X OR Y where X and Y both include lots of results.

Pre-warming the field cache for, in this case, the 'type' field may help. Same as it would if 'X' were just type:bar (not negated), where type:bar matched about the same number of documents as -type:foo does in your case. In general, there's nothing special that should make that slow; it's a pretty ordinary query, really. It just uses weird syntax to get around Lucene query parser issues.

[Obligatory mention: This may have nothing to do with your issue, but I have found occasions where not having enough RAM allocated to Solr 1.4.1 can make things terribly slow, even though there is no OutOfMemory error or other error in the logs. Especially if you are doing faceting and/or using StatsComponent. Exacerbated if you are using the default JVM GC strategies instead of picking one of the concurrent strategies.]

On 4/25/2011 5:02 PM, Yonik Seeley wrote:
[...]
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:

It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo?

Total number of docs is 161,000,000.

type:foo     39,000,000
-type:foo   122,200,000
type:bar     90,000,000

We're aware it's large and we're in the process of splitting the index up, but I was just hoping there was a workaround I could use in order to reclaim some performance.
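Putting Simon's counts side by side shows the scale behind Yonik's question: the negated clause alone matches roughly three quarters of the index, so the rewritten filter matches well over a hundred million documents. A small Python check of the arithmetic, using the counts exactly as reported; they appear to be rounded, since type:foo and -type:foo together come to 161,200,000 rather than 161,000,000.

```python
# Document counts as reported in the thread.
TOTAL = 161_000_000
COUNTS = {
    "type:foo": 39_000_000,
    "-type:foo": 122_200_000,
    "type:bar": 90_000_000,
}


def fractions(counts, total):
    """Fraction of the index matched by each query."""
    return {q: n / total for q, n in counts.items()}


for query, frac in fractions(COUNTS, TOTAL).items():
    print(f"{query:>10}: {frac:.1%}")

# type:foo and -type:foo should partition the index; the reported
# figures overshoot the total slightly, so they are presumably rounded.
print(COUNTS["type:foo"] + COUNTS["-type:foo"] - TOTAL)
```

This prints 24.2%, 75.9% and 55.9% for the three queries, followed by the 200,000-document overshoot.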