Re: Negative OR in fq field not working as expected

2011-04-25 Thread Jonathan Rochkind
The Solr 'lucene' query parser (that's being used there, in an fq) 
sometimes has trouble with pure negative clauses in an OR.


Even though it can handle pure negative queries like -type:foo, it 
has trouble with pure negative in an OR like you are doing. At least in 
1.4.1, don't know if it's been improved in 3.1.  I _think_ you may have 
a case it has trouble with.


This is what I do instead, to rewrite the query to mean the same thing 
but not give the lucene query parser trouble:


fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as 
just -type:foo, but gets around the lucene query parser's troubles.


So that might work for you.
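
To make the reasoning concrete, here is a toy model in Python of how a boolean query combines its clauses, using plain sets of document ids. It is only a sketch: the two-document index, the `term` helper and the `boolean_query` function are made up for illustration and are not Solr or Lucene APIs. It also leaves out whatever special handling Solr applies to a top-level pure negative query like -type:foo.

```python
# Toy model of boolean clause matching, using Python sets of doc ids.
# Hypothetical two-document index mirroring the example in this thread.
DOCS = {
    1: {"type": "bar"},
    2: {"type": "foo", "restriction_id": "1"},
}

ALL_DOCS = set(DOCS)  # what *:* matches


def term(field, value):
    """Doc ids whose field equals value."""
    return {doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value}


def boolean_query(must=(), should=(), must_not=()):
    """Combine clause result sets (illustrative helper, not a real API).

    MUST_NOT clauses only subtract from what the positive clauses
    selected; with no positive clause there is nothing to subtract from.
    """
    if must:
        matched = set.intersection(*map(set, must))
    elif should:
        matched = set.union(*map(set, should))
    else:
        matched = set()  # pure negative: no positive clause, no matches
    for excluded in must_not:
        matched -= excluded
    return matched


# fq=(-type:foo OR restriction_id:1): one SHOULD clause, one MUST_NOT clause
flat = boolean_query(should=[term("restriction_id", "1")],
                     must_not=[term("type", "foo")])

# fq=( (*:* AND -type:foo) OR restriction_id:1)
not_foo = boolean_query(must=[ALL_DOCS], must_not=[term("type", "foo")])
rewritten = boolean_query(should=[not_foo, term("restriction_id", "1")])

print(sorted(flat))       # []
print(sorted(rewritten))  # [1, 2]
```

In the flat form the negative clause removes the only document the positive clause found, which is why nothing comes back. Wrapping the negation with *:* gives it a positive set to subtract from before the OR is applied.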

Dismax has even WORSE problems with pure negative, with no easy way to 
get around em, so switching to dismax is probably not helpful there.


On 4/25/2011 4:27 PM, Simon Wistow wrote:

I have a field 'type' that has several values. If it's type 'foo' then
it also has a field 'restriction_id'.

What I want is a filter query which says either it's not a 'foo' or if
it is then it has the restriction '1'

I expect two matches - one of type 'bar' and one of type 'foo'

Neither

  fq=(-type:foo OR restriction_id:1)
  fq={!dismax q.op=OR}-type:foo restriction_id:1

produce any results.

  fq=restriction_id:1

gets the 'foo' typed result.

  fq=type:bar

gets the 'bar' typed result.

Either of these

   fq=type:[* TO *] OR (type:foo AND restriction_id:1)
   fq=type:(bar OR quux OR fleeg) OR restriction_id:1

do work but are very, very slow to the point of unusability (our indexes
are pretty large).

Searching round it seems like other people have experienced similar
issues and the answer has been that Lucene just doesn't work like that:

When dealing with Lucene people are strongly encouraged to think in
terms of MUST, MUST_NOT and SHOULD (which are represented in the query
parser as the prefixes +, - and the default) instead of in terms of
AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
QueryParser) is not a strict Boolean Logic system, so it's best not to
try and think of it like one.

   http://wiki.apache.org/lucene-java/BooleanQuerySyntax

Am I just out of luck? Might edismax help here?

Simon









Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
 This is what I do instead, to rewrite the query to mean the same thing but 
 not give the lucene query parser trouble:
 
 fq=( (*:* AND -type:foo) OR restriction_id:1)
 
 *:* means everything, so (*:* AND -type:foo) means the same thing as 
 just -type:foo, but can get around the lucene query parsers troubles.
 
 So that might work for you.

Thanks for confirming my suspicions.

Unfortunately I've tried that as well and, whilst it works, 
it's also unbelievably slow (~30s query time).

Would writing my own Query Parser help here?

Simon






Re: Negative OR in fq field not working as expected

2011-04-25 Thread Yonik Seeley
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow si...@thegestalt.org wrote:
 On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
 This is what I do instead, to rewrite the query to mean the same thing but
 not give the lucene query parser trouble:

 fq=( (*:* AND -type:foo) OR restriction_id:1)

 *:* means everything, so (*:* AND -type:foo) means the same thing as
 just -type:foo, but can get around the lucene query parsers troubles.

 So that might work for you.

 Thanks for confirming my suspicions.

 Unfortunately I've tried that as well and, whilst it works
 it's also unbelievably slow (~30s query time).

It really shouldn't be that slow... how many documents are in your
index, and how many match -type:foo?

bq. Would writing my own Query Parser help here?

Nope.  That's just syntax.

If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
are much slower (to the point where it causes you problems) and
filters of the form
type:foo OR restriction_id:1
are fast, then you could index the negation of the type field as well
(if you know all the types)

For instance, in a doc, index two type fields:
type:bar
type_not:foo

Or if type is multi-valued, you could index both foo and NOT_foo in
the same field.

Then you could express the filter as type_not:foo OR restriction_id:1
or
type:NOT_foo OR restriction_id:1
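
A minimal sketch of the indexing side of this idea, in Python, assuming documents are built as plain dicts before being sent to Solr. The `KNOWN_TYPES` list, the `with_negated_types` helper and the sample documents are hypothetical; only the `type`, `type_not` and `restriction_id` field names come from this thread. The last few lines emulate the filter in plain Python rather than calling Solr.

```python
# Hypothetical list of every value the 'type' field can take.
KNOWN_TYPES = ["foo", "bar", "quux", "fleeg"]


def with_negated_types(doc, known_types=KNOWN_TYPES):
    """Return a copy of doc with a multi-valued 'type_not' field added.

    'type_not' lists every known type the document is NOT, so a filter
    such as type_not:foo becomes an ordinary positive term lookup.
    """
    enriched = dict(doc)
    enriched["type_not"] = [t for t in known_types if t != doc.get("type")]
    return enriched


docs = [
    {"id": "1", "type": "bar"},
    {"id": "2", "type": "foo", "restriction_id": "1"},
]
enriched_docs = [with_negated_types(d) for d in docs]

# Emulate fq=type_not:foo OR restriction_id:1 on the enriched documents.
matches = [d["id"] for d in enriched_docs
           if "foo" in d["type_not"] or d.get("restriction_id") == "1"]
print(matches)  # ['1', '2']
```

The trade-off is a larger index and a reindex whenever a new type is introduced, in exchange for a filter made only of positive clauses.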

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Negative OR in fq field not working as expected

2011-04-25 Thread Jonathan Rochkind

Yeah, I do the (*:* AND -type:foo) OR something:else

thing on my own pretty big index, and it's not slow at all.  At least no 
slower than doing any other X OR Y where X and Y both include lots of 
results.


Pre-warming the field cache for, in this case, the 'type' field may 
help. Same as it would if 'X' were just type:bar (not negated) where 
type:bar matched about the same number of documents as -type:foo 
does in your case.  In general, there's nothing special that should make 
that slow, it's a pretty ordinary query, really. Just using weird syntax 
to get around lucene query parser issues.


[Obligatory mention: This may have nothing to do with your issue, but I 
have found occasions where not having enough RAM allocated to Solr 1.4.1 
can make things terribly slow, even though there is no OutOfMemory error 
or other error in the logs. Especially if you are doing faceting and/or 
StatsComponent.  Exacerbated if you are using the default JVM GC 
strategies instead of picking some of the concurrent strategies.]





Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:
 It really shouldn't be that slow... how many documents are in your
 index, and how many match -type:foo?

Total number of docs is 161,000,000

 type:foo  39,000,000
-type:foo 122,200,000 
 type:bar 90,000,000

We're aware it's large and we're in the process of splitting the index 
up but I was just hoping that there was a workaround I could use in 
order to reclaim some performance.