Re: Can anyone explain this Solr query behavior?
Oh, I simply changed the query parser type to lucene with &defType=lucene, and then I saw essentially the same error that edismax hits when it internally tries to parse the query.

It would be nice, though, if DEBUG-level logging for edismax displayed the error as well and then told you what remediation it was performing.

-- Jack Krupansky
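To make the defType switch above concrete, here is a minimal sketch that builds a select URL forcing the lucene parser, so a syntax error is reported instead of being silently escaped. The host, port, and core name in `SOLR_SELECT` and the helper name `lucene_parser_url` are assumptions for illustration; they do not come from the thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name for your installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def lucene_parser_url(query: str, base: str = SOLR_SELECT) -> str:
    """Build a select URL that overrides the handler's default parser with lucene."""
    params = {"q": query, "defType": "lucene"}
    # urlencode percent-escapes the colon, asterisk, and parentheses in the query.
    return base + "?" + urlencode(params)


url = lucene_parser_url("id:* AND text:()")
print(url)
```

Requesting that URL against a real instance should return the parse error in the response rather than a rewritten query, as described in the message above.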
Re: Can anyone explain this Solr query behavior?
Hi Jack Krupansky,

Thank you for your reply. How did you get the error logged? Is there a special flag I have to turn on? I don't see it in my solr.log even after switching the log level to DEBUG.

org.apache.solr.search.SyntaxError: Cannot parse 'id:* AND text:()': Encountered " ")" ") "" at line 1, column 15.

Thanks
-Shankar

--
Regards,
Shankar Sundararaju
Sr. Software Architect
ebrary, a ProQuest company
410 Cambridge Avenue, Palo Alto, CA 94306 USA
shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)
Re: Can anyone explain this Solr query behavior?
Hi Upayavira,

Thank you for your analysis. I thought 'AND' and groupings were supported, as per the documentation:

http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html#Grouping

But yes, q=doc-id:3000 AND (-text:[* TO *]) works as expected.

Thanks
-Shankar
Re: Can anyone explain this Solr query behavior?
Okay... sorry I wasn't paying close enough attention. What is happening is that the empty parentheses are illegal in Lucene query syntax:

org.apache.solr.search.SyntaxError: Cannot parse 'id:* AND text:()': Encountered " ")" ") "" at line 1, column 15.
Was expecting one of:
    <NOT> ...
    "+" ...
    "-" ...
    <BAREOPER> ...
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    <REGEXPTERM> ...
    "[" ...
    "{" ...
    <LPARAMS> ...
    <NUMBER> ...
    <TERM> ...
    "*" ...

400

Edismax traps such errors and then "escapes" the query so that Lucene will no longer throw an error. In this case, it puts quotes around the "AND" operator, which is why you see "and" included in the parsed query as if it were a term. And I believe it turns "text:()" into "text:"()"", which makes the original Lucene error go away, but the "()" analyzes to nothing and generates no term in the query.

So, fix your syntax error and the anomaly should go away.

-- Jack Krupansky
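One way to catch the syntax error before the query reaches Solr is to scan for empty field groups on the client side. The sketch below is a naive heuristic, not part of Solr: the regular expression and the name `find_empty_clauses` are assumptions for illustration, and the pattern does not understand quoted phrases or escaped characters.

```python
import re

# Optional +/- prefix, a field name, a colon, then parentheses holding only
# whitespace, e.g. "+Title:()" or "text:( )". Illustrative pattern only.
EMPTY_CLAUSE = re.compile(r"[+\-]?[\w\-]+:\(\s*\)")


def find_empty_clauses(query: str) -> list[str]:
    """Return every field clause in the query whose group is empty."""
    return EMPTY_CLAUSE.findall(query)


problem = "doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())"
print(find_empty_clauses(problem))
print(find_empty_clauses("doc-id:3000"))
```

An application could refuse to send the query, or drop the reported clauses, whenever the returned list is non-empty.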
Re: Can anyone explain this Solr query behavior?
> (+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and |
> Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord

You're using edismax, not lucene. So AND is being treated as a search term, not an operator, and the word 'and' probably exists in 631580 documents.

Why is it triggering dismax? Probably because field:() is not valid syntax, so edismax is dropping to dismax because it isn't a valid lucene query.

What do you expect text:() to do?

If you want to match any docs that have a value in the text field, use q=text:[* TO *]

To match docs that *don't* have a value in the text field: q=-text:[* TO *]

Upayavira
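The two range-query forms suggested above can be generated with small helpers. This is only a sketch: the function names `has_value` and `lacks_value` are hypothetical, and the field name is inserted without any escaping.

```python
def has_value(field: str) -> str:
    """Query clause matching documents with any value in the field."""
    return f"{field}:[* TO *]"


def lacks_value(field: str) -> str:
    """Query clause matching documents with no value in the field."""
    return f"-{field}:[* TO *]"


# Combine with the doc-id filter from the thread.
combined = f"doc-id:3000 AND ({lacks_value('text')})"
print(has_value("text"))
print(combined)
```

The combined string has the same shape as the query Shankar later reports as working on his setup.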
Re: Can anyone explain this Solr query behavior?
Hi Erick,

Here's the output after turning on the debug flag:

*q=text:()&debug=query*

yields

0
17
true
text:()
query
text:()
text:()
(+())/no_coord
+()
ExtendedDismaxQParser

*q=doc-id:3000&debug=query*

yields

0
17
doc-id:3000
query
: :
doc-id:3000
doc-id:3000
(+doc-id:3000)/no_coord
+doc-id:`#8;#0;#0;#23;8
ExtendedDismaxQParser

*q=doc-id:3000 AND text:()&debug=query*

yields

0
23
doc-id:3000 AND text:()
query
: : : : : :
doc-id:3000 AND text:()
doc-id:3000 AND text:()
(+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord
+(doc-id:`#8;#0;#0;#23;8 (Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0))
ExtendedDismaxQParser

*solrconfig.xml:*

explicit
10
text
edismax
text^1.0 Title^3.0 Classification^2.0 Contributors^2.0 Publisher^2.0

*schema.xml:*

*Note:* MyAnalyzer, among a few other customizations, uses WhitespaceTokenizer and LowerCaseFilter.

Thanks a lot.

-Shankar
Re: Can anyone explain this Solr query behavior?
Please post the results of adding &debug=query to the URL. That will tell us what the query parser produces, which is much easier to analyze.

Best
Erick
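For anyone scripting this, the following sketch appends debug=query to an existing request URL while leaving the other parameters in place. The helper name `with_debug_query` and the example host and core are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_query(url: str) -> str:
    """Return the URL with debug=query set, replacing any existing debug value."""
    parts = urlsplit(url)
    # Keep every parameter except a previous debug setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    params.append(("debug", "query"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical host and core name.
debug_url = with_debug_query("http://localhost:8983/solr/collection1/select?q=doc-id:3000")
print(debug_url)
```

Calling the helper on its own output leaves the URL unchanged, so it is safe to apply more than once.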
Can anyone explain this Solr query behavior?
This query returns 0 documents:
*q=(+Title:() +Classification:() +Contributors:() +text:())*

This returns 1 document:
*q=doc-id:3000*

And this returns 631580 documents when I was expecting 0:
*q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())*

Am I missing something here? Can someone please explain? I am using Solr 4.2.1.

Thanks
-Shankar