Re: Problem with queries that includes NOT

2015-02-26 Thread david . davila
Hi,

I thought we were using the edismax query parser, but it turns out that
we had configured the dismax parser.
I ran some tests with the edismax parser and it works fine, so I'll
switch to it in our production Solr.

Regards,

David Dávila
DIT - 915828763









Re: Problem with queries that includes NOT

2015-02-25 Thread Jack Krupansky
As a general proposition, your first stop for any query interpretation
question should be to add the debugQuery=true parameter and look at the
parsedquery entry in the query response, which shows how the query is
really interpreted.
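
A minimal sketch of that first step, assuming a Solr core reachable at a hypothetical http://localhost:8983/solr/mycore (host, port and core name are placeholders, not taken from this thread). It builds a select URL with debugQuery=true and reads the parsed query out of a decoded JSON response. The sample response is hand-written for illustration and is not real Solr output.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace host, port and core name with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_debug_url(query, base=SOLR_SELECT):
    """Return a select URL that asks Solr to include debug information."""
    params = {"q": query, "wt": "json", "debugQuery": "true"}
    return base + "?" + urlencode(params)


def parsed_query(response):
    """Extract the parsed query from a decoded JSON response.

    Assumes the parsed form is stored under
    response["debug"]["parsedquery"].
    """
    return response["debug"]["parsedquery"]


# Hand-written stand-in for a response, for illustration only.
sample_response = {"debug": {"parsedquery": "+(-id:7 -id:8) +id:9"}}

url = build_debug_url("(NOT id:7 AND NOT id:8) AND id:9")
print(url)
print(parsed_query(sample_response))
```

Comparing the parsed form of two queries that "should" be equivalent is usually the quickest way to see where the parser disagrees with your reading.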

-- Jack Krupansky






Re: Problem with queries that includes NOT

2015-02-25 Thread david . davila
Hi Shawn,

thank you for your quick response. I will read your links and make some 
tests.

Regards,

David Dávila
DIT - 915828763








Re: Problem with queries that includes NOT

2015-02-25 Thread Shawn Heisey
On 2/25/2015 4:04 AM, david.dav...@correo.aeat.es wrote:
 We have problems with some queries. All of them include the NOT operator,
 and in my opinion the results don't make any sense.
 
 First problem:
 
 The query NOT Proc:ID01 returns 95806 results, whereas
 NOT Proc:ID01 OR FileType:PDF_TEXT returns 11484 results. Adding an OR
 clause should not be able to reduce the number of results.
 
 Second problem, which is caused by the parentheses and the NOT operator:
 
 This query:
 
 (NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
 
 returns 0 documents.
 
 But this query:
 
 (NOT Proc:ID01 AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE)
 
 returns 53 documents, which is correct. So the problem is the position of
 the parentheses. I have checked the same queries without the NOTs, and they
 work fine, returning the same number of results in both cases. So I think
 the problem is the combination of the parenthesis positions and the NOT
 operator.

For the first query, there is a difference between NOT condition1 OR
condition2 and NOT (condition1 OR condition2) ... I can imagine the
first one increasing the document count compared to just NOT
condition1 ... the second one wouldn't increase it.
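
The difference can be illustrated with a toy model in Python that uses plain sets of document ids. The ids are made up and nothing here queries Solr. The last expression assumes that the lucene parser turns NOT condition1 OR condition2 into a prohibited clause plus an optional clause (-condition1 condition2); that assumption is worth confirming with debugQuery=true on your own installation.

```python
# Toy index: made-up document ids, not data from this thread.
all_docs = set(range(1, 11))
cond1 = {1, 2, 3}        # documents matching condition1
cond2 = {3, 4}           # documents matching condition2

# NOT condition1 on its own
not_cond1 = all_docs - cond1

# Strict Boolean reading of: NOT condition1 OR condition2
not_c1_or_c2 = (all_docs - cond1) | cond2

# Strict Boolean reading of: NOT (condition1 OR condition2)
not_c1_or_c2_grouped = all_docs - (cond1 | cond2)

# Assumed Lucene-style reading (-condition1 condition2): the optional
# clause selects documents and the prohibited clause removes some of them.
lucene_style = cond2 - cond1

print(len(not_cond1), len(not_c1_or_c2),
      len(not_c1_or_c2_grouped), len(lucene_style))
```

Under the assumed Lucene-style reading the result can be much smaller than NOT condition1 alone, which would be consistent with the drop from 95806 to 11484 reported above.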

Boolean queries in Solr (and very likely Lucene as well) do not always
do what people expect.

http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
https://lucidworks.com/blog/why-not-and-or-and-not/

As mentioned in the second link above, you'll get better results if you
use the prefix operators (+ and -) with explicit parentheses.  One word of
warning, though -- the prefix operators do not work correctly if you
change the default operator to AND.

Thanks,
Shawn



Re: Problem with queries that includes NOT

2015-02-25 Thread Alvaro Cabrerizo
Hi,

The edismax parser should be able to handle the query you want to run. I
ran a test, and both of the following queries give me the right result
(note the parentheses):

   - {!edismax}(NOT id:7 AND NOT id:8 AND id:9)    (gives 1 hit, id:9)
   - {!edismax}((NOT id:7 AND NOT id:8) AND id:9)  (gives 1 hit, id:9)
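
As a rough sketch, the parser can be selected per request either with the {!edismax} local-params prefix used in the queries above or with the defType request parameter. The endpoint below is hypothetical (host, port and core name are placeholders), and the code only builds the URLs; it does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

QUERY = "(NOT id:7 AND NOT id:8) AND id:9"


def with_local_params(query):
    """Select edismax through the {!edismax} prefix inside q."""
    return {"q": "{!edismax}" + query}


def with_def_type(query):
    """Select edismax through the defType request parameter."""
    return {"q": query, "defType": "edismax"}


for params in (with_local_params(QUERY), with_def_type(QUERY)):
    print(SOLR_SELECT + "?" + urlencode(params))
```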

In general, the issue appears when the lucene query parser is given a query
that mixes different boolean clauses (including NOT). Thus, as you noted,
the following queries give different results:

   - NOT id:7 AND NOT id:8 AND id:9    (gives 1 hit, id:9)
   - (NOT id:7 AND NOT id:8) AND id:9  (gives 0 hits when expecting 1)
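
The two results can be reproduced with a small toy model in Python. It is only a sketch of the idea that a group containing nothing but prohibited clauses selects no documents, so there is nothing to exclude from; the class names are hypothetical and this is not Lucene's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    doc_id: int              # stands in for a clause such as id:7


@dataclass
class Bool:
    must: list = field(default_factory=list)      # required clauses
    must_not: list = field(default_factory=list)  # prohibited clauses


def evaluate(query, docs):
    """Return the ids in docs that match the toy query."""
    if isinstance(query, Term):
        return {d for d in docs if d == query.doc_id}
    if not query.must:
        # Only prohibited clauses: nothing was selected, so there is
        # nothing to exclude from and the group matches no documents.
        return set()
    result = set(docs)
    for clause in query.must:
        result &= evaluate(clause, docs)
    for clause in query.must_not:
        result -= evaluate(clause, docs)
    return result


docs = {7, 8, 9}

# NOT id:7 AND NOT id:8 AND id:9
flat = Bool(must=[Term(9)], must_not=[Term(7), Term(8)])

# (NOT id:7 AND NOT id:8) AND id:9
nested = Bool(must=[Bool(must_not=[Term(7), Term(8)]), Term(9)])

print(evaluate(flat, docs))
print(evaluate(nested, docs))
```

In the flat form the required clause id:9 gives the prohibited clauses something to subtract from. In the nested form the inner group is evaluated on its own and comes back empty.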

Ever since I read the chapter "Limitations of prohibited clauses in
sub-queries" in Apache Solr 3 Enterprise Search Server many years ago, I
always add the "all documents" query clause *:* to the negative clauses to
avoid the problem you mentioned. So I recommend rewriting the query you
showed us as:

   - (*:* AND NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND
   sys_FileType:PROTOTIPE
   - (NOT id:7 AND NOT id:8 AND *:*) AND id:9  (gives 1 hit as expected)

The first query above can then be read as "give me all the documents except
those having ID01 or PDF_TEXT, and keep only those having PROTOTIPE".
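
If the queries are generated by code, the same guard can be applied automatically. The helper below is a hypothetical sketch (the function name and the assumption that negative clauses start with the word NOT are mine): it prepends *:* to a parenthesised group only when every clause in the group is negative.

```python
MATCH_ALL = "*:*"


def guard_negative_group(clauses):
    """Join clauses with AND, adding *:* when every clause is a NOT clause."""
    clauses = [c.strip() for c in clauses]
    if clauses and all(c.startswith("NOT ") for c in clauses):
        clauses = [MATCH_ALL] + clauses
    return "(" + " AND ".join(clauses) + ")"


group = guard_negative_group(["NOT Proc:ID01", "NOT FileType:PDF_TEXT"])
query = group + " AND sys_FileType:PROTOTIPE"
print(query)
```

Groups that already contain a positive clause are left unchanged, so the helper does not alter queries that were working before.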

Regards.








Problem with queries that includes NOT

2015-02-25 Thread david . davila
Hello,

We have problems with some queries. All of them include the NOT operator,
and in my opinion the results don't make any sense.

First problem:

The query NOT Proc:ID01 returns 95806 results, whereas
NOT Proc:ID01 OR FileType:PDF_TEXT returns 11484 results. Adding an OR
clause should not be able to reduce the number of results.

Second problem, which is caused by the parentheses and the NOT operator:

This query:

(NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE

returns 0 documents.

But this query:

(NOT Proc:ID01 AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE)

returns 53 documents, which is correct. So the problem is the position of
the parentheses. I have checked the same queries without the NOTs, and they
work fine, returning the same number of results in both cases. So I think
the problem is the combination of the parenthesis positions and the NOT
operator.

This second problem is less important, but the queries come from a web
page that I would have to change, so I need to know whether the problem is
in Solr or not.



This is the relevant part of the schema:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>



Thank you very much,




David Dávila 

DIT - 915828763