Re: Problem with queries that includes NOT
Hi,

I thought we were using the edismax query parser, but it turns out we had configured the dismax parser. I have run some tests with the edismax parser and it works fine, so I'll switch to it in our production Solr.

Regards,
David Dávila
DIT - 915828763

From: Alvaro Cabrerizo topor...@gmail.com
To: solr-user@lucene.apache.org
Date: 25/02/2015 16:41
Subject: Re: Problem with queries that includes NOT

Hi, The edismax parser should be able to manage the query you want to ask. I've made a test and both of the following queries give me the right result (see the parentheses):

- {!edismax}(NOT id:7 AND NOT id:8 AND id:9) (gives 1 hit, id:9)
- {!edismax}((NOT id:7 AND NOT id:8) AND id:9) (gives 1 hit, id:9)
Re: Problem with queries that includes NOT
As a general proposition, your first stop for any query interpretation question should be to add the debugQuery=true parameter and look at the parsedquery entry in the debug section of the query response, which shows how the query is really interpreted.

-- Jack Krupansky

On Wed, Feb 25, 2015 at 8:21 AM, david.dav...@correo.aeat.es wrote:

Hi Shawn, thank you for your quick response. I will read your links and make some tests.
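The debugQuery suggestion in this message can be tried from a script. What follows is only a minimal sketch: the endpoint http://localhost:8983/solr/mycore/select (host, port and core name) is a placeholder that does not come from this thread, and the sample response is invented to show the shape being read. It builds a select URL with debugQuery=true and reads the parsed form of the query from a decoded JSON response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr to include debug output."""
    params = {"q": query, "wt": "json", "rows": 0, "debugQuery": "true"}
    return base_url + "?" + urlencode(params)


def parsed_query(response):
    """Read the parsed form of the query from a decoded JSON response.

    Returns None when the response carries no debug section.
    """
    return response.get("debug", {}).get("parsedquery")


url = build_debug_url(
    "(NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE"
)
print(url)

# Invented response fragment, only to illustrate where the value sits.
sample = {
    "debug": {
        "parsedquery": "+(-Proc:ID01 -FileType:PDF_TEXT) +sys_FileType:PROTOTIPE"
    }
}
print(parsed_query(sample))
```

Fetching the URL (for example with urllib.request.urlopen) and decoding the body with json.loads would give the real response for a given index. Comparing the parsed form of two queries that differ only in bracket placement is usually the quickest way to see where they diverge.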
Re: Problem with queries that includes NOT
Hi Shawn,

Thank you for your quick response. I will read your links and run some tests.

Regards,
David Dávila
DIT - 915828763

From: Shawn Heisey apa...@elyograg.org
To: solr-user@lucene.apache.org
Date: 25/02/2015 13:23
Subject: Re: Problem with queries that includes NOT

Boolean queries in Solr (and very likely Lucene as well) do not always do what people expect.

http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
https://lucidworks.com/blog/why-not-and-or-and-not/

As mentioned in the second link above, you'll get better results if you use the prefix operators with explicit parentheses.
Re: Problem with queries that includes NOT
On 2/25/2015 4:04 AM, david.dav...@correo.aeat.es wrote:

We have problems with some queries. All of them include the NOT operator and, in my opinion, the results don't make any sense.

First problem: the query NOT Proc:ID01 returns 95806 results, whereas NOT Proc:ID01 OR FileType:PDF_TEXT returns 11484 results. Adding an OR clause should not be able to reduce the number of results.

Second problem: here the issue is caused by the brackets together with NOT. The query (NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE returns 0 documents, but the query (NOT Proc:ID01 AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE) returns 53 documents, which is correct. So the problem is the position of the brackets. I have checked the same query without NOTs, and it works fine, returning the same number of results in both cases. So I think the problem is the combination of the bracket positions and NOT.

For the first query, there is a difference between NOT condition1 OR condition2 and NOT (condition1 OR condition2) ... I can imagine the first one increasing the document count compared to just NOT condition1 ... the second one wouldn't increase it.

Boolean queries in Solr (and very likely Lucene as well) do not always do what people expect.

http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
https://lucidworks.com/blog/why-not-and-or-and-not/

As mentioned in the second link above, you'll get better results if you use the prefix operators with explicit parentheses. One word of warning, though -- the prefix operators do not work correctly if you change the default operator to AND.

Thanks,
Shawn
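The difference between the readings discussed in this message can be shown with plain sets. This is a toy sketch, not Solr code: the five documents are invented, and the last line rests on the assumption (consistent with the counts reported in the question) that the lucene parser treats NOT Proc:ID01 OR FileType:PDF_TEXT as one prohibited clause plus one optional clause.

```python
# Toy index: document id -> (Proc, FileType). All values are invented.
docs = {
    1: ("ID01", "PDF_TEXT"),
    2: ("ID01", "DOC"),
    3: ("ID02", "PDF_TEXT"),
    4: ("ID02", "DOC"),
    5: ("ID03", "DOC"),
}

all_docs = set(docs)
proc_id01 = {d for d, (proc, _) in docs.items() if proc == "ID01"}
pdf_text = {d for d, (_, ftype) in docs.items() if ftype == "PDF_TEXT"}

# NOT Proc:ID01 on its own: everything except the ID01 documents.
not_proc = all_docs - proc_id01

# (NOT condition1) OR condition2, the reading most people expect.
expected_or = not_proc | pdf_text

# NOT (condition1 OR condition2).
not_either = all_docs - (proc_id01 | pdf_text)

# Assumed parser behaviour: FileType:PDF_TEXT is the only positive clause,
# Proc:ID01 is prohibited, so the result is PDF_TEXT minus ID01.
prohibited_plus_optional = pdf_text - proc_id01

print(len(not_proc), len(expected_or), len(not_either),
      len(prohibited_plus_optional))
```

Under that assumption the last count is smaller than the count for NOT Proc:ID01 alone, which mirrors the 11484 versus 95806 results in the question. A prefix-operator form that should express the expected reading, with the default operator left at OR, is (*:* -Proc:ID01) FileType:PDF_TEXT.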
Re: Problem with queries that includes NOT
Hi,

The edismax parser should be able to handle the query you want to run. I ran a test, and both of the following queries give me the right result (note the parentheses):

- {!edismax}(NOT id:7 AND NOT id:8 AND id:9) (gives 1 hit, id:9)
- {!edismax}((NOT id:7 AND NOT id:8) AND id:9) (gives 1 hit, id:9)

In general, the issue appears when the lucene query parser is given a mix of different boolean clauses (including NOT). Thus, as you noted, the following queries give different results:

- NOT id:7 AND NOT id:8 AND id:9 (gives 1 hit, id:9)
- (NOT id:7 AND NOT id:8) AND id:9 (gives 0 hits when expecting 1)

Ever since I read the chapter "Limitations of prohibited clauses in sub-queries" in Apache Solr 3 Enterprise Search Server many years ago, I always add the all-documents query clause *:* to the negative clauses to avoid the problem you mentioned. I would therefore recommend rewriting the query you showed us as:

- (*:* AND NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
- (NOT id:7 AND NOT id:8 AND *:*) AND id:9 (gives 1 hit, as expected)

The rewritten query can then be read as "give me all the documents except those having ID01 and PDF_TEXT, and having PROTOTIPE".

Regards.

On Wed, Feb 25, 2015 at 1:23 PM, Shawn Heisey apa...@elyograg.org wrote:

Boolean queries in Solr (and very likely Lucene as well) do not always do what people expect.
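Why the bracketed negative group returns nothing, and why adding *:* repairs it, can be sketched with a small model. This is an illustration under a stated assumption, not the actual Lucene implementation: a boolean query made only of prohibited clauses is taken to match no documents, because it has no positive clause to select candidates from. The function name boolean_query and the three-document index are invented for the example.

```python
def boolean_query(must=(), should=(), must_not=()):
    """Toy model of a boolean query over sets of document ids.

    Assumption: with no required or optional clause there is nothing to
    select from, so a purely negative query matches no documents.
    """
    if must:
        hits = set.intersection(*map(set, must))
    elif should:
        hits = set.union(*map(set, should))
    else:
        return set()
    for excluded in must_not:
        hits -= set(excluded)
    return hits


all_docs = {7, 8, 9}  # stands in for the *:* clause
id7, id8, id9 = {7}, {8}, {9}

# NOT id:7 AND NOT id:8 AND id:9
flat = boolean_query(must=[id9], must_not=[id7, id8])

# (NOT id:7 AND NOT id:8) AND id:9
inner = boolean_query(must_not=[id7, id8])
nested = boolean_query(must=[inner, id9])

# (NOT id:7 AND NOT id:8 AND *:*) AND id:9
inner_fixed = boolean_query(must=[all_docs], must_not=[id7, id8])
nested_fixed = boolean_query(must=[inner_fixed, id9])

print(flat, nested, nested_fixed)
```

In this model the flat query and the *:* variant both return document 9, while the bracketed negative group is empty and drags the whole query to zero hits, matching the 1, 0 and 1 hits reported in the message.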
Problem with queries that includes NOT
Hello,

We have problems with some queries. All of them include the NOT operator and, in my opinion, the results don't make any sense.

First problem: the query NOT Proc:ID01 returns 95806 results, whereas NOT Proc:ID01 OR FileType:PDF_TEXT returns 11484 results. Adding an OR clause should not be able to reduce the number of results.

Second problem: here the issue is caused by the brackets together with NOT. The query (NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE returns 0 documents, but the query (NOT Proc:ID01 AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE) returns 53 documents, which is correct. So the problem is the position of the brackets. I have checked the same query without NOTs, and it works fine, returning the same number of results in both cases. So I think the problem is the combination of the bracket positions and NOT.

This second problem is less important, but the queries come from a web page and I would have to change it, so I need to know whether the problem is in Solr or not.

This is the part of the schema that applies:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

Thank you very much,
David Dávila
DIT - 915828763