Re: Problem with Query Parser
> Hi everybody,
>
> I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are not pertinent to the expression. I have Spanish, Portuguese and English text inside the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term represion, only the word root is matched, that is, it returns every document with the term repres. Using the admin debug search I found this:
>
>   <lst name="debug">
>   <str name="rawquerystring">description:represion</str>
>   <str name="querystring">description:represion</str>
>   <str name="parsedquery">description:repres</str>
>   <str name="parsedquery_toString">description:repres</str>
>
> The "ion" part of the term was deleted by the query parser. The first question is: I don't know where I should look to correct this, in schema.xml or in solrconfig.xml. The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was deleted by it. You can verify this using the /admin/analysis.jsp page. It will tell you which TokenFilterFactory removes it.

> I've deleted it from the configuration but nothing changes. Should I reindex the collection to see the changes?

Yes, re-indexing is necessary.

> Should I also delete it from the index section?

You should remove the English Porter filter from both the query and index analyzers.

> What will I lose by deleting English Porter?

You will lose stemming functionality. But since you have Spanish, Portuguese and English documents, using English Porter for all the documents is not meaningful.
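To make the "ion" removal concrete, here is a minimal sketch of the single English Porter rule that seems to be at work: a trailing "ion" is stripped when the remaining stem ends in "s" or "t" and is long enough (Porter's measure m > 1). This is a simplified illustration, not Solr's or Lucene's implementation. The class and method names are made up for the example, and the vowel test ignores the special handling of "y".

```java
/**
 * Simplified sketch of ONE rule from the English Porter stemmer (step 4):
 * strip a trailing "ion" when the remaining stem ends in 's' or 't' and
 * has measure m > 1. Hypothetical names; not the full algorithm.
 */
final class IonRuleSketch {

    // Assumption: only a, e, i, o, u count as vowels here (the real stemmer
    // also treats 'y' as a vowel in some positions).
    private static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Porter's "measure": the number of vowel-run to consonant-run transitions.
    static int measure(String stem) {
        int m = 0;
        boolean previousWasVowel = false;
        for (char c : stem.toCharArray()) {
            boolean vowel = isVowel(c);
            if (previousWasVowel && !vowel) {
                m++;
            }
            previousWasVowel = vowel;
        }
        return m;
    }

    // Applies only the "ion" rule; every other stemming step is left out.
    static String stripIon(String word) {
        if (!word.endsWith("ion")) {
            return word;
        }
        String stem = word.substring(0, word.length() - 3);
        boolean endsInSOrT = stem.endsWith("s") || stem.endsWith("t");
        return (endsInSOrT && measure(stem) > 1) ? stem : word;
    }

    public static void main(String[] args) {
        System.out.println(stripIon("represion")); // prints "repres"
        System.out.println(stripIon("lion"));      // prints "lion" (stem does not end in s or t)
    }
}
```

The rule is designed for English suffixes, which is why it misfires on a Spanish word that merely happens to end in "sion".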
Re: Problem with Query Parser
Thanks Ahmet. Using the analysis page, the English Porter definitely shows up as the killer ;)

Regards
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN iori...@yahoo.com wrote:
> You can verify this using the /admin/analysis.jsp page. It will tell you which TokenFilterFactory removes it.
Re: Problem with Query Parser
Another way to do multilingual indexing is to have a separate field for each language. Solr/Lucene have custom processing for some languages.

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli germanbiozz...@gmail.com wrote:
> Thanks Ahmet. Using the analysis page, the English Porter definitely shows up as the killer ;)

--
Lance Norskog
goks...@gmail.com
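As a rough illustration of the field-per-language idea, the sketch below routes each document's text into a field whose name carries the language code. The naming scheme (description_es, description_pt, description_en) and the class are hypothetical. The assumption is that each such field would be declared in schema.xml with a field type whose analyzer suits that language.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks a per-language field name for a document's text. */
final class PerLanguageFieldSketch {

    // Assumed naming scheme: <baseField>_<languageCode>, e.g. description_es.
    static String fieldFor(String baseField, String languageCode) {
        return baseField + "_" + languageCode.toLowerCase(Locale.ROOT);
    }

    // Builds a plain map standing in for an index document.
    static Map<String, String> toDocument(String id, String languageCode, String description) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(fieldFor("description", languageCode), description);
        return doc;
    }

    public static void main(String[] args) {
        // The Spanish text lands in description_es, so an English stemmer never sees it.
        System.out.println(toDocument("1", "es", "represion"));
    }
}
```

At query time the application would then search the field that matches the language of the query, or several fields at once if the language is unknown.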
Re: Problem with Query Parser?
Can someone explain this?

+myField:\* +city:Mumbai gives me all results for +city:Mumbai

myField is a regular text field and * is not a stopword.

Cheers
Avlesh

On Tue, Jun 16, 2009 at 10:26 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Tue, Jun 16, 2009 at 12:28 AM, Avlesh Singh avl...@gmail.com wrote:
>>> Probably the analyzer removed the $, leaving an empty term and causing the clause to be removed altogether.
>>
>> I predicted this behavior while writing the mail yesterday, Yonik. Does it sound logical and intuitive?
>
> It's intuitive in some circumstances, and not in others. It's certainly not intuitive in this particular case. I think there's another JIRA issue already open for this somewhere.
>
> -Yonik
> http://www.lucidimagination.com
Re: Problem with Query Parser?
On Tue, Jun 16, 2009 at 8:33 AM, Avlesh Singh avl...@gmail.com wrote:
> Can someone explain this?
> +myField:\* +city:Mumbai gives me all results for +city:Mumbai
> myField is a regular text field and * is not a stopword.

* and other non-alphanumerics are probably being dropped by WordDelimiterFilter.

-Yonik
http://www.lucidimagination.com
Re: Problem with Query Parser?
Thanks Yonik!

Cheers
Avlesh

On Tue, Jun 16, 2009 at 7:25 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> * and other non-alphanumerics are probably being dropped by WordDelimiterFilter.
Re: Problem with Query Parser?
Hi,

It looks like the query parser is doing its job of removing certain characters from the query string. Maybe you can use this method directly or at least mimic it in your application:

./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:
  public static String escapeQueryChars(String s) {

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Avlesh Singh avl...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Monday, June 15, 2009 8:06:03 AM
> Subject: Problem with Query Parser?
>
> I noticed a strange behavior of the query parser for the following query on my index:
>
> +(category_name:$ product_name:$ brand_name:$) +is_available:1
>
> The fields category_name, product_name and brand_name are of type text, and is_available is a string field storing 0 or 1 for each doc in the index.
>
> When I perform the query +(category_name:$ product_name:$ brand_name:$), I get no results (which is as expected). However, when I perform the query +(category_name:$ product_name:$ brand_name:$) +is_available:1, I get results for all is_available=1. This is unexpected and undesired; the first half of the query is simply ignored.
>
> I have noticed this behaviour for pretty much all the special characters: $, ^, * etc.
>
> I am using the default text field analyzer. Am I missing something or is this a known bug in Solr?
>
> Cheers
> Avlesh
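For anyone who wants to mimic escapeQueryChars rather than call it, a minimal self-contained sketch could look like the following. The set of escaped characters is an assumption that approximates the Lucene query syntax specials; the exact set used by ClientUtils.escapeQueryChars may differ between versions, so check the source of the release you run. The class name is made up for the example.

```java
/** Hypothetical stand-in that mimics query-character escaping; not the SolrJ source. */
final class EscapeSketch {

    // Assumption: approximate list of characters with special meaning to the
    // Lucene query parser. Note that '$' is NOT among them.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&;";

    // Prefixes every special character (and whitespace) with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("*"));   // prints \*
        System.out.println(escape("a:b")); // prints a\:b
        System.out.println(escape("$"));   // prints $ (unchanged in this sketch)
    }
}
```

Escaping only stops the query parser from interpreting a character as syntax. It does not change what the field's analyzer does with that character afterwards.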
Re: Problem with Query Parser?
How does one explain this?

+myField:$ gives zero results
+myField:$ +city:Mumbai gives results for city:Mumbai

Cheers
Avlesh

On Tue, Jun 16, 2009 at 12:50 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> It looks like the query parser is doing its job of removing certain characters from the query string.
Re: Problem with Query Parser?
> Probably the analyzer removed the $, leaving an empty term and causing the clause to be removed altogether.

I predicted this behavior while writing the mail yesterday, Yonik. Does it sound logical and intuitive?

Cheers
Avlesh

On Tue, Jun 16, 2009 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Jun 15, 2009 at 11:53 PM, Avlesh Singh avl...@gmail.com wrote:
>> How does one explain this?
>> +myField:$ gives zero results
>> +myField:$ +city:Mumbai gives results for city:Mumbai
>
> Probably the analyzer removed the $, leaving an empty term and causing the clause to be removed altogether.
>
> -Yonik
> http://www.lucidimagination.com
Re: Problem with Query Parser?
And here's the debug info:

  <str name="rawquerystring">+myField:$ +city:Mumbai</str>
  <str name="querystring">+myField:$ +city:Mumbai</str>
  <str name="parsedquery">+city:Mumbai</str>
  <str name="parsedquery_toString">+city:Mumbai</str>
  <str name="QParser">OldLuceneQParser</str>

I found this unintuitive. "No results" rather than "all results" was the expected behavior.

Cheers
Avlesh

On Tue, Jun 16, 2009 at 9:58 AM, Avlesh Singh avl...@gmail.com wrote:
> I predicted this behavior while writing the mail yesterday, Yonik. Does it sound logical and intuitive?
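The behavior in the debug output can be reproduced with a toy model: if analysis of a clause's text yields no tokens, the clause is left out of the parsed query, and whatever clauses remain decide the result. The sketch below is only an illustration under that assumption. It is not Lucene's parser, the "analyzer" simply keeps runs of letters and digits, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a clause vanishing when its text analyzes to zero tokens. */
final class EmptyClauseSketch {

    // Toy analyzer (assumption): keep only runs of letters and digits as tokens.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Toy parser: handles whitespace-separated clauses of the form +field:text.
    static String parse(String query) {
        List<String> kept = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            int colon = clause.indexOf(':');
            if (colon < 0) {
                continue; // not a field:text clause; ignored by this toy parser
            }
            List<String> tokens = analyze(clause.substring(colon + 1));
            if (tokens.isEmpty()) {
                continue; // nothing left after analysis: the clause is silently dropped
            }
            kept.add(clause.substring(0, colon) + ":" + String.join(" ", tokens));
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(parse("+myField:$ +city:Mumbai")); // prints +city:Mumbai
        System.out.println(parse("+myField:$").isEmpty());    // prints true
    }
}
```

In this model a lone +myField:$ leaves an empty query, which matches nothing, while adding +city:Mumbai leaves a query that matches every Mumbai document. That mirrors the two results reported in the thread.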
Re: Problem with Query Parser?
> Maybe you can use this method directly or at least mimic it in your application:
> ./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:
>   public static String escapeQueryChars(String s)

That does not help either, Otis. (+myField:$ +city:Mumbai) at best could get converted into (+myField:\\$ +city:Mumbai). The output remains the same: all results rather than the expected no results.

Cheers
Avlesh