Re: Problem with Query Parser
Another way to do multilingual indexing is to have a separate field for each language. Solr/Lucene have language-specific processing (such as stemmers) for some languages.

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli wrote:
> Thanks Ahmet. Definitely using analyzer appears the english porter as the killer ;)
> Regards
> German

--
Lance Norskog
goks...@gmail.com
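Lance's suggestion can be sketched as follows. The sketch assumes hypothetical fields description_es, description_pt and description_en, each declared in schema.xml with an analyzer suited to its language (the thread does not give the field types). The Java code only shows how a client might spread one term across those fields; it does not talk to Solr.

```java
import java.util.List;
import java.util.StringJoiner;

public class PerLanguageQuery {

    // Hypothetical language codes; one field per language is assumed to exist,
    // e.g. description_es, description_pt, description_en.
    static final List<String> LANGUAGES = List.of("es", "pt", "en");

    // Name of the field holding text in the given language.
    static String fieldFor(String baseField, String lang) {
        return baseField + "_" + lang;
    }

    // Builds an OR query over every per-language variant of baseField.
    // The term is assumed to be already escaped for query syntax.
    static String buildQuery(String baseField, String term) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String lang : LANGUAGES) {
            joiner.add(fieldFor(baseField, lang) + ":" + term);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints: description_es:represion OR description_pt:represion OR description_en:represion
        System.out.println(buildQuery("description", "represion"));
    }
}
```

Because each field has its own analyzer, a Spanish stemmer would only ever see Spanish text. With the dismax query parser, listing the fields in the qf parameter is another way to search them together.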
Re: Problem with Query Parser
Thanks Ahmet. Using the analysis page, the English Porter definitely shows up as the killer ;)

Regards
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN wrote:
>> The only thing that is suspicious to me is the EnglishPorter.
>
> Yes you are right. "ion" part of the term was deleted by it. You can verify this using /admin/analysis.jsp page. It will tell you which TokenFilterFactory removes it.
Re: Problem with Query Parser
> Using the admin-debug search I found this:
>
> <str name="rawquerystring">description:represion</str>
> <str name="querystring">description:represion</str>
> <str name="parsedquery">description:repres</str>
> <str name="parsedquery_toString">description:repres</str>
>
> the "ion" part of the term was deleted by the query parser. The first question is: I don´t know now where should I see to correct this, at the schema.xml or at the solrconfig.xml.

> The only thing that is suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was deleted by it. You can verify this on the /admin/analysis.jsp page; it will tell you which TokenFilterFactory removes it.

> I've deleted from the configuration but nothing changes. Should I reindex the collection to see the changes?

Yes, reindexing is necessary.

> Should I delete also from the index section?

You should remove the English Porter filter from both the query and the index analyzer.

> What I will loose deleting English porter?

You will lose stemming. But since you have Spanish, Portuguese and English documents, applying the English Porter stemmer to all of them is not meaningful.
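Ahmet's two points (remove the stemmer from both analyzers, then reindex) follow from the fact that query terms are matched against whatever terms were written at index time. The sketch below models this with a plain set of indexed terms and a toy stand-in for the stemmer that only strips a trailing "ion"; that stand-in is an assumption for illustration, since the real English Porter filter has many more rules.

```java
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatch {

    // Toy stand-in for the English Porter filter: strips a trailing "ion" only.
    static String toyStem(String token) {
        return token.endsWith("ion") ? token.substring(0, token.length() - 3) : token;
    }

    // Lowercases the token and optionally applies the toy stemmer.
    static String analyze(String token, boolean stem) {
        String t = token.toLowerCase(Locale.ROOT);
        return stem ? toyStem(t) : t;
    }

    // A query term matches if its analyzed form was written to the index.
    static boolean matches(Set<String> indexedTerms, String queryTerm, boolean queryStem) {
        return indexedTerms.contains(analyze(queryTerm, queryStem));
    }

    public static void main(String[] args) {
        // Indexed while the stemmer was still in the index analyzer: {"repres"}
        Set<String> oldIndex = Set.of(analyze("represion", true));
        // Stemmer removed from the query analyzer only, no reindex:
        System.out.println(matches(oldIndex, "represion", false)); // false

        // Stemmer removed from both analyzers and the document reindexed: {"represion"}
        Set<String> newIndex = Set.of(analyze("represion", false));
        System.out.println(matches(newIndex, "represion", false)); // true
    }
}
```

Changing only one side leaves the two analyzers producing different terms, so the stemmed terms already in the index stop matching until the collection is reindexed.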
Problem with Query Parser
Hi everybody,

I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are not pertinent to the expression. I have Spanish, Portuguese and English text in the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term "represion", only the word root is used, i.e. it returns every document with the term "repres". Using the admin debug search I found this:

<str name="rawquerystring">description:represion</str>
<str name="querystring">description:represion</str>
<str name="parsedquery">description:repres</str>
<str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was deleted by the query parser. My first question is: where should I look to correct this, in schema.xml or in solrconfig.xml? In the schema, description is and text is:

The only thing that looks suspicious to me is the EnglishPorter filter. I've deleted it from the configuration but nothing changes. Should I reindex the collection to see the changes? Should I also delete it from the index section? What will I lose by deleting the English Porter filter?

Thanks a lot for the help,
German
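The debug output above shows "represion" becoming "repres". Below is a minimal sketch of the single rule from Porter's original algorithm that produces this: step 4 removes "ion" when the remaining stem has measure m > 1 and ends in "s" or "t". It is not the full stemmer, the class name is hypothetical, and Solr's EnglishPorterFilterFactory may be built on a Snowball implementation whose rules differ in detail.

```java
public class PorterIonRule {

    // Porter's definition: a, e, i, o, u are vowels; 'y' is a vowel when it
    // follows a consonant, otherwise a consonant. Input is assumed lowercase.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Measure m of a stem of the form [C](VC)^m[V]: count vowel-to-consonant transitions.
    static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && prevVowel) {
                m++; // a vowel followed by a consonant completes one VC pair
            }
            prevVowel = !consonant;
        }
        return m;
    }

    // Step 4 rule: (m > 1 and (*S or *T)) ION -> ""
    static String applyIonRule(String word) {
        if (!word.endsWith("ion")) return word;
        String stem = word.substring(0, word.length() - 3);
        if (stem.isEmpty()) return word;
        char last = stem.charAt(stem.length() - 1);
        if (measure(stem) > 1 && (last == 's' || last == 't')) return stem;
        return word;
    }

    public static void main(String[] args) {
        System.out.println(applyIonRule("represion")); // repres ("repres" has m = 2, ends in s)
        System.out.println(applyIonRule("opinion"));   // opinion ("opin" ends in n)
    }
}
```

The Spanish word happens to satisfy an English rule, which is why one English stemmer over mixed-language text gives surprising matches.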
Re: Problem with Query Parser?
Thanks Yonik!

Cheers
Avlesh

On Tue, Jun 16, 2009 at 7:25 PM, Yonik Seeley wrote:
> * and other non alphanumerics are probably being dropped by WordDelimiterFilter.
Re: Problem with Query Parser?
On Tue, Jun 16, 2009 at 8:33 AM, Avlesh Singh wrote:
> Can someone explain this?
> +myField:"\*" +city:Mumbai gives me all results for +city:Mumbai
>
> myField is a regular text field and "*" is not a stopword.

* and other non-alphanumerics are probably being dropped by WordDelimiterFilter.

-Yonik
http://www.lucidimagination.com
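Yonik's explanation can be illustrated with a simplified stand-in for WordDelimiterFilter that keeps runs of letters and digits and discards everything else. This is an assumption for illustration only; the real filter has further options (splitting on case changes, catenating parts, and so on).

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplit {

    // Simplified delimiter splitting: letters/digits form tokens, all other
    // characters only act as separators and are dropped.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Wi-Fi")); // [Wi, Fi]
        System.out.println(split("*"));     // []
        System.out.println(split("$"));     // []
    }
}
```

A query value made only of symbols yields no tokens at all, so there is nothing left for the clause to match on.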
Re: Problem with Query Parser?
Can someone explain this?
+myField:"\*" +city:Mumbai gives me all results for +city:Mumbai

myField is a regular text field and "*" is not a stopword.

Cheers
Avlesh

On Tue, Jun 16, 2009 at 10:26 AM, Yonik Seeley wrote:
> It's intuitive in some circumstances, and not in others. It's certainly not intuitive in this particular case. I think there's another JIRA issue already open for this somewhere.
Re: Problem with Query Parser?
On Tue, Jun 16, 2009 at 12:28 AM, Avlesh Singh wrote:
>> Probably the analyzer removed the "$", leaving an empty term and causing
>> the clause to be removed altogether.
>
> I predicted this behavior while writing the mail yesterday, Yonik.
> Does it sound logical and intuitive?

It's intuitive in some circumstances, and not in others. It's certainly not intuitive in this particular case. I think there's another JIRA issue already open for this somewhere.

-Yonik
http://www.lucidimagination.com
Re: Problem with Query Parser?
On Tue, Jun 16, 2009 at 12:50 AM, Otis Gospodnetic wrote:
> Maybe you can use this method directly or at least mimic it in your application:
> ./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java: public static String escapeQueryChars(String s)

That does not help either, Otis. At best, (+myField:"$" +city:Mumbai) could be converted into (+myField:"\\$" +city:Mumbai). The output remains the same: all results rather than the expected no results.

Cheers
Avlesh
Re: Problem with Query Parser?
And here's the debug info:

<str name="rawquerystring">+myField:"$" +city:Mumbai</str>
<str name="querystring">+myField:"$" +city:Mumbai</str>
<str name="parsedquery">+city:Mumbai</str>
<str name="parsedquery_toString">+city:Mumbai</str>
<str name="QParser">OldLuceneQParser</str>

I found this unintuitive. I expected "no results" rather than "all results".

Cheers
Avlesh

On Tue, Jun 16, 2009 at 9:58 AM, Avlesh Singh wrote:
> I predicted this behavior while writing the mail yesterday, Yonik.
> Does it sound logical and intuitive?
Re: Problem with Query Parser?
On Tue, Jun 16, 2009 at 9:42 AM, Yonik Seeley wrote:
> Probably the analyzer removed the "$", leaving an empty term and causing
> the clause to be removed altogether.

I predicted this behavior while writing the mail yesterday, Yonik. Does it sound logical and intuitive?

Cheers
Avlesh
Re: Problem with Query Parser?
On Mon, Jun 15, 2009 at 11:53 PM, Avlesh Singh wrote:
> How does one explain this?
> +myField:"$" give zero result
> +myField:"$" +city:"Mumbai" gives result for city:"Mumbai"

Probably the analyzer removed the "$", leaving an empty term and causing the clause to be removed altogether.

-Yonik
http://www.lucidimagination.com
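The effect Yonik describes can be reproduced with a small model, under stated assumptions: a toy analyzer that keeps only letters and digits, required (+) clauses whose analyzed text is empty being dropped, and a query with no clauses left matching nothing. All names are hypothetical, matching is by token containment rather than true phrase matching, and List.of/Map.of need Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class EmptyClauseDemo {

    // Toy analyzer (assumption): lowercase, keep only runs of letters/digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Each clause is {field, queryText} and is required (+). Clauses whose
    // text analyzes to no tokens are dropped before matching.
    static List<Integer> search(List<Map<String, String>> docs, List<String[]> clauses) {
        List<String[]> kept = new ArrayList<>();
        for (String[] c : clauses) {
            if (!analyze(c[1]).isEmpty()) kept.add(c);
        }
        List<Integer> hits = new ArrayList<>();
        if (kept.isEmpty()) return hits; // a query with no clauses matches nothing
        for (int i = 0; i < docs.size(); i++) {
            boolean all = true;
            for (String[] c : kept) {
                String value = docs.get(i).getOrDefault(c[0], "");
                if (!analyze(value).containsAll(analyze(c[1]))) {
                    all = false;
                    break;
                }
            }
            if (all) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("myField", "price in dollars", "city", "Mumbai"),
                Map.of("myField", "cheap deals", "city", "Delhi"));
        List<String[]> dollarOnly = List.<String[]>of(new String[] {"myField", "$"});
        List<String[]> dollarAndCity = List.of(
                new String[] {"myField", "$"}, new String[] {"city", "Mumbai"});
        System.out.println(search(docs, dollarOnly));    // [] : no clauses left
        System.out.println(search(docs, dollarAndCity)); // [0] : only +city:Mumbai remains
    }
}
```

This mirrors the debug output in this thread, where the parsed form of +myField:"$" +city:Mumbai is just +city:Mumbai.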
Re: Problem with Query Parser?
How does one explain this?
+myField:"$" gives zero results
+myField:"$" +city:"Mumbai" gives results for city:"Mumbai"

Cheers
Avlesh

On Tue, Jun 16, 2009 at 12:50 AM, Otis Gospodnetic wrote:
> It looks like the query parser is doing its job of removing certain characters from the query string.
Re: Problem with Query Parser?
Hi,

It looks like the query parser is doing its job of removing certain characters from the query string.

Maybe you can use this method directly, or at least mimic it in your application:

./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java: public static String escapeQueryChars(String s) {

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Avlesh Singh
> To: solr-user@lucene.apache.org
> Sent: Monday, June 15, 2009 8:06:03 AM
> Subject: Problem with Query Parser?
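A minimal re-implementation in the spirit of the ClientUtils.escapeQueryChars method Otis mentions is below. QueryEscaper is a hypothetical name, and the character set is an assumption modeled on SolrJ's method that may differ between versions. Note that "$" is not query syntax, so escaping leaves it unchanged and the analyzer still drops it, which is consistent with the finding in this thread that escaping does not help.

```java
public class QueryEscaper {

    // Query-syntax characters to backslash-escape (assumed set; check your SolrJ version).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;";

    // Prefixes every special character and every whitespace character with a backslash.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQueryChars("a+b")); // a\+b
        System.out.println(escapeQueryChars("$"));   // $ (not query syntax, left as-is)
    }
}
```

Escaping only protects characters from the query parser; what the field's analyzer does with them afterwards is a separate step.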
Problem with Query Parser?
I noticed a strange behavior of the query parser for the following query on my index:

+(category_name:"$" product_name:"$" brand_name:"$") +is_available:1

The fields category_name, product_name and brand_name are of type "text", and is_available is a "string" field storing 0 or 1 for each doc in the index.

When I perform the query +(category_name:"$" product_name:"$" brand_name:"$"), I get no results (as expected). However, when I perform the query +(category_name:"$" product_name:"$" brand_name:"$") +is_available:1, I get results for all docs with is_available=1. This is unexpected and undesired: the first half of the query is simply ignored.

I have noticed this behavior for pretty much all the special characters: $, ^, * etc. I am using the default text field analyzer. Am I missing something, or is this a known bug in Solr?

Cheers
Avlesh