
Re: Problem with Query Parser

2009-10-18 Thread AHMET ARSLAN


 Hi everybody

 I have a simple but (for me) annoying problem. I'm a happy user of
 Solr 1.4 with a small collection of documents. Today one of the users
 reported that a query returns documents that are not pertinent to the
 expression. I have Spanish, Portuguese and English text in the
 collection. Using the Solr administration interface I found that she
 was right: if I search for the Spanish term represion, I get only the
 word root, i.e. it returns every document with the term repres. Using
 the admin debug search I found this:
 
 
 <lst name="debug">
  <str name="rawquerystring">description:represion</str>
  <str name="querystring">description:represion</str>
  <str name="parsedquery">description:repres</str>
  <str name="parsedquery_toString">description:repres</str>
 
 The "ion" part of the term was deleted by the query parser. My first
 question is: where should I look to correct this, in schema.xml or in
 solrconfig.xml?

 The only thing that looks suspicious to me is the EnglishPorter filter.

Yes, you are right: the "ion" part of the term was removed by it. You can
verify this on the /admin/analysis.jsp page, which shows which
TokenFilterFactory removes it.
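
To see why "represion" ends up as "repres", here is a toy Java sketch of two rules from step 4 of the original Porter algorithm: drop "ion" when the remaining stem ends in "s" or "t" and has measure m > 1, and drop "ent" when m > 1. It is NOT the full algorithm and not the code behind EnglishPorterFilterFactory; the class and method names are hypothetical. It only illustrates how a Spanish word and an unrelated English word can collapse into the same indexed term.

```java
public class PorterStep4Sketch {

    // Porter's definition: a, e, i, o, u are vowels; 'y' is a vowel when it follows a consonant.
    static boolean isConsonant(String w, int i) {
        char c = w.charAt(i);
        if ("aeiou".indexOf(c) >= 0) {
            return false;
        }
        if (c == 'y') {
            return i == 0 || !isConsonant(w, i - 1);
        }
        return true;
    }

    // Measure m: number of vowel-consonant sequences in the form [C](VC){m}[V].
    static int measure(String stem) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && previousWasVowel) {
                m++;
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    // Only two step-4 rules; the real step 4 has many more suffixes and picks the longest match.
    static String step4(String word) {
        if (word.endsWith("ion")) {
            String stem = word.substring(0, word.length() - 3);
            if (measure(stem) > 1 && (stem.endsWith("s") || stem.endsWith("t"))) {
                return stem;
            }
            return word;
        }
        if (word.endsWith("ent")) {
            String stem = word.substring(0, word.length() - 3);
            if (measure(stem) > 1) {
                return stem;
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // Both print "repres": a Spanish and an English word share one indexed term.
        System.out.println(step4("represion"));
        System.out.println(step4("represent"));
    }
}
```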

 I've deleted it from the configuration but nothing changed. Do I need to
 reindex the collection to see the changes?

Yes, re-indexing is necessary.

 Should I also delete it from the index section?

You should remove the EnglishPorter filter from both the query and the index analyzers.
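
Why the index and both analyzers have to change together can be shown with a toy in-memory inverted index in plain Java (no Solr or Lucene classes; `ToyIndex` and the lambda "analyzers" are hypothetical). Terms are stored as the index-time analyzer emits them and looked up as the query-time analyzer emits them, so a query matches only when both sides produce the same term.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class ToyIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final UnaryOperator<String> indexAnalyzer;

    ToyIndex(UnaryOperator<String> indexAnalyzer) {
        this.indexAnalyzer = indexAnalyzer;
    }

    // Index-time analysis: the term is stored exactly as the analyzer emits it.
    void add(int docId, String word) {
        postings.computeIfAbsent(indexAnalyzer.apply(word), k -> new HashSet<>()).add(docId);
    }

    // Query-time analysis: the lookup key is whatever the query analyzer emits.
    Set<Integer> search(String word, UnaryOperator<String> queryAnalyzer) {
        return postings.getOrDefault(queryAnalyzer.apply(word), Set.of());
    }

    public static void main(String[] args) {
        // Stand-in for a stemmer: crudely drops a trailing "ion" (NOT the real Porter rules).
        UnaryOperator<String> stemming = w -> w.endsWith("ion") ? w.substring(0, w.length() - 3) : w;
        UnaryOperator<String> plain = w -> w;

        ToyIndex oldIndex = new ToyIndex(stemming);   // built while the stemmer was configured
        oldIndex.add(1, "represion");
        System.out.println(oldIndex.search("represion", stemming)); // [1]: both sides stem
        System.out.println(oldIndex.search("represion", plain));    // []: only the query side changed

        ToyIndex reindexed = new ToyIndex(plain);     // re-indexed after removing the stemmer
        reindexed.add(1, "represion");
        System.out.println(reindexed.search("represion", plain));   // [1]
    }
}
```

Removing the filter from only one analyzer, or from both without re-indexing, leaves the stored terms and the query terms out of step.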

 What will I lose by deleting the EnglishPorter filter?

You will lose stemming. But since you have Spanish, Portuguese and English
documents, applying an English stemmer to all of them is not meaningful.

Re: Problem with Query Parser

2009-10-18 Thread Germán Biozzoli

Thanks Ahmet. The analysis page definitely shows the English porter as
the killer ;)
Regards
German


Re: Problem with Query Parser

2009-10-18 Thread Lance Norskog

Another way to do multilingual indexing is to have a separate field
for each language. Solr/Lucene provide language-specific processing
(such as stemmers) for some languages.
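
A minimal plain-Java sketch of the separate-field-per-language idea, assuming the language of each document is already known and assuming a naming convention such as `description_es` (the field names and the `fieldFor`/`queryAllLanguages` helpers are hypothetical; each field would still need its own language-specific field type in schema.xml). It computes the field a document's text goes into and a query that searches the same word in every language field.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PerLanguageFields {

    // Hypothetical language codes: one field, and one analyzer chain, per language.
    static final List<String> LANGUAGES = List.of("en", "es", "pt");

    // Route a document's text to e.g. description_es when its language is Spanish.
    static String fieldFor(String baseField, String language) {
        if (!LANGUAGES.contains(language)) {
            throw new IllegalArgumentException("unsupported language: " + language);
        }
        return baseField + "_" + language;
    }

    // OR the term across every language field; each field's own analyzer processes it.
    // Assumes the term contains no query-syntax characters (no escaping is done here).
    static String queryAllLanguages(String baseField, String term) {
        return LANGUAGES.stream()
                .map(lang -> fieldFor(baseField, lang) + ":" + term)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("description", "es"));
        // description_en:represion OR description_es:represion OR description_pt:represion
        System.out.println(queryAllLanguages("description", "represion"));
    }
}
```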

-- 
Lance Norskog
goks...@gmail.com

Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh

Can someone explain this?
+myField:\* +city:Mumbai gives me all results for +city:Mumbai

myField is a regular text field and * is not a stopword.

Cheers
Avlesh

On Tue, Jun 16, 2009 at 10:26 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Tue, Jun 16, 2009 at 12:28 AM, Avlesh Singh avl...@gmail.com wrote:
 
  Probably the analyzer removed the $, leaving an empty term and causing
  the clause to be removed altogether.
 
 
  I predicted this behavior while writing the mail yesterday, Yonik.
  Does it sound logical and intuitive?

 It's intuitive in some circumstances, and not in others.  It's
 certainly not intuitive in this particular case.  I think there's
 another JIRA issue already open for this somewhere.

 -Yonik
 http://www.lucidimagination.com

Re: Problem with Query Parser?

2009-06-16 Thread Yonik Seeley

On Tue, Jun 16, 2009 at 8:33 AM, Avlesh Singh avl...@gmail.com wrote:
 Can someone explain this?
 +myField:\* +city:Mumbai gives me all results for +city:Mumbai

 myField is a regular text field and * is not a stopword.

* and other non-alphanumeric characters are probably being dropped by WordDelimiterFilter.

-Yonik
http://www.lucidimagination.com
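
Yonik's explanation can be illustrated with a rough plain-Java approximation of how WordDelimiterFilter treats delimiter characters: split on anything that is not a letter or digit and discard empty pieces. This is an assumption-laden sketch, not the filter's actual implementation (which also splits on case changes and letter/digit transitions and has many options); `delimiterSplit` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSketch {

    // Split on every non-letter/non-digit character and drop empty parts,
    // roughly what happens to "$", "*" or "^" in a text field.
    static List<String> delimiterSplit(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(delimiterSplit("Mumbai")); // [Mumbai]
        System.out.println(delimiterSplit("$"));      // []: no term left to search for
        System.out.println(delimiterSplit("*"));      // []
        System.out.println(delimiterSplit("wi-fi"));  // [wi, fi]
    }
}
```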

Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh

Thanks Yonik!

Cheers
Avlesh


Re: Problem with Query Parser?

2009-06-15 Thread Otis Gospodnetic


Hi,

It looks like the query parser is doing its job of removing certain characters 
from the query string.

Maybe you can use this method directly or at least mimic it in your application:

./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public static String escapeQueryChars(String s) {


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Avlesh Singh avl...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, June 15, 2009 8:06:03 AM
 Subject: Problem with Query Parser?
 
 I noticed a strange behavior of the Query parser for the following query on
 my index.
 +(category_name:$ product_name:$ brand_name:$) +is_available:1
 The fields category_name, product_name and brand_name are of type text, and
 is_available is a string field storing 0 or 1 for each doc in the index.
 
 When I perform the query *+(category_name:$ product_name:$
 brand_name:$)*, I get no results (as expected).
 However, when I perform the query *+(category_name:$ product_name:$
 brand_name:$) +is_available:1*, I get results for all is_available=1. This
 is unexpected and undesired: the first half of the query is simply ignored.
 
 I have noticed this behaviour for pretty much all the special characters
 ($, ^, *, etc.). I am using the default text field analyzer.
 Am I missing something or is this a known bug in Solr?
 
 Cheers
 Avlesh
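
A sketch of the kind of escaping Otis points to, written in plain Java so it runs without SolrJ. The set of special characters below is an assumption modeled on the Lucene query syntax, and `escapeSketch` is a hypothetical name; the real `ClientUtils.escapeQueryChars` may escape a slightly different set depending on the version. Escaping only stops the query parser from treating a character as syntax: the escaped character is still handed to the field's analyzer, which can drop it.

```java
public class EscapeSketch {

    // Characters assumed to have meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escapeSketch(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // a backslash makes the parser treat c as a literal
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("myField:" + escapeSketch("*"));        // myField:\*
        System.out.println("myField:" + escapeSketch("$"));        // myField:$ ("$" is not special here)
        System.out.println("myField:" + escapeSketch("New York")); // myField:New\ York
    }
}
```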

Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh

How does one explain this?
+myField:$ gives zero results
+myField:$ +city:Mumbai gives results for city:Mumbai

Cheers
Avlesh


Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh


 Probably the analyzer removed the $, leaving an empty term and causing
 the clause to be removed altogether.


I predicted this behavior while writing the mail yesterday, Yonik.
Does it sound logical and intuitive?

Cheers
Avlesh

On Tue, Jun 16, 2009 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Mon, Jun 15, 2009 at 11:53 PM, Avlesh Singh avl...@gmail.com wrote:
  How does one explain this?
  +myField:$ gives zero results
  +myField:$ +city:Mumbai gives results for city:Mumbai

 Probably the analyzer removed the $, leaving an empty term and
 causing the clause to be removed altogether.

 -Yonik
 http://www.lucidimagination.com

Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh

And here's the debug info:
<str name="rawquerystring">+myField:$ +city:Mumbai</str>
<str name="querystring">+myField:$ +city:Mumbai</str>
<str name="parsedquery">+city:Mumbai</str>
<str name="parsedquery_toString">+city:Mumbai</str>
<str name="QParser">OldLuceneQParser</str>

I found this unintuitive: I expected no results rather than all results.

Cheers
Avlesh
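
A toy model of what the debug output suggests, based on Yonik's explanation and not on Solr's QueryParser code: each required clause is analyzed, and a clause whose text analyzes to no terms is silently dropped, so "+myField:$ +city:Mumbai" degrades to "+city:Mumbai". The sketch also shows the alternative Avlesh expected, where an empty required clause makes the whole query match nothing. `Clause`, `build` and the `dropEmpty` flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class EmptyClauseSketch {

    static final class Clause {
        final String field;
        final String text;

        Clause(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Crude analyzer: keep only letters and digits (stand-in for WordDelimiterFilter).
    static String analyze(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Returns the rewritten query, or Optional.empty() if it can never match.
    static Optional<String> build(List<Clause> required, boolean dropEmpty) {
        List<String> kept = new ArrayList<>();
        for (Clause c : required) {
            String term = analyze(c.text);
            if (term.isEmpty()) {
                if (dropEmpty) {
                    continue;            // observed behavior: the clause vanishes
                }
                return Optional.empty(); // expected behavior: an empty required clause matches nothing
            }
            kept.add("+" + c.field + ":" + term);
        }
        return Optional.of(String.join(" ", kept));
    }

    public static void main(String[] args) {
        List<Clause> q = List.of(new Clause("myField", "$"), new Clause("city", "Mumbai"));
        System.out.println(build(q, true));  // Optional[+city:Mumbai]
        System.out.println(build(q, false)); // Optional.empty
    }
}
```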


Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh


 Maybe you can use this method directly or at least mimic it in your
 application:
 ./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public static String escapeQueryChars(String s)


That does not help either, Otis.
At best, (+myField:$ +city:Mumbai) could get converted into
(+myField:\\$ +city:Mumbai), and the output remains the same: all results
rather than the expected no results.

Cheers
Avlesh

