Re: Problem with Query Parser

2009-10-18 Thread AHMET ARSLAN

 Hi everybody

 I have a simple but (for me) annoying problem. I'm a happy user of Solr
 1.4 with a small collection of documents. Today one of the users
 reported that a query returns documents that are not pertinent to the
 expression. I have Spanish, Portuguese and English text in the
 collection. Using the Solr administration interface I found that
 she was right: if I search for the Spanish term represion, only the
 word root is matched, that is, it returns every document containing the
 term repres. Using the admin debug search I found this:

 <lst name="debug">
 <str name="rawquerystring">description:represion</str>
 <str name="querystring">description:represion</str>
 <str name="parsedquery">description:repres</str>
 <str name="parsedquery_toString">description:repres</str>

 The "ion" part of the term was deleted by the query parser. My first
 question is: I don't know where I should look to correct this, in
 schema.xml or in solrconfig.xml.

 The only thing that looks suspicious to me is the EnglishPorter filter.

Yes, you are right: the "ion" part of the term was removed by it. You can verify
this on the /admin/analysis.jsp page, which shows which TokenFilterFactory
removes it.

 I've deleted it from the configuration but nothing changed. Should
 I reindex the collection to see the changes?

Yes, re-indexing is necessary.

 Should I also delete it from the index section?

You should remove the English Porter filter from both the query and index
analyzers.

 What will I lose by deleting the English Porter filter?

You will lose stemming. But since you have Spanish, Portuguese and English
documents, applying the English Porter stemmer to all of them is not
meaningful.
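
To see why an English stemmer trims a Spanish word like represion, here is a rough sketch of the single rule from the classic Porter algorithm that applies: the suffix "ion" is dropped when the remaining stem ends in "s" or "t" and has a "measure" (number of vowel-consonant runs) greater than 1. This is only an illustration of that one rule, assuming lowercase input. It is not the code Solr runs, the class and method names are made up, and the stemmer behind EnglishPorterFilterFactory may differ in detail.

```java
final class PorterIonSketch {

    // A letter is a vowel if it is a, e, i, o, u, or a 'y' that follows a consonant.
    static boolean isVowel(String s, int i) {
        char c = s.charAt(i);
        if ("aeiou".indexOf(c) >= 0) {
            return true;
        }
        return c == 'y' && i > 0 && !isVowel(s, i - 1);
    }

    // Porter's "measure": the number of vowel-to-consonant transitions in the stem.
    static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean vowel = isVowel(stem, i);
            if (prevVowel && !vowel) {
                m++;
            }
            prevVowel = vowel;
        }
        return m;
    }

    // Applies only the "ion" rule; every other Porter step is left out on purpose.
    static String stripIon(String word) {
        if (!word.endsWith("ion")) {
            return word;
        }
        String stem = word.substring(0, word.length() - 3);
        boolean endsInSOrT = stem.endsWith("s") || stem.endsWith("t");
        return (endsInSOrT && measure(stem) > 1) ? stem : word;
    }

    public static void main(String[] args) {
        System.out.println(stripIon("represion")); // prints "repres"
    }
}
```

The stem "repres" ends in "s" and has measure 2, so the rule fires. The stemmer has no notion that the word is Spanish, which is why one English stemmer over mixed-language text gives surprising matches.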






Re: Problem with Query Parser

2009-10-18 Thread Germán Biozzoli
Thanks Ahmet. The analysis page definitely shows the English Porter filter as
the killer ;)
Regards
German








Re: Problem with Query Parser

2009-10-18 Thread Lance Norskog
Another way to do multi-lingual indexing is to have a separate field
for each language. Solr/Lucene have custom processing for some
languages.
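
On the client side, the one-field-per-language idea might look roughly like the sketch below: pick the field from the document's (or query's) language code. The field names description_es, description_pt and description_en are hypothetical; each would need its own fieldtype with a suitable analyzer in schema.xml. Falling back to a plain description field is also just an assumption made for this example.

```java
import java.util.Map;

final class LanguageFieldSketch {

    // Hypothetical per-language fields; none of these names come from a real schema.
    private static final Map<String, String> FIELD_BY_LANG = Map.of(
            "es", "description_es",
            "pt", "description_pt",
            "en", "description_en");

    // Returns the field for a language code, or the generic field if the language is unknown.
    static String fieldFor(String lang) {
        return FIELD_BY_LANG.getOrDefault(lang, "description");
    }

    // Builds a simple field:term query against the language-specific field.
    static String queryFor(String lang, String term) {
        return fieldFor(lang) + ":" + term;
    }

    public static void main(String[] args) {
        System.out.println(queryFor("es", "represion")); // prints "description_es:represion"
    }
}
```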



-- 
Lance Norskog
goks...@gmail.com


Problem with Query Parser

2009-10-17 Thread Germán Biozzoli
Hi everybody

I have a simple but (for me) annoying problem. I'm a happy user of Solr
1.4 with a small collection of documents. Today one of the users
reported that a query returns documents that are not pertinent to the
expression. I have Spanish, Portuguese and English text in the
collection. Using the Solr administration interface I found that
she was right: if I search for the Spanish term represion, only the
word root is matched, that is, it returns every document containing the
term repres. Using the admin debug search I found this:


<lst name="debug">
<str name="rawquerystring">description:represion</str>
<str name="querystring">description:represion</str>
<str name="parsedquery">description:repres</str>
<str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was deleted by the query parser. My first
question is: I don't know where I should look to correct this, in
schema.xml or in solrconfig.xml.

At schema, description is

<field name="description" type="text" indexed="true"
       multiValued="true" stored="true"/>

and text is:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

</fieldtype>

The only thing that looks suspicious to me is the EnglishPorter filter. I've
deleted it from the configuration but nothing changed. Should I reindex
the collection to see the changes? Should I also delete it from the index
section? What will I lose by deleting the English Porter filter?

Thanks a lot for the help
German


Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh
Can someone explain this?
+myField:\* +city:Mumbai gives me all results for +city:Mumbai

myField is a regular text field and * is not a stopword.

Cheers
Avlesh

On Tue, Jun 16, 2009 at 10:26 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Tue, Jun 16, 2009 at 12:28 AM, Avlesh Singh avl...@gmail.com wrote:
 
  Probably the analyzer removed the $, leaving an empty term and causing
  the clause to be removed altogether.
 
 
  I predicted this behavior while writing the mail yesterday, Yonik.
  Does it sound logical and intuitive?

 It's intuitive in some circumstances, and not in others.  It's
 certainly not intuitive in this particular case.  I think there's
 another JIRA issue already open for this somewhere.

 -Yonik
 http://www.lucidimagination.com



Re: Problem with Query Parser?

2009-06-16 Thread Yonik Seeley
On Tue, Jun 16, 2009 at 8:33 AM, Avlesh Singh avl...@gmail.com wrote:
 Can someone explain this?
 +myField:\* +city:Mumbai gives me all results for +city:Mumbai

 myField is a regular text field and * is not a stopword.

* and other non-alphanumeric characters are probably being dropped by
WordDelimiterFilter.
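
A toy model of that effect, as a rough sketch only: analyze() below is a stand-in for an analyzer chain that throws away non-alphanumeric characters, and parse() drops any clause whose term comes out empty. None of this is Solr or Lucene code, and the names are invented for illustration. It assumes every clause has the form +field:term and that city is a field that is not analyzed this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class EmptyClauseSketch {

    // Stand-in for an analyzer chain: keeps only letters and digits, lowercased.
    static String analyze(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Rebuilds the query, silently dropping clauses whose analyzed term is empty.
    static String parse(String query, Set<String> analyzedFields) {
        List<String> kept = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            int colon = clause.indexOf(':');
            if (colon < 0) {
                continue; // not of the form +field:term, ignored in this sketch
            }
            String field = clause.substring(clause.startsWith("+") ? 1 : 0, colon);
            String term = clause.substring(colon + 1);
            if (analyzedFields.contains(field)) {
                term = analyze(term);
            }
            if (term.isEmpty()) {
                continue; // nothing left to search for, so the clause disappears
            }
            kept.add("+" + field + ":" + term);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // prints "+city:Mumbai"
        System.out.println(parse("+myField:\\* +city:Mumbai", Set.of("myField")));
    }
}
```

In this model a query made only of such clauses ends up empty, while a query with one surviving clause is answered as if the dropped clause had never been there. That matches the "all results for +city:Mumbai" observation.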

-Yonik
http://www.lucidimagination.com


Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh
Thanks Yonik!

Cheers
Avlesh




Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
I noticed a strange behavior of the query parser for the following query on
my index:
+(category_name:$ product_name:$ brand_name:$) +is_available:1
The fields category_name, product_name and brand_name are of type text, and
is_available is a string field storing 0 or 1 for each doc in the index.

When I perform the query
+(category_name:$ product_name:$ brand_name:$)
I get no results (which is as expected).
However, when I perform the query
+(category_name:$ product_name:$ brand_name:$) +is_available:1
I get results for all docs with is_available=1. This is unexpected and
undesired: the first half of the query is simply ignored.

I have noticed this behaviour for pretty much all the special characters ($,
^, * etc.). I am using the default text field analyzer.
Am I missing something, or is this a known bug in Solr?

Cheers
Avlesh


Re: Problem with Query Parser?

2009-06-15 Thread Otis Gospodnetic

Hi,

It looks like the query parser is doing its job of removing certain characters 
from the query string.

Maybe you can use this method directly or at least mimic it in your application:

./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public static 
String escapeQueryChars(String s) {
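
For anyone who wants to mimic it without pulling in SolrJ, a minimal sketch of the idea follows: put a backslash in front of every character that the query syntax treats specially. The character list here is an assumption made for illustration and may not match what ClientUtils.escapeQueryChars actually escapes, so check the real source before relying on it.

```java
final class QueryEscapeSketch {

    // Assumed set of query-syntax characters; the real ClientUtils list may differ.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;";

    // Prefixes each special character (and whitespace) with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a:b")); // prints "a\:b"
        System.out.println(escape("$"));   // prints "$"
    }
}
```

Note that in this sketch $ is not treated as a syntax character, so it passes through unchanged. Escaping only protects a character from the query parser. It does not stop the field's analyzer from removing the character afterwards.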


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
How does one explain this?
+myField:$ gives zero results
+myField:$ +city:Mumbai gives results for city:Mumbai

Cheers
Avlesh





Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh

 Probably the analyzer removed the $, leaving an empty term and causing
 the clause to be removed altogether.


I predicted this behavior while writing the mail yesterday, Yonik.
Does it sound logical and intuitive?

Cheers
Avlesh

On Tue, Jun 16, 2009 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Mon, Jun 15, 2009 at 11:53 PM, Avlesh Singh avl...@gmail.com wrote:
  How does one explain this?
  +myField:$ gives zero results
  +myField:$ +city:Mumbai gives results for city:Mumbai

 Probably the analyzer removed the $, leaving an empty term and
 causing the clause to be removed altogether.

 -Yonik
 http://www.lucidimagination.com



Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
And here's the debug info:
<str name="rawquerystring">+myField:$ +city:Mumbai</str>
<str name="querystring">+myField:$ +city:Mumbai</str>
<str name="parsedquery">+city:Mumbai</str>
<str name="parsedquery_toString">+city:Mumbai</str>
<str name="QParser">OldLuceneQParser</str>

I found this unintuitive. "No results" rather than "all results" was the
expected behavior.

Cheers
Avlesh






Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh

 Maybe you can use this method directly or at least mimic it in your
 application:
 ./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public
 static String escapeQueryChars(String s)


That does not help either, Otis.
(+myField:$ +city:Mumbai) at best could get converted into (+myField:\\$
+city:Mumbai)
The output remains the same: all results rather than the expected no results.

Cheers
Avlesh
