Re: Solr special characters like '(' and '&'?

2014-04-10 Thread rulinma
mark.





Re: Solr special characters like '(' and '&'?

2014-04-09 Thread Philip Durbin
Filtering out special characters sounds like a good idea, or possibly
escaping some of them. I definitely want to avoid brittleness.

Right now I'm passing the query through more or less as is, which means
users can type title:foo to find documents that have "foo" in the title
field. But a query for just a colon (:) throws an error
(org.apache.solr.search.SyntaxError: Cannot parse ':'), so obviously I
need to do more processing of the query before I pass it to Solr. I
need to escape that colon or something.

Is there some general advice on doing sanity checks or escaping
special characters on user-supplied queries before you pass them to
Solr? Is it documented in the wiki? I'm using SolrJ but I imagine the
advice applies to everyone.

Phil

p.s. I noticed a note saying "These characters are part of the query
syntax and must be escaped" at
https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java#L231
and learned of this part of the code from
http://lucene.472066.n3.nabble.com/What-is-the-full-list-of-Solr-Special-Characters-td4094053.html





-- 
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin


Re: Solr special characters like '(' and '&'?

2014-04-09 Thread Shawn Heisey
On 4/9/2014 8:39 AM, Philip Durbin wrote:
> Filtering out special characters sounds like a good idea, or possibly
> escaping some of them. I definitely want to avoid brittleness.
>
> Right now I'm passing the query relatively as is which means users
> can type title:foo to find documents that have foo in the title
> field. But a query for just a colon (:) throws an error
> (org.apache.solr.search.SyntaxError: Cannot parse ':') so obviously I
> need to do more processing of the query before I pass it to Solr. I
> need to escape that colon or something.
>
> Is there some general advice on doing some sanity checks or escaping
> special characters on user-supplied queries before you pass them to
> Solr? Is it documented in the wiki? I'm using Solrj but I imagine the
> advice applies to everyone.

SolrJ has the ClientUtils.escapeQueryChars method, which will
automatically escape any character that has special meaning to the query
parser.  It does so by preceding it with a backslash.

http://lucene.apache.org/solr/4_7_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html#escapeQueryChars%28java.lang.String%29

You do need to be careful with it, though.  For a query formatted like
field:(value) you'd only want to apply it to the 'value' part, because
if you applied it to the whole query, the colon and parentheses would
become part of the query text -- probably not what you want.
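
To illustrate the point about escaping only the value, here is a minimal
sketch. It does not call SolrJ; escapeValue is a hypothetical stand-in for
ClientUtils.escapeQueryChars so the example runs on its own, and its character
list is an assumption that should be checked against the ClientUtils source
for your Solr version. In real code you would probably just call the SolrJ
method on the user's text.

```java
// Sketch only: escape the user-supplied value, then wrap it in field:(...).
final class QueryEscapeSketch {
    // Assumed set of query-syntax characters; verify against your Solr version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Hypothetical stand-in for ClientUtils.escapeQueryChars:
    // put a backslash in front of every special character and every whitespace.
    static String escapeValue(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // The field name, colon and parentheses are added AFTER escaping,
    // so they keep their meaning as query syntax.
    static String fieldQuery(String field, String userText) {
        return field + ":(" + escapeValue(userText) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("title", ":"));
        System.out.println(fieldQuery("title", "foo (bar)"));
    }
}
```

Escaping the whole string "title:(foo)" instead would also escape the colon
and the parentheses, which is the pitfall described above.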

Thanks,
Shawn



Re: Solr special characters like '(' and '&'?

2014-04-09 Thread Erick Erickson
Note that when I mentioned "filter these characters out" I had
something like PatternReplaceCharFilterFactory or LowerCaseTokenizer
in mind, rather than you having to do it manually. That doesn't help
with figuring out what to escape in the URL, though.

Best,
Erick




Re: Solr special characters like '(' and '&'?

2014-04-09 Thread Furkan KAMACI
Hi;

I have developed a search API for this kind of case, and I generate the
Solr query within that API. I also have my own query syntax.

When a search query comes into my API, I generate the Solr query myself and
do not allow something like *:*. I also escape the query string and attach
the appropriate field to it, like field:(escaped_value), so there is no
security concern about users reaching arbitrary fields of the schema, and
no escaping concern.

I think building a search API like that, and handling security, escaping,
etc. within it, is an approach you should consider. If you try to do
something like that, I can answer your questions.

Thanks;
Furkan KAMACI





Solr special characters like '(' and '&'?

2014-04-08 Thread Peter Kirk
Hi

How do I search for Solr special characters like '(' and '&'?

I am trying to execute searches for products in my Solr (3.6.1) index, based 
on the categories to which these products belong.
The categories are stored in a multistring field for the products, and are 
hierarchical, and are fed to the index like:
A
A|B
A|B|C

So this product would actually belong to the category named C, which is a 
child of B, which is a child of A.

I am able to execute queries for simple category names like this (e.g. 
fq=categories_string:A|B|C).

But some categories have Solr special characters in their names, like: 
D (E & F)
(Real example: Power supplies (Battery and Solar)).

A query like fq=categories_string:A|B|D (E & F) simply fails.
But even when I try 
fq=categories_string:A|B|D%20\(E%20%26amp%3B%20F\)
(where I try to escape the special characters), it does not find the products 
in this category, and actually finds other unrelated categories.

What am I doing wrong?

Thanks,
Peter



Re: Solr special characters like '(' and '&'?

2014-04-08 Thread Ahmet Arslan
Hi Peter,

TermQueryParser is useful in your case. 
q={!term f=categories_string}A|B|D (E & F)
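
A minimal sketch of building such a term filter from code. The class and
method names are hypothetical. The category value is passed to the term
parser as-is, with no backslash escaping; only the URL layer needs encoding,
and that step is shown with the JDK's URLEncoder. A client library such as
SolrJ normally does the URL encoding for you, so asUrlParam is only needed
when you assemble the request URL by hand.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: build a {!term} filter and URL-encode it for a hand-built request.
final class TermFilterSketch {
    // The raw value follows the local-params prefix unchanged.
    static String termFilter(String field, String rawValue) {
        return "{!term f=" + field + "}" + rawValue;
    }

    // Form-encode the parameter value; '&' becomes %26, space becomes '+'.
    static String asUrlParam(String name, String value) {
        try {
            return name + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fq = termFilter("categories_string", "A|B|D (E & F)");
        System.out.println(fq);
        System.out.println(asUrlParam("fq", fq));
    }
}
```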






Re: Solr special characters like '(' and '&'?

2014-04-08 Thread Erick Erickson
I'd seriously consider filtering these characters out when you index
and search; matching on them is quite likely very brittle. The same item,
say from two different vendors, might have "D (E & F)" or "D E & F". If
you just stripped all of the non alpha-num characters you'd likely get
less brittle results.
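
As a rough illustration of that idea, here is a sketch that strips
non-alphanumeric characters from a single category name. The class and
method names are hypothetical, and doing this in client code is an
assumption made for the example; the same normalization could instead live
in the analysis chain on the Solr side. Note that \p{Alnum} in Java matches
ASCII letters and digits only by default.

```java
// Sketch only: collapse every run of non-alphanumeric characters to one space,
// so differently punctuated spellings of a name end up identical.
final class CategoryNormalizerSketch {
    static String normalize(String name) {
        return name.replaceAll("[^\\p{Alnum}]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("D (E & F)"));
        System.out.println(normalize("D E & F"));
    }
}
```

Both inputs normalize to "D E F". For the hierarchical values in this thread
you would apply it to each path segment separately, since it would otherwise
remove the | separator as well.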

You know your problem domain better than I do though, so whatever
makes most sense.

Best,
Erick




RE: Solr special characters like '(' and '&'?

2014-04-08 Thread Peter Kirk
Thanks for the comments, and for the idea of using the term query parser.
This seems to work well (except I still can't get '&' in a category name to 
work - I can get the (one and only) customer to change the category names).

I'll look into fixing the indexing side of things - could be an idea to strip 
out the special characters.
I'm working on the search side of things.

/Peter





Re: Solr special characters like '(' and '&'?

2014-04-08 Thread T. Kuro Kurosaka
I don't think & is special to the parser. Classic examples like AT&T 
just work, as far as the query parser is concerned.

https://wiki.apache.org/solr/SolrQuerySyntax
even says that you can escape the special meaning with a backslash.

& is special in the URL, however, and it has to be hex-escaped as %26.
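
A small sketch of the URL layer, using only the JDK (the class name is
hypothetical). It shows that a bare & encodes to %26, and that the
%26amp%3B from the original attempt decodes to the HTML entity &amp;
rather than to a plain &, which may be part of why that query matched
the wrong categories.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Sketch only: what '&' looks like once it is percent-encoded for a URL.
final class AmpersandEncodingSketch {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("AT&T"));      // AT%26T
        System.out.println(decode("%26amp%3B")); // &amp;
    }
}
```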


Kuro