Re: Solr special characters like '(' and '&'?
mark.
Re: Solr special characters like '(' and '&'?

Filtering out special characters sounds like a good idea, or possibly escaping some of them. I definitely want to avoid brittleness. Right now I'm passing the query more or less as is, which means users can type title:foo to find documents that have foo in the title field. But a query for just a colon (:) throws an error (org.apache.solr.search.SyntaxError: Cannot parse ':'), so obviously I need to do more processing of the query before I pass it to Solr. I need to escape that colon or something.

Is there some general advice on doing sanity checks or escaping special characters in user-supplied queries before passing them to Solr? Is it documented in the wiki? I'm using SolrJ, but I imagine the advice applies to everyone.

Phil

p.s. I noticed a note saying "These characters are part of the query syntax and must be escaped" at https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java#L231 and learned of this part of the code from http://lucene.472066.n3.nabble.com/What-is-the-full-list-of-Solr-Special-Characters-td4094053.html

--
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin

On Tue, Apr 8, 2014 at 10:14 AM, Erick Erickson erickerick...@gmail.com wrote:
> I'd seriously consider filtering these characters out when you index and search, this is quite likely very brittle. [...]
Re: Solr special characters like '(' and '&'?

On 4/9/2014 8:39 AM, Philip Durbin wrote:
> Is there some general advice on doing some sanity checks or escaping special characters on user-supplied queries before you pass them to Solr? Is it documented in the wiki? I'm using Solrj but I imagine the advice applies to everyone.

SolrJ has the ClientUtils.escapeQueryChars method, which automatically escapes any character that has special meaning to the query parser by preceding it with a backslash:

http://lucene.apache.org/solr/4_7_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html#escapeQueryChars%28java.lang.String%29

You do need to be careful with it, though. For a query formatted like field:(value), you'd only want to apply it to the 'value' part. If you applied it to the whole query, the colon and parentheses would become part of the query text, which is probably not what you want.

Thanks,
Shawn
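To make Shawn's point concrete, here is a minimal Java sketch. It does not call SolrJ: the local `escapeQueryChars` reproduces, as an assumption, the character list from the ClientUtils source Philip linked (check it against your SolrJ version, and prefer `ClientUtils.escapeQueryChars` itself in real code). The hypothetical `fieldQuery` helper escapes only the user's value and then wraps it in `field:(...)`, so a bare `:` is searched as a literal colon instead of causing a SyntaxError.

```java
public class EscapeValueDemo {

    // Assumed copy of the characters ClientUtils marks as "part of the query
    // syntax"; illustration only, use ClientUtils.escapeQueryChars in SolrJ code.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefix every special character (and any whitespace) with a backslash.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape only the user-supplied value; the field name, colon and
    // parentheses are added afterwards so they keep their syntactic meaning.
    static String fieldQuery(String field, String userValue) {
        return field + ":(" + escapeQueryChars(userValue) + ")";
    }

    public static void main(String[] args) {
        // A bare ":" typed by a user becomes a literal colon: title:(\:)
        System.out.println(fieldQuery("title", ":"));
        // Escaping the whole string instead turns the syntax into literal text: title\:\(foo\)
        System.out.println(escapeQueryChars("title:(foo)"));
    }
}
```

Whitespace is escaped as well, so the query parser no longer splits the escaped value into separate clauses at spaces.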
Re: Solr special characters like '(' and '&'?

Note that when I mentioned filtering these characters out, I had something like PatternReplaceCharFilterFactory or LowerCaseTokenizer in mind, rather than you having to do it manually. That doesn't help with figuring out what to escape on the URL, though.

Best,
Erick

On Wed, Apr 9, 2014 at 8:05 AM, Shawn Heisey s...@elyograg.org wrote:
> SolrJ has the ClientUtils.escapeQueryChars method, which will automatically escape any character that has special meaning to the query parser. [...]
Re: Solr special characters like '(' and '&'?

Hi;

I have developed a search API for this kind of case, and I generate the Solr query within that API. I also have my own query syntax. When a search query comes into my API, I generate the query myself and do not allow something like *:*. I also escape the query string and add the appropriate field, as in field:(escaped_value), so there is no security concern about users reaching other fields of the schema, and no escaping concern either.

I think building a search API like that, and handling security, escaping, etc. within it, is an approach you should consider. If you try to do something like that, I can answer your questions.

Thanks;
Furkan KAMACI

2014-04-09 18:29 GMT+03:00 Erick Erickson erickerick...@gmail.com:
> Note that when I mentioned filter these characters out I had something like PatternReplaceCharFilterFactory or LowerCaseTokenizer in mind rather than you having to do it manually. [...]
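A minimal Java sketch of the kind of wrapper Furkan describes, under stated assumptions: the field whitelist (`title`, `description`), the class name and the `buildQuery` method are all hypothetical, and the escaping repeats, as an assumption, the character list from SolrJ's ClientUtils. Unknown fields are rejected outright, and because the user's text is always escaped, input such as `*:*` can no longer parse as a match-all query. Requires Java 9+ for `Set.of`.

```java
import java.util.Set;

public class SearchApiSketch {

    // Hypothetical whitelist: the only fields end users may search.
    private static final Set<String> SEARCHABLE_FIELDS = Set.of("title", "description");

    // Assumed to mirror the character list in ClientUtils.escapeQueryChars.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds field:(escaped_value); the field comes from the whitelist,
    // never from free-form user input.
    static String buildQuery(String field, String userText) {
        if (!SEARCHABLE_FIELDS.contains(field)) {
            throw new IllegalArgumentException("Field is not searchable: " + field);
        }
        return field + ":(" + escape(userText) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("title", "*:*"));
        try {
            buildQuery("internal_notes", "foo");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```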
Solr special characters like '(' and '&'?

Hi,

How do I search for Solr special characters like '(' and '&'?

I am trying to execute searches for products in my Solr (3.6.1) index, based on the categories to which these products belong. The categories are stored in a multi-valued string field for the products, are hierarchical, and are fed to the index like:

A
A|B
A|B|C

So this product would actually belong to the category named C, which is a child of B, which is a child of A.

I am able to execute queries for simple category names like this (e.g. fq=categories_string:A|B|C).

But some categories have Solr special characters in their names, like: D (E & F) (real example: Power supplies (Battery and Solar)).

A query like fq=categories_string:A|B|D (E & F) simply fails.

But even when I try fq=categories_string:A|B|D%20\(E%20%26amp%3B%20F\) (where I try to escape the special characters), it does not find the products in this category, and actually finds other unrelated categories.

What am I doing wrong?

Thanks,
Peter
Re: Solr special characters like '(' and '&'?

Hi Peter,

TermQueryParser is useful in your case:

q={!term f=categories_string}A|B|D (E & F)

On Tuesday, April 8, 2014 4:37 PM, Peter Kirk p...@alpha-solutions.dk wrote:
> Hi How to search for Solr special characters like '(' and '&'? [...]
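A sketch of Ahmet's suggestion from the client side, used as a filter query as in Peter's original `fq`. The helper names `termFilter` and `urlEncode` are hypothetical. The term query parser takes everything after the closing brace as one literal term (no analysis, no operator parsing), so the parentheses need no backslashes; what remains is URL encoding when the request URL is built by hand. If you use SolrJ, passing the raw string to `SolrQuery.addFilterQuery` should be enough, since SolrJ encodes request parameters itself. Assumes Java 10+ for the `Charset` overload of `URLEncoder.encode`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TermFilterDemo {

    // Local-params prefix for the term query parser; the rest of the string
    // is used verbatim as the term to match in the given field.
    static String termFilter(String field, String value) {
        return "{!term f=" + field + "}" + value;
    }

    // Form-encodes a parameter value for a hand-built request URL
    // (space becomes '+', '&' becomes "%26", '|' becomes "%7C", etc.).
    static String urlEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fq = termFilter("categories_string", "A|B|D (E & F)");
        System.out.println(fq);
        System.out.println("fq=" + urlEncode(fq));
    }
}
```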
Re: Solr special characters like '(' and '&'?

I'd seriously consider filtering these characters out when you index and search; this is quite likely very brittle. The same item, say from two different vendors, might have D (E & F) or D E F. If you just stripped all of the non-alphanumeric characters, you'd likely get less brittle results.

You know your problem domain better than I do, though, so do whatever makes most sense.

Best,
Erick

On Tue, Apr 8, 2014 at 6:55 AM, Ahmet Arslan iori...@yahoo.com wrote:
> Hi Peter, TermQueryParser is useful in your case. q={!term f=categories_string}A|B|D (E & F) [...]
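A client-side sketch of the normalization Erick suggests, applied identically before indexing and before querying. It assumes Peter's `A|B|C` path format: each level is normalized separately so `|` survives as the hierarchy separator, letters are lowercased, and every run of characters that are neither letters nor digits collapses to one space. The class and method names are hypothetical. This does the work in application code; doing it in the analysis chain instead, as Erick's follow-up suggests, would need a text-based field type, since a plain string field has no analyzer. The normalized values still contain spaces, so querying them still calls for the term query parser or escaping.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

public class CategoryNormalizer {

    // Lowercase one category level and collapse runs of non-letter,
    // non-digit characters (parentheses, '&', spaces, ...) into one space.
    static String normalizeSegment(String segment) {
        return segment.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
    }

    // Normalize each level of an "A|B|C" path, keeping '|' as the separator.
    static String normalizePath(String path) {
        return Arrays.stream(path.split("\\|"))
                .map(CategoryNormalizer::normalizeSegment)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // Both vendor spellings end up as the same indexed value.
        System.out.println(normalizePath("A|B|D (E & F)"));
        System.out.println(normalizePath("A|B|D E F"));
    }
}
```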
RE: Solr special characters like '(' and '&'?

Thanks for the comments, and for the idea of the term query parser. This seems to work well (except that I still can't get '&' in a category name to work, but I can get the (one and only) customer to change the category names).

I'll look into fixing the indexing side of things; it could be an idea to strip out the special characters. I'm working on the search side of things.

/Peter

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 8. april 2014 16:15
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: Solr special characters like '(' and '&'?

> I'd seriously consider filtering these characters out when you index and search, this is quite likely very brittle. [...]
Re: Solr special characters like '(' and '&'?

I don't think & is special to the parser. Classic examples like AT&T just work, as far as the query parser is concerned. https://wiki.apache.org/solr/SolrQuerySyntax even says that you can escape any special meaning with a backslash.

& is special in the URL, however, and that has to be hex-escaped as %26.

Kuro

On 04/08/2014 06:37 AM, Peter Kirk wrote:
> Hi How to search for Solr special characters like '(' and '&'?
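Kuro's point can be checked with the JDK alone. The sketch below (Java 10+; class and method names are hypothetical) decodes the filter value exactly as it appears in Peter's archived message: Solr would have received the HTML entity `&amp;` rather than a bare `&`, which would not match the indexed category. The spaces in that value are also not backslash-escaped, so the classic parser probably applied `categories_string:` only to `A|B|D` and treated the remaining pieces as separate clauses, which may be why unrelated categories matched. Conversely, a bare `&` left unencoded in a hand-built URL ends the `fq` parameter early, because `&` separates request parameters; encoding it as `%26` avoids that.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AmpersandDemo {

    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Filter value as it appears in Peter's archived message.
        String attempted = "A|B|D%20\\(E%20%26amp%3B%20F\\)";
        // Prints A|B|D \(E &amp; F\) : the entity "&amp;", not a bare '&'.
        System.out.println(decode(attempted));
        // A raw '&' only needs encoding at the URL level: prints AT%26T
        System.out.println(encode("AT&T"));
    }
}
```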