Handling special characters in Lucene 4.0
I have created strings like the following:

searchtext +sampletext

When I search using *** or *+**, I get no results. I am using the QueryParser.escape(String s) method to handle the special characters, but it does not seem to have any effect.

Also, a search like title:search* works and returns results, but title:*** returns nothing. Is that a valid search criterion? If not, can someone suggest what the appropriate search criteria would be?

It seems that StandardAnalyzer strips out all the special characters before searching, which would explain why searches without special characters work.

Thanks, Sai.

-- View this message in context: http://lucene.472066.n3.nabble.com/Handling-special-characters-in-Lucene-4-0-tp4096674.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Handling special characters in Lucene 4.0
Maybe you are not using the same analyzer at index and query time. Even though you are correctly escaping the special query syntax characters, either the query analyzer is removing them or your index analyzer removed them.

What analyzer are you using at index time? And what analyzer are you using at query time?

-- Jack Krupansky
(In reply to saisantoshi's message of Sunday, October 20, 2013 12:47 PM.)
Re: Handling special characters in Lucene 4.0
StandardAnalyzer at both index and search time. We use the default one and don't have any custom analyzers.

Thanks, Sai
Re: Handling special characters in Lucene 4.0
The standard analyzer should remove those ampersands and pluses, so the core alpha terms should be matched. You would need to use the white space analyzer or a custom analyzer to preserve such special characters.

Please give a specific indexed text string and a specific query that fails against it.

Also, QueryParser.escape will escape asterisks too, so they won't perform a wildcard query. The standard analyzer will then remove the asterisks, as it does with most punctuation.

If you switch to an analyzer that preserves special characters, you can manually escape the special characters with a backslash and leave the asterisk unescaped so that it still performs a wildcard query.

-- Jack Krupansky
(In reply to saisantoshi's message of Sunday, October 20, 2013 6:02 PM.)
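The manual escaping described above can be sketched in plain Java without any Lucene dependency. This is only an illustration: the class and method names are hypothetical, and the character set is an assumption intended to mirror the query-syntax characters listed in the Lucene 4.x query parser documentation. escapeAll shows why a fully escaped string loses its wildcard, while escapeKeepingWildcards leaves * and ? active.

```java
final class QueryEscapeSketch {
    // Assumed set of query-syntax characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Escapes every query-syntax character, wildcards included. */
    static String escapeAll(String s) {
        return escape(s, "");
    }

    /** Escapes query-syntax characters but leaves '*' and '?' usable as wildcards. */
    static String escapeKeepingWildcards(String s) {
        return escape(s, "*?");
    }

    // Prefixes each special character with a backslash unless it appears in 'keep'.
    private static String escape(String s, String keep) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 && keep.indexOf(c) < 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAll("+sample*"));              // \+sample\*
        System.out.println(escapeKeepingWildcards("+sample*")); // \+sample*
    }
}
```

The escaped string only helps if the analyzer used at index and query time actually keeps the escaped characters in its tokens; with an analyzer that drops punctuation, there is nothing in the index for them to match.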
Re: Handling special characters in Lucene 4.0
Thanks. So, if I understand correctly, StandardAnalyzer won't work for the query below, because it strips out the special characters and searches only on searchText (in this case):

queryText = *searchText*

If we want to do a search like ***, then we need to use WhitespaceAnalyzer. Please let me know if my understanding is correct.

Also, I am unsure about the following passage from the Lucene docs. Does it not apply to StandardAnalyzer, then? It does not say that it won't work with StandardAnalyzer.

/* Escaping Special Characters
Lucene supports escaping special characters that are part of the query syntax. The current list special characters are
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these character use the \ before the character. For example to search for (1+1):2 use the query:
\(1\+1\)\:2 */

Thanks, Sai.
Re: Handling special characters in Lucene 4.0
It might be helpful if you would explain, at a higher level, what you are trying to accomplish. Where do these things come from? What higher-level problem are you trying to solve?

(In reply to saisantoshi's message of Sun, Oct 20, 2013 at 7:12 PM.)
Re: Handling special characters in Lucene 4.0
Right, the "Escaping Special Characters" section is simply about escaping query operators such as && (which means AND) and + (which means AND or MUST).

Yes, the white space analyzer could be used, or a custom analyzer that uses the white space tokenizer and then also uses a filter to strip out any punctuation characters that you don't want to keep (e.g., period, comma, semicolon, parentheses, etc.).

The query parser itself knows nothing about what your chosen analyzer does. But the query parser does specially interpret the special characters that the escape method mentions.

-- Jack Krupansky
(In reply to saisantoshi's message of Sunday, October 20, 2013 7:12 PM.)
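The idea of a white space tokenizer followed by a punctuation-stripping filter can be illustrated with a small stand-alone Java sketch. It does not use Lucene's analysis API; it only simulates the behaviour so the effect on tokens is visible. The class name, the method name, and the choice of characters to strip are all assumptions made for this example. In real Lucene code the same shape would presumably be a custom Analyzer built around WhitespaceTokenizer plus a token filter.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceTokenSketch {
    // Hypothetical set of punctuation to discard; everything else
    // (for example '+' or a single quote) stays part of the token.
    private static final String STRIP = ".,;()";

    /** Splits on whitespace, then removes the unwanted punctuation from each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                if (STRIP.indexOf(c) < 0) {
                    sb.append(c);
                }
            }
            // Skip tokens that consisted only of stripped punctuation (or were empty).
            if (sb.length() > 0) {
                tokens.add(sb.toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [searchtext, +sampletext, demo]
        System.out.println(tokenize("searchtext +sampletext, (demo)."));
    }
}
```

Because the plus sign survives tokenization here, a query term such as \+sample* would have an indexed token to match, which is not the case when the analyzer discards the plus sign.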
Re: Handling special characters in Lucene 4.0
What about other characters, such as the ' (quote) character? We have a requirement that a text can start with a quote, e.g. 'sampletext'. When I search with '* it does not return any results, but when I search with sample* it does return the result.

Thanks, Ranjith
Re: Handling special characters in Lucene 4.0
Yes, other special (punctuation) characters will be preserved by the white space analyzer, but must be escaped in query strings. You will have to escape them manually with a backslash, since the QueryParser.escape method will escape the asterisk as well, which would disable the wildcard query.

-- Jack Krupansky
(In reply to saisantoshi's message of Sunday, October 20, 2013 7:43 PM.)