date:20131020

Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi

I have indexed strings like the ones below:

searchtext
+sampletext

When I search for them using *** or *+**, I get no results.

I am using the QueryParser.escape(String s) method to handle the special
characters, but it does not seem to have any effect.

Also, when I search something like this:

title:search*

it works and returns the search result

but when I search like the following, it does not work:
title:***

( No Result)

Is the above a valid search query? If not, can someone suggest what an
appropriate query would be?

It seems that StandardAnalyzer strips out all the special characters before
searching, which would explain why searches without special characters do
work.

Thanks,
Sai.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-special-characters-in-Lucene-4-0-tp4096674.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky

Maybe you are not using the same analyzer at index and query time. Even 
though you are correctly escaping the special query syntax characters, 
either the query analyzer is removing them or your index analyzer removed 
them. What analyzer are you using at index time? And, what analyzer are you 
using at query time?


-- Jack Krupansky


Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi

StandardAnalyzer at both index and search time. We use the default one and
don't have any custom analyzers.

Thanks,
Sai



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-special-characters-in-Lucene-4-0-tp4096674p4096710.html

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky

The standard analyzer should remove those ampersands and pluses, so the core 
alpha terms should be matched. You would need to use the white space 
analyzer or a custom analyzer to preserve such special characters.
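
To illustrate the difference described above, here is a toy model in plain
Java. It does not call Lucene at all: ToyAnalyzers and both of its methods are
hypothetical names, and the punctuation-stripping rule (keep runs of letters
and digits, lowercased) is only a rough stand-in for StandardAnalyzer, whose
real tokenizer follows Unicode word-break rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: these are NOT Lucene's tokenizers, just an approximation
// of the behaviour discussed in this thread.
final class ToyAnalyzers {

    // Roughly what a whitespace analyzer does: split on whitespace and keep
    // every other character, punctuation included.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Crude stand-in for punctuation stripping: keep only runs of letters
    // and digits, lowercased. Assumption: the real analyzer is more subtle.
    static List<String> punctuationStrippingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String indexed = "searchtext +sampletext";
        System.out.println(whitespaceTokens(indexed));
        System.out.println(punctuationStrippingTokens(indexed));
    }
}
```

In this model the whitespace variant keeps +sampletext as a single token,
while the stripping variant reduces it to sampletext, so a query term that
contains the plus sign has nothing to match in the second case.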


Please give a specific indexed text string and a specific query that fails 
against it.


Also, QueryParser.escape escapes asterisks too, so they no longer act as
wildcards. The standard analyzer then removes the asterisks, as it does with
most punctuation. If you switch to an analyzer that preserves special
characters, you can manually escape the special characters with a backslash
and leave the asterisk unescaped to perform a wildcard query.
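
A minimal sketch of that manual escaping, in plain Java with no Lucene
dependency. SelectiveEscape and escapeKeepingWildcards are hypothetical names,
not part of Lucene; the character set is assumed to be the one
QueryParser.escape handles, minus the wildcard characters * and ?.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes query-syntax
// characters but leaves the wildcard characters * and ? untouched.
final class SelectiveEscape {

    // Assumed set of query-syntax characters, with '*' and '?' left out
    // on purpose so that wildcard queries keep working.
    private static final String ESCAPED = "\\+-!():^[]\"{}~|&/";

    static String escapeKeepingWildcards(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (ESCAPED.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeKeepingWildcards("+sample*"));
        System.out.println(escapeKeepingWildcards("(1+1):2"));
    }
}
```

The result would then be passed to the query parser in place of the output of
QueryParser.escape. Note that this only helps when the analyzer keeps the
escaped characters in the index, and that the classic query parser rejects a
leading * unless setAllowLeadingWildcard(true) has been called.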


-- Jack Krupansky


Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi

Thanks.

So, if I understand correctly, StandardAnalyzer won't work for the query
below, because it strips out the special characters and searches only on
searchText (in this case).

queryText = *searchText*

If we want to do a search like *** then we need to use
WhitespaceAnalyzer. Please let me know if my understanding is correct.

Also, I am unsure because the following is mentioned in the Lucene docs. Does
it not apply to StandardAnalyzer, then? The docs do not say that it won't work
with StandardAnalyzer.

/*
Escaping Special Characters

Lucene supports escaping special characters that are part of the query
syntax. The current list special characters are

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

To escape these character use the \ before the character. For example to
search for (1+1):2 use the query:

\(1\+1\)\:2

*/

Thanks,
Sai.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-special-characters-in-Lucene-4-0-tp4096674p4096727.html

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Benson Margulies

It might be helpful if you would explain, at a higher level, what you
are trying to accomplish. Where do these things come from? What
higher-level problem are you trying to solve?


Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky

Right, the Escaping Special Characters section is simply about escaping query
operators like && (which means AND) and + (which means AND or MUST). Yes, the
white space analyzer could be used, or a custom analyzer that uses the
white space tokenizer and then also uses a filter to strip out any
punctuation characters that you don't want to keep (e.g., period, comma,
semicolon, parentheses, etc.).


The query parser itself knows nothing about what your chosen analyzer does. 
But the query parser does specially interpret the special characters that 
the escape method mentions.


-- Jack Krupansky


Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi

What about other characters, such as the ' (quote) character? We have a
requirement that a text can start with a quote, as in 'sampletext'. When I
search with '* it does not return any results, but when I search with sample*
it does return the result.

Thanks,
Ranjith,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-special-characters-in-Lucene-4-0-tp4096674p4096732.html

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky

Yes, other special (punctuation) characters will be preserved by the white 
space analyzer, but must be escaped in query strings. You will have to 
manually escape them with a backslash, since the QueryParser.escape method 
will escape asterisk as well, which would disable wildcard query.


-- Jack Krupansky

