Thanks, so I do not need to escape the "&" in "dog & cat".
But I do need to escape the "&&" in "dog && cat" correct? And do I escape as "dog \&& cat" or as "dog \&\& cat"?

Ilya

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, September 17, 2012 10:55 AM
To: java-user@lucene.apache.org
Subject: Re: how to fully preprocess query before fuzzy search?

"
Lucene supports escaping special characters that are part of the query syntax. The current list special characters are

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
"

See:
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html

So, maybe you should escape all special characters, and then add the fuzzy query.

Note: In 4.0 the fuzzy query is limited to an editing distance of 2.

-- Jack Krupansky

-----Original Message-----
From: Ilya Zavorin
Sent: Monday, September 17, 2012 10:41 AM
To: java-user@lucene.apache.org
Subject: how to fully preprocess query before fuzzy search?

I am processing a bunch of text coming out of OCR, i.e. it's machine-generated text that contains some errors like garbage characters attached to words, letters replaced with similarly looking characters (e.g. "I" with "1") etc. The text is whitespace-tokenized and I am trying to match each token against an index using a fuzzy match, so that small amounts of occasional garbage in the tokens do not prevent a match.

Right now I am preprocessing each query as follows:

    //term = token
    Query queryF = parser.Parse(term.Replace("~", "") + "~");

However, searcher.Search still throws "can't parse" exceptions for queries that contain brackets, quotes and other garbage characters.

So how should I fully preprocess a query to avoid these exceptions? Looks like I just need to remove a certain set of characters just like the tilde is removed above. What is the complete set of such characters? Do I need to do any other preprocess?
Thanks,
Ilya Zavorin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
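
On the "\&& or \&\&" question, the suggestion in the thread ("escape all special characters, and then add the fuzzy query") can be sketched as below. This is a minimal, self-contained illustration, not Lucene code: the class QueryEscapeSketch and its methods escape and toFuzzyQueryText are hypothetical names. It assumes that putting a backslash in front of each special character individually is acceptable, which would turn "&&" into "\&\&". As far as I know, the static QueryParser.escape(String) helper in Lucene's classic query parser takes the same per-character approach, so using that helper is probably preferable to hand-rolling one; check it against the version you run.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes every character
// from the special-character list quoted in the thread, one character at a time.
final class QueryEscapeSketch {
    // '&' and '|' are listed singly, so "&&" becomes "\&\&" and "||" becomes "\|\|".
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Returns the token with a backslash inserted before each special character.
    static String escape(String token) {
        StringBuilder sb = new StringBuilder(token.length() * 2);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape first, then append the fuzzy operator so the trailing '~' stays unescaped.
    static String toFuzzyQueryText(String token) {
        return escape(token) + "~";
    }

    public static void main(String[] args) {
        // Whitespace-tokenized OCR tokens, as described in the original question.
        System.out.println(toFuzzyQueryText("dog&&cat"));
        System.out.println(toFuzzyQueryText("(he1lo]"));
    }
}
```

Note that this sketch also escapes a lone "&". That is an assumption of convenience: an escaped character should simply be treated as a literal, so the extra backslash is expected to be harmless. Escaping, rather than deleting characters the way the tilde is removed in the original snippet, keeps the token text intact for the fuzzy match.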