I'm using the default analyzer. More precisely, it is the one the Compass framework sets by default, but I assume it is the same one Lucene would use by default. Which one should I use?
-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Friday, July 10, 2009 6:13 PM
To: java-user@lucene.apache.org
Subject: Re: Hindi, diacritics and search results

Which analyzer in particular are you using? It's probably not doing what you want for Hindi. These "diacritics" are important (vowels, etc.).

On Fri, Jul 10, 2009 at 3:10 PM, OBender <osya_ben...@hotmail.com> wrote:
> Hi All,
>
> I'm using the default setup of Lucene (no custom analyzers configured) and
> came across the following issue:
>
> In Hindi, if there is a letter with a diacritic in a phrase, Lucene will find
> the phrase with this letter even if the search string is for the letter
> without the diacritic.
>
> Is this expected behavior? Maybe this is standard for all languages with
> letters that have diacritics?
>
> From a pure byte standpoint I can see the logic: the letter with the diacritic
> takes 6 bytes (E0 A4 95 E0 A5 87) and the single letter takes 3 (E0 A4 95),
> so if I search for *some_letter*, where some_letter has code (E0 A4 95),
> Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.
>
> Any comments much appreciated.
>
> Thanks.

--
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
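
For reference, here is a small standalone Java sketch of the byte values quoted above. It uses only the JDK, not Lucene. The bytes E0 A4 95 decode to U+0915 (DEVANAGARI LETTER KA) and E0 A5 87 to U+0947 (DEVANAGARI VOWEL SIGN E). The class name DiacriticDemo and the helper firstLetterRun are hypothetical names made up for this illustration; firstLetterRun is only a stand-in for a tokenizer that keeps runs of characters accepted by Character.isLetter. Whether the analyzer configured by Compass actually behaves this way is an assumption, not something confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;

public class DiacriticDemo {

    // Hex dump of the UTF-8 encoding of a string, e.g. "E0 A4 95".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a tokenizer (NOT Lucene code): returns the
    // first run of code points for which Character.isLetter is true and
    // stops at the first non-letter that follows it.
    static String firstLetterRun(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                sb.appendCodePoint(cp);
            } else if (sb.length() > 0) {
                break;
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // letter alone
        String ke = "\u0915\u0947";  // letter followed by the vowel sign

        System.out.println(utf8Hex(ka)); // E0 A4 95
        System.out.println(utf8Hex(ke)); // E0 A4 95 E0 A5 87

        // The vowel sign is a combining mark, not a letter, as far as
        // java.lang.Character is concerned.
        System.out.println(Character.isLetter(0x0947));
        System.out.println(Character.getType(0x0947) == Character.NON_SPACING_MARK);

        // Under the letter-run assumption, both strings reduce to the same term.
        System.out.println(firstLetterRun(ke).equals(ka));
    }
}
```

If that assumption matches what the analyzer does, the indexed text and the query would both reduce to the same term, which would explain the match described in the original question.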