There is really no default in Lucene. A good start for Hindi would be to try WhitespaceAnalyzer.
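You can see why those vowel signs matter without Lucene on the classpath. The plain-Java sketch below decodes the two byte sequences from the original question. It then compares a whitespace-only split, which is roughly what WhitespaceAnalyzer does, with a hypothetical letters-only tokenizer. All names here (`HindiTokenDemo`, `lettersOnlyTokens`, etc.) are made up for illustration. The letters-only tokenizer stands in for an analyzer that treats combining marks as token boundaries; that is an assumption about the reported behavior, not a description of Lucene's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HindiTokenDemo {

    // Formats the UTF-8 encoding of a string as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Splits on runs of whitespace only, roughly what WhitespaceAnalyzer does
    // (no lowercasing, no stop words), so combining vowel signs stay in the token.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // HYPOTHETICAL tokenizer (not Lucene code): keeps only code points for which
    // Character.isLetter is true and treats everything else, including combining
    // marks such as U+0947, as a token boundary.
    static List<String> lettersOnlyTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isLetter(cp)) {
                cur.appendCodePoint(cp);
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        });
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // DEVANAGARI LETTER KA
        String ke = "\u0915\u0947";  // KA followed by DEVANAGARI VOWEL SIGN E

        System.out.println(utf8Hex(ka));  // E0 A4 95
        System.out.println(utf8Hex(ke));  // E0 A4 95 E0 A5 87

        // U+0947 is a combining (non-spacing) mark, not a letter.
        System.out.println(Character.getType(0x0947) == Character.NON_SPACING_MARK); // true

        // Would an exact term for bare KA match a document containing KE?
        System.out.println(whitespaceTokens(ke).contains(ka));   // false
        System.out.println(lettersOnlyTokens(ke).contains(ka));  // true
    }
}
```

With the whitespace split, a search for bare KA no longer matches KE. The trade-off is that WhitespaceAnalyzer does no normalization and leaves punctuation attached to words. Treat it as a starting point rather than a complete Hindi analyzer.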
On Fri, Jul 10, 2009 at 9:13 PM, OBender Hotmail <osya_ben...@hotmail.com> wrote:
> I'm using the default analyzer, actually the one set by default by the
> Compass framework, but I assume it is the same one Lucene would use by
> default. Which one should I use?
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Friday, July 10, 2009 6:13 PM
> To: java-user@lucene.apache.org
> Subject: Re: Hindi, diacritics and search results
>
> Which analyzer in particular are you using?
>
> It's probably not doing what you want for Hindi. These "diacritics" are
> important (vowels, etc.).
>
> On Fri, Jul 10, 2009 at 3:10 PM, OBender <osya_ben...@hotmail.com> wrote:
>> Hi All,
>>
>> I'm using the default setup of Lucene (no custom analyzers configured) and
>> came across the following issue:
>>
>> In Hindi, if a phrase contains a letter with a diacritic, Lucene will find
>> that phrase even when the search string contains the letter without the
>> diacritic.
>>
>> Is this expected behavior? Maybe this is standard for all languages with
>> letters that have diacritics?
>>
>> From a pure byte standpoint I can see the logic: the letter with the
>> diacritic takes 6 bytes (E0 A4 95 E0 A5 87) and the bare letter takes 3
>> (E0 A4 95), so if I search for *some_letter*, where some_letter has code
>> (E0 A4 95), Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes
>> that letter.
>>
>> Any comments much appreciated.
>>
>> Thanks.
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Robert Muir
rcm...@gmail.com