I'm using the default analyzer. Actually, it is the one set by default by the
Compass framework, but I assume it is the same one Lucene would use by default.
Which one should I use?

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Friday, July 10, 2009 6:13 PM
To: java-user@lucene.apache.org
Subject: Re: Hindi, diacritics and search results

Which analyzer in particular are you using?

It's probably not doing what you want for Hindi. These "diacritics" are
important (vowels, etc.).
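
As a rough illustration of why these marks matter, here is a minimal JDK-only
sketch (no Lucene involved) that decodes the six bytes quoted in the original
message and prints what each code point is. The class name DevanagariBytes and
the helper describe are made up for this example. The second code point is a
vowel sign that Java classifies as a non-spacing mark rather than a letter, so
a tokenizer that keeps only letters would plausibly break the token right
there. Whether the Compass default analyzer behaves that way is an assumption,
not something verified here.

```java
import java.nio.charset.StandardCharsets;

public class DevanagariBytes {
    // Decode UTF-8 bytes and describe each code point: hex value, Unicode name,
    // whether the JDK treats it as a letter, and whether it is a non-spacing mark.
    static String describe(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> out.append(String.format(
                "U+%04X %s letter=%b mark=%b%n",
                cp, Character.getName(cp), Character.isLetter(cp),
                Character.getType(cp) == Character.NON_SPACING_MARK)));
        return out.toString();
    }

    public static void main(String[] args) {
        // The byte sequence from the original message: E0 A4 95 E0 A5 87
        byte[] withVowelSign = {(byte) 0xE0, (byte) 0xA4, (byte) 0x95,
                                (byte) 0xE0, (byte) 0xA5, (byte) 0x87};
        System.out.print(describe(withVowelSign));
        // U+0915 DEVANAGARI LETTER KA letter=true mark=false
        // U+0947 DEVANAGARI VOWEL SIGN E letter=false mark=true
    }
}
```

So the six bytes are two code points, the consonant KA plus the vowel sign E,
which together form a different syllable from KA alone.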


On Fri, Jul 10, 2009 at 3:10 PM, OBender<osya_ben...@hotmail.com> wrote:
> Hi All,
>
>
>
> I'm using the default setup of Lucene (no custom analyzers configured) and
> came across the following issue:
>
> In Hindi, if a phrase contains a letter with a diacritic, Lucene will find
> the phrase with this letter even if the search string is for the letter
> without the diacritic.
>
> Is this expected behavior? Maybe this is standard for all languages with
> letters that have diacritics?
>
>
>
> From a pure byte standpoint I can see the logic: the letter with the diacritic
> takes 6 bytes (E0 A4 95 E0 A5 87) and the single letter takes 3 (E0 A4 95),
> so if I search for *some_letter*, where some_letter has the code (E0 A4 95),
> Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.
>
>
>
> Any comments much appreciated.
>
>
>
> Thanks.
>
>
>
>



-- 
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




