Hi, the document's encoding is "UTF-8".
I tried the explain() method, and the result for the "د-ژ" range search is:

fieldWeight(keywordIndex:ساب ووÙ�ر in 0), product of:
  1.0 = tf(termFreq(keywordIndex:ساب ووÙ�ر)=1)
  0.30685282 = idf(docFreq=1)
  1.0 = fieldNorm(field=keywordIndex, doc=0)

Here keywordIndex is "ساب ووفر".

I also installed "luke.jnlp", but I don't know what to check in Luke.

Thanks,
Esra


Grant Ingersoll-6 wrote:
>
> I am not sure how StandardAnalyzer will perform on Farsi. The thing
> to do now would be to get Luke and have a look at the actual document
> that matches and see what its tokens look like. You might also try
> using the explain() method to see why that document matches.
>
> Also, are you sure you are loading the file with the proper encoding,
> etc.?
>
> -Grant
>
> On Apr 30, 2008, at 8:06 AM, esra wrote:
>
>> Hi,
>> thanks for your reply.
>> I am using StandardAnalyzer now, and my XML document looks like this:
>>
>> <keyword><![CDATA[ساب ووفر]]></keyword>
>> <description><![CDATA[یک ووفر که در محفظه ای
>> جدا از سایر درایور ها
>> قرار دارد تا صدایی با باس فوق العاده
>> پایین تولید کند. ]]></description>
>>
>> I googled for a Farsi analyzer and found nothing. I am also not sure
>> whether it would solve my problem.
>>
>> Thanks,
>>
>> Esra
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>> What Analyzer are you using? You might try looking in Luke to see
>>> what is in your index, etc. It also isn't clear to me what your
>>> documents look like.
>>>
>>> As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and
>>> see if you can find anything. Otherwise, you will have to write your
>>> own (and donate it????)
>>>
>>> -Grant
>>>
>>> On Apr 30, 2008, at 3:21 AM, esra wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using Lucene's "IndexSearcher" to search the given XML by
>>>> keyword, which contains Farsi information.
>>>> While searching I use ranges like
>>>>
>>>> آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
>>>>
>>>> When I search for the "د-ژ" range the results are wrong: they are
>>>> the results of the "س-ظ" range.
>>>>
>>>> For example, when I search for "د-ژ" one of the results is
>>>> "ساب ووفر". This result is also shown in the "س-ظ" range's result
>>>> list, which is the correct range.
>>>>
>>>> As IndexSearcher uses the "compareTo" method, and this method
>>>> compares Unicode values, I looked up the code points of the
>>>> characters:
>>>>
>>>> د = U+062F
>>>> ژ = U+0698
>>>> and the first letter of "ساب ووفر" is س = U+0633
>>>>
>>>> Do you have any idea how to solve this problem? There are analyzers
>>>> for different languages; would one be useful here, and if so, do you
>>>> know where to find a Farsi analyzer?
>>>>
>>>> I would be glad if you could help.
>>>>
>>>> Thanks,
>>>>
>>>> Esra
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16977096.html
>>>> Sent from the Lucene - Java Users mailing list archive at
>>>> Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16980977.html
--
View this message in context:
http://www.nabble.com/lucene-farsi-problem-tp16977096p16993174.html
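
A note on the code points quoted in the original question: U+0633 (س) lies numerically between U+062F (د) and U+0698 (ژ). If the range check compares raw character values, as the question suggests, a term starting with س would fall inside the "د-ژ" range even though س comes after ژ in the Persian alphabet. The Java sketch below illustrates this and shows one possible workaround, comparing by position in an explicit alphabet table. The class PersianRange, its methods, and the ALPHABET constant are hypothetical names for illustration and are not part of Lucene; the alphabet ordering is an assumption. Letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
// Sketch only: all names here are hypothetical and not part of Lucene.
class PersianRange {
    // Persian alphabet in dictionary order (assumed ordering), as escapes:
    // alef-madda, alef, be, pe, te, se, jim, che, he, khe, dal, zal, re, ze,
    // zhe, sin, shin, sad, zad, ta, za, ain, ghain, fe, qaf, kaf, gaf, lam,
    // mim, nun, vav, he, ye.
    static final String ALPHABET =
        "\u0622\u0627\u0628\u067E\u062A\u062B\u062C\u0686\u062D\u062E"
      + "\u062F\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637"
      + "\u0638\u0639\u063A\u0641\u0642\u06A9\u06AF\u0644\u0645\u0646"
      + "\u0648\u0647\u06CC";

    // Range test by raw UTF-16 value, which is what String.compareTo uses.
    static boolean inRangeByCodePoint(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Range test by the position of the term's first letter in ALPHABET.
    static boolean inRangeByAlphabet(String term, char lower, char upper) {
        if (term.isEmpty()) {
            return false;
        }
        int pos = ALPHABET.indexOf(term.charAt(0));
        return pos >= 0
            && pos >= ALPHABET.indexOf(lower)
            && pos <= ALPHABET.indexOf(upper);
    }

    public static void main(String[] args) {
        // The keyword from the thread: sin, alef, be, space, vav, vav, fe, re.
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";
        // Range dal..zhe by raw character value.
        System.out.println(inRangeByCodePoint(term, "\u062F", "\u0698"));
        // Range dal..zhe by alphabet position.
        System.out.println(inRangeByAlphabet(term, '\u062F', '\u0698'));
        // Range sin..za by alphabet position.
        System.out.println(inRangeByAlphabet(term, '\u0633', '\u0638'));
    }
}
```

Running main prints true, false, true: the raw comparison wrongly accepts the keyword for the dal-to-zhe range, while the alphabet-position check places it only in the sin-to-za range. In an index, the same idea could be applied by storing a separate sort key per keyword; that part is not shown here.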