Re: lucene farsi problem

Grant Ingersoll Thu, 01 May 2008 06:29:18 -0700


On May 1, 2008, at 4:36 AM, esra wrote:

Hi,

document's encoding is "UTF-8".
i tried the explain() method and the result for "د-ژ" rangesearching is:
fieldWeight(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø± in 0),product of:
 1.0 = tf(termFreq(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø±)=1)
 0.30685282 = idf(docFreq=1)
 1.0 = fieldNorm(field=keywordIndex, doc=0)

here keywordIndex is "ساب ووفر".
i also installed the "luke.jnlp" but i don't know what to check byLuke.



http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71

Luke can be used to view your index. Not saying it is your problemhere, but often times when I get back results that "seem" incorrect,the first thing I do is go look at my index using Luke, and comparethe "incorrect" document with what is in the query to see where the(mis)match is occurring. Usually, this analysis shows that mydocument/query is not what I thought it was.

Luke can browse documents and parse queries, amongst other usefulthings.

Thanks,

Esra



Grant Ingersoll-6 wrote:


I am not sure how Standard Analyzer will perform on Farsi.  The thing
to do now would be to get Luke and have a look at the actual document
that matches and see what it's tokens look like.  You might also try
using the explain() method to see why that document matches.

Also, are you sure you are loading the file w/ the proper encodings,
etc?

-Grant

On Apr 30, 2008, at 8:06 AM, esra wrote:


Hi,
thanks for your reply.
I am using StandartAnalyzer now and my xml document is like below:

<keyword><![CDATA[ساب ووفر]]></keyword>
    <description><![CDATA[یک ووفر که در محفظه ای
جدا از سایر درایور ها
قرار دارد تا صدایی با باس فوق العاده
پایین تولید کند. ]]></description>

i googled for farsi analyzer and found nothing also i am not sure it
if
would solve my problem or not.

Thanks,

Esra


Grant Ingersoll-6 wrote:


What Analyzer are you using?  You might try looking in Luke to see
what is in your index, etc.  It also isn't clear to me what your
documents look like.

As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and

see if you can find anything. Otherwise, you will have to writeyour

own (and donate it????)

-Grant

On Apr 30, 2008, at 3:21 AM, esra wrote:


hi,

i am using lucene's "IndexSearcher" to search the given xml by
keyword which
contains farsi information.
while searching i use ranges like

آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی

when i do search for  "د-ژ"  range the results are wrong , they
are
the
results of  " س-ظ "range.

for example when i do search for "د-ژ" one of the the resultsis

"ساب ووفر"

, this result also shown on the " س-ظ " range's result listwhich

is the
corret range.

As IndexSearcher use "compareTo" method and this method uses
unicodes for
comparing, i found the unicodes of the characters.

د=U+62F
ژ = U+698
and the first letter of "ساب ووفر " is  س = U+633

Do you have any idea how to solve this problem, there areanalyzers

for
different languages ,
will this be usefull if so do you know where to find a farsi
analyzer?

I would bu glad if you help.

thanks ,

Esra

--
View this message in context:

http://www.nabble.com/lucene-farsi-problem-tp16977096p16977096.html

Sent from the Lucene - Java Users mailing list archive at
Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
View this message in context:
http://www.nabble.com/lucene-farsi-problem-tp16977096p16980977.html

Sent from the Lucene - Java Users mailing list archive atNabble.com.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
View this message in context: 
http://www.nabble.com/lucene-farsi-problem-tp16977096p16993174.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: lucene farsi problem

Reply via email to