Re: Morphological Search Problem

Grant Ingersoll Mon, 02 Apr 2007 05:56:01 -0700

Have you used Luke to see what is actually in the index? Or written some test cases for your analyzer to know that the appropriate tokens are coming out of your analyzer?
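A minimal sketch of such a test. It is written in plain Java so it runs without Lucene on the classpath. The analyzer is modeled as a hypothetical Function<String, List<String>>. SIMPLE_ANALYZER and assertTokens are illustrative names, not Lucene API. With a real Analyzer, you would drain the TokenStream returned by tokenStream(...) into a list of token texts and compare that list instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class AnalyzerTokenCheck {

    // Hypothetical stand-in for an Analyzer: lowercases the input and splits it
    // on runs of non-letters. Replace with code that drains your real TokenStream.
    static final Function<String, List<String>> SIMPLE_ANALYZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    };

    // Throws if the analyzer does not emit exactly the expected tokens, in order.
    static void assertTokens(Function<String, List<String>> analyzer,
                             String input, String... expected) {
        List<String> actual = analyzer.apply(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError("input \"" + input + "\": expected "
                    + Arrays.asList(expected) + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertTokens(SIMPLE_ANALYZER, "Testing, tested", "testing", "tested");
        System.out.println("analyzer emits the expected tokens");
    }
}
```

You could run the same assertion against both the Unified Analyzer and the plain Snowball Analyzer for "test", "testing" and "tested". That would show directly where their outputs diverge.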

Also, could you give more details about the filters you are using? I am not familiar w/ ExactTokensConstructorFilter, etc.

The formatting is a little hard to read, but I think it says you are passing the ArabicStemmer to the SnowballFilter, correct? I assume you are dealing w/ mixed content, correct? That is, you have Arabic and English in the same token stream? I know when I was working on our Arabic/English project, we had to be careful about mixed content like this.
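This sketch shows one way to keep mixed content apart. It assumes that each token is written mostly in a single script. Each token is classified by the script of its first letter. Only Arabic tokens would then go to the Arabic stemmer, and only Latin tokens to the English Snowball stemmer, instead of every token passing through both. ScriptRouter, Route and routeFor are hypothetical names. Inside a Lucene TokenFilter you would make this decision per token before stemming. Character.UnicodeScript requires Java 7 or later.

```java
public class ScriptRouter {

    // Hypothetical routing targets for a per-token stemming decision.
    enum Route { ARABIC, LATIN, OTHER }

    // Classifies a token by the Unicode script of its first letter;
    // tokens with no letters at all (e.g. numbers) are routed to OTHER.
    static Route routeFor(String token) {
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            if (Character.isLetter(cp)) {
                Character.UnicodeScript script = Character.UnicodeScript.of(cp);
                if (script == Character.UnicodeScript.ARABIC) {
                    return Route.ARABIC;
                }
                if (script == Character.UnicodeScript.LATIN) {
                    return Route.LATIN;
                }
                return Route.OTHER;
            }
            i += Character.charCount(cp);
        }
        return Route.OTHER;
    }

    public static void main(String[] args) {
        // "\u0643\u062A\u0627\u0628" is the Arabic word kitab ("book").
        String[] tokens = {"testing", "\u0643\u062A\u0627\u0628", "2007"};
        for (String t : tokens) {
            System.out.println(t + " -> " + routeFor(t));
        }
    }
}
```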




On Apr 2, 2007, at 7:57 AM, Shaimaa Mohamed wrote:

Dear all,

We are using a Unified Analyzer as our Lucene analyzer so that we can
index and search both Arabic and English documents.

Here is the code:



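public TokenStream tokenStream(String fieldName, Reader reader) {
    switch (analysisMode) {
        case UNIFIED:
            // Chain, innermost first: standard analyzer stream ->
            // ExactTokensSpecifierFilter -> ArabicStemmer ->
            // SnowballFilter (stemmer named by latinLanguage) ->
            // ExactTokensContructorFilter
            return new ExactTokensContructorFilter(
                    new SnowballFilter(
                            new ArabicStemmer(
                                    new ExactTokensSpecifierFilter(
                                            getStandardAnalyzerStream(reader)),
                                    false, false),
                            latinLanguage));

        case EXACT:
            // No stemming: only the exact-token filters around the standard stream
            return new ExactTokensContructorFilter(
                    new ExactTokensSpecifierFilter(
                            getStandardAnalyzerStream(reader)));
    }
    // Any other mode: no token stream
    return null;
}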



But the problem is that the results of morphological search in English
and Arabic are not good. For example:

The data I search contains "test", "testing" and "tested". When I search
for "testing", the results do not include "test". This happens even though,
when I traced it, I found that the tokens of "testing" contain "test". But
when I search for "manage", the results include "management", which is
correct. So what is the difference between the two cases?
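For stemmed search to return "test" for the query "testing", index-time and query-time analysis must reduce both words to the same term. The sketch below illustrates this with a deliberately tiny, hypothetical suffix stripper. toyStem and conflate are illustrative names, and this is not the Snowball English algorithm. Under consistent analysis, both examples conflate in the same way. So if plain Snowball handles both cases but the unified chain does not, the extra filters in the chain are a reasonable first place to look.

```java
import java.util.Locale;

public class ToyConflation {

    private static final String[] SUFFIXES = {"ment", "ing", "ed"};

    // Hypothetical, deliberately tiny suffix stripper for illustration only.
    // It is NOT the Snowball English stemmer.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            // Strip at most one suffix, and only if a stem of 3+ letters remains.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                w = w.substring(0, w.length() - suffix.length());
                break;
            }
        }
        // Drop a final "e" so that "manage" and "management" meet at "manag".
        if (w.endsWith("e") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // A query word matches a document word only if both reduce to the same term.
    static boolean conflate(String queryWord, String docWord) {
        return toyStem(queryWord).equals(toyStem(docWord));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"test", "testing", "tested", "manage", "management"}) {
            System.out.println(w + " -> " + toyStem(w));
        }
        System.out.println(conflate("testing", "test"));
    }
}
```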



Besides that, I tried using only the Snowball Analyzer instead of the
Unified Analyzer and applied the same test. This time it gave correct,
good results!

So can anyone help explain why using the Unified Analyzer affects the
results?



Note: latinLanguage in the above code = "English"



Thanks & Best Regards,

------------------------------------

Shaimaa Mohamed

Team Leader

ICT Department

Bibliotheca Alexandrina

P.O. Box 138, Chatby

Alexandria 21526, Egypt

Tel: +(203) 483 9999, Ext:1418

Fax: +(203) 482 0405

Email: [EMAIL PROTECTED]

Web Site: www.bibalex.org


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/



