Re: Help in Arabic Analysers with Lucene on Windows

Girish Naik Mon, 29 Dec 2008 05:20:12 -0800

Sorry for that,

Here is how the Analyzer is Selected:

 public static Analyzer getAnalyzerInstance(String localeKey) {
    Analyzer analyzer = null;
    if (localeKey == null || localeKey.trim().equals("")) {
        localeKey = AppContext.getSetting("defaultLocale");
        System.out.println("<><><>><><><><><Locale key taken as Default ");
    } else {
        // localeKey may be a csv of locales, in which case picj the first
        // one.
        localeKey = StringUtils.split(localeKey, ",")[0].trim();
        System.out.println("<><><>><><><><><Locale key is trimmed");
    }
    System.out.println("<><><>><><><><><Locale is " + localeKey);
    String name = (String) _analyzerMap.get(localeKey);
    System.out.println("<><><>><><><><><Name from Locale is " + name);
    if (name == null) {
        analyzer = new StandardAnalyzer();
    } else {
        // if (name.equalsIgnoreCase("Arabic")) {
        // analyzer = new ArabicAnalyzer();
        // } else {
        analyzer = new SnowballAnalyzer(name);
        // }
    }
    return analyzer;
    }

While Indexing some are analyzed and some are not...

 document.add(new Field(FIELD_DOCUMENT_CREATED_ON, LocaleUtils
            .convert8859_6ToUTF8(com.aurigalogic.activesite.field.Field
                .indexableDate(avsDoc.getCreatedOn())),
            Field.Store.YES, Field.Index.NOT_ANALYZED));
... 
document.add(new Field(FIELD_CONTENT_TYPE, LocaleUtils
            .convert8859_6ToUTF8(version.getDocument()
                .getContentDescriptor().getName()),
            Field.Store.YES, Field.Index.ANALYZED));

Currently the method LocaleUtils.convert8859_6ToUTF8 does nothing but returns the parameter as is.

While seraching the Query parser etc. are created like

Analyzer analyzer = AnalyzerSelector.getAnalyzerInstance(locale);
...
QueryParser qparser = new QueryParser(Constants.FIELD_BODY, analyzer);
...

So while posting the form with a Arabic word does not fetch the results. An English word does work though!!

I would be more that helpful if anything else is required.

Regards,

Please do not print this email unless it is absolutely necessary.

Girish Naik
Development Lead

Neev Information Technologies Pvt Ltd
Bangalore, Karnataka India

Mobile: 91 09740091638
Email: girish.n...@gmail.com
IM: girish.naik (Skype)

http://www.linkedin.com/in/girishnaik

Join Neev Information Technologies Private Limited Group in LinkedIn

Fools rush in where angels fear to tread

See who we know in common

Want a signature like this?

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.
WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

On 12/29/2008 6:16 PM, Grant Ingersoll wrote:

Hi Girish,

Can you provide some sample code and info about what isn't working? All you have said so far is that the Arabic Analyzer doesn't work for you, but you have said nothing about how you are actually using it. Are you getting exceptions? Do the tokens not look right? Are no results coming back? Have you looked at your index in Luke?

I'm going to take a wild stab in the dark and guess that you are not reading in the input in the right encoding.

-Grant

On Dec 29, 2008, at 7:19 AM, Girish Naik wrote:

Hi,
    I am having a hard time in indexing the Arabic content and searching the same via Lucene. I have also used a Arabic Analyzer from the Lucene package but had no luck. I have also used a snowball jar but it doesn't contain an Arabic stemmer. So i had put the Lucene Arabic Stemmer in snowball jar (with modifications :-X ) but still have not got any luck so far.

    Moreover when i dont use any stemmers/ analyzer the search works perfectly on a Linux systems, but with Windows system the search does not work at all =-O

If anybody has any kind of solution or ideas, then please send it across. I will be very happy to implement and test them.

Thanks in Advance.

--

Regards,

Please do not print this email unless it is absolutely necessary.
Girish Naik
Development Lead

Neev Information Technologies Pvt Ltd
Bangalore, Karnataka India

<banner-5b.png>    Mobile: 91 09740091638
Email: girish.n...@gmail.com
IM: girish.naik (Skype)
http://www.linkedin.com/in/girishnaik

<banner-2c.png>    <neev_logo.gif>

Fools rush in where angels fear to tread
See who we know in common    Want a signature like this?
The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.
WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Help in Arabic Analysers with Lucene on Windows

Reply via email to