I worked on trying to develop one, and it became a colossal pain. To give you some background: a comprehensive Arabic dictionary runs to about 20 volumes, roughly the size of an encyclopedia. To look up a word in the dictionary, you first have to reduce it to its two- or three-letter root, and then look for the word you want under that root.

Reducing words to that root as part of stemming is useless, because words belonging to the same root more often than not have nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce the word, which in turn changes the word's meaning. So two words spelled exactly the same way but with different diacritics will mean two separate things.

I've seen Arabic stemmers that kind of work, but none of them are open source. This is a good paper from Berkeley that outlines the work and the challenges: http://metadata.sims.berkeley.edu/papers/trec2002.pdf

Hope it helps.
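To make the diacritics point concrete, here is a minimal Python sketch (not Lucene code; the names `DIACRITICS`, `strip_diacritics`, `ILM` and `ALAM` are made up for illustration). It assumes the diacritics of interest are the marks in the Unicode range U+064B to U+0652, and shows two different words that become the same string once those marks are dropped:

```python
# Assumption: the diacritics of interest are the Arabic marks in
# U+064B..U+0652 (tanwin forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word: str) -> str:
    """Return the word with the diacritic marks listed above removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


# Two words written with the same three letters (ain, lam, meem)
# but different diacritics. Escapes are used so the marks are explicit.
ILM = "\u0639\u0650\u0644\u0652\u0645"   # 'ilm, "knowledge"
ALAM = "\u0639\u064E\u0644\u064E\u0645"  # 'alam, "flag"

print(ILM == ALAM)                                      # False
print(strip_diacritics(ILM) == strip_diacritics(ALAM))  # True
```

So an analyzer that drops diacritics will treat these two words as the same term, and most everyday Arabic text is written without diacritics in the first place.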

Nader Henein

Scott Smith wrote:

Is anyone aware of an open source (non-GPL; i.e., free for commercial
use) Arabic analyzer for Lucene?  Does Arabic really require a stemmer
as well? Some of the reading I've seen on the web would suggest that a
stemmer is almost a necessity with Arabic to get anything useful, where
it is not with other languages.



Scott








