I worked on developing one and it became a colossal pain. For some
background: a comprehensive Arabic dictionary runs to about 20 volumes,
roughly the size of an encyclopedia. To look up a word in it, you first
have to reduce the word to its two- or three-letter root, and then you
look for the word you want underneath that root.

Reducing words to that root as part of stemming is useless, because
words belonging to the same root more often than not have nothing to do
with each other. Furthermore, Arabic uses phonetic indicators on each
letter, called diacritics, that change the way you pronounce the word,
which in turn changes the word's meaning. So two words spelled exactly
the same way but with different diacritics will mean two separate
things.

I've seen Arabic stemmers that kind of work, but none of them are open
source. This is a good paper from Berkeley that outlines the work and
the challenges:
http://metadata.sims.berkeley.edu/papers/trec2002.pdf
Hope it helps.
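
To make the diacritics point concrete, here is a rough Python sketch.
It is not taken from any existing analyzer: the function name and the
choice of which marks to strip (U+064B through U+0652) are my own
assumptions, picked for illustration. It shows two words with the same
letters and different diacritics, and how stripping the marks makes
them indistinguishable.

```python
import re

# Assumption: treat U+064B..U+0652 (tanwin, fatha, damma, kasra,
# shadda, sukun) as the diacritics to remove. A real analyzer may
# handle a wider or narrower set.
DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(word: str) -> str:
    """Return the word with the diacritic marks above removed (hypothetical helper)."""
    return DIACRITICS.sub("", word)

# Same three letters (kaf, teh, beh), different diacritics.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "books"

# With diacritics the two strings are distinct ...
print(kataba == kutub)
# ... but once the marks are stripped they collapse to the same letters.
print(strip_diacritics(kataba) == strip_diacritics(kutub))
```

Most everyday Arabic text is written without these marks anyway, so an
indexer usually sees only the bare letters and cannot tell such words
apart from spelling alone.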
Nader Henein
Scott Smith wrote:
Is anyone aware of an open source (non-GPL; i.e., free for commercial
use) Arabic analyzer for Lucene? Does Arabic really require a stemmer
as well (some of the reading I've seen on the web would suggest that a
stemmer is almost a necessity with Arabic to get anything useful where
it is not with other languages).
Scott