We were using the aramorph library for some time and so we mapped out the set of features it provides, they come as follows:
---------------------------------------------------------------------------------------------------------------------------------------------------------------- The ء and ~ are considered unique characters. * أ , آ, ا, and إ are distinct * و and ؤ are distinct * ى and ئ are distinct * The ا and ة (denoting the feminine adjective) at the end of a word are optional. * The ال, ب, ل, ك, بال, كال, لل at the beginning of a word are optional * All حركات as well as the ّ (شدّة) are ignored. * The ي , و , ات , ون denoting the plural form of a word are optional. If the indexed word ends with a ة its plural, which replaces the ة with ات , is recognized. The following examples illustrate these rules: Indexed Word Search Term Success الحياة للحياة True حياة True حيا False ألحياة False إلحياة False كالحياة True بالحياة True بحياة True لحياة True دولارا دولار True بدولار True بالدولار True الدولار True دؤلارا False دولأرا False دولارأ False الكاتب كاتب True لكاتب True كاتبة True الكاتبة True الكاتبات True كاتبون True كاتبو \ كاتبي True كتب False جميلة جميلات True جميل True الجمال False بنت ابنة False بن True ابن True ابنت False ---------------------------------------------------------------------------------------------------------------------------------------------------------------- while with the new one, we only got matches for: | فّ فُ فٌ فف فِِ فٍ ف and the likes of that. -walid On Thu, 2009-07-23 at 09:33 -0400, Robert Muir wrote: > walid, can you provide any more information other than "very poor result"? > > Others have not measured much difference between morphological > analysis and light stemming: > http://ciir.cs.umass.edu/pubfiles/ir-249.pdf > > > On Thu, Jul 23, 2009 at 7:34 AM, walid<walid.bak...@elementn.com> wrote: > > http://issues.apache.org/jira/browse/LUCENE-1406 > > http://issues.apache.org/jira/browse/LUCENE-153 > > > > based on this, there are two options: > > 1- using the aramorph library > > 2- moving the code from trunk to the current release and using the > > provided arabic analyzer > > > > 1- the library works very well in indexing, tokenizing, stemming and > > everything, but causes memory leaks > > 2- the provided library has a very poor result compared to the aramorph > > library. > > > > Is there a plan to have better arabic support with full morphological > > analysis support? > > > > walid > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > >