Thank you for all this information, Uwe and Walter! Let me digest it, educate myself on these matters, and figure out a way forward.

Have a great one,
Mete

> On May 20, 2021, at 6:43 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> I recommend normalizing all characters with a compatibility transformation,
> whether they are Arabic or not.
>
> We use this charFilter as the first step in every query and indexing analysis
> chain.
>
> <charFilter class="solr.ICUNormalizer2CharFilterFactory"/>
>
> You’ll also need to include the ICU library, which should be included by
> default. Actually, the compatibility normalization should be done by default,
> too. That transform was designed specifically for string matching and search.
>
> We have this in every solrconfig.xml.
>
> <!-- extras for ICU-based Unicode normalization -->
> <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib/"
>      regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs"
>      regex=".*\.jar" />
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On May 20, 2021, at 9:38 AM, Mete Kural <meteku...@icloud.com.INVALID> wrote:
>>
>> Hello Michael,
>>
>> Thank you very much for this information.
>>
>> I will try java-u...@lucene.apache.org also.
>>
>> By the way, is the Arabic analyzer referenced here
>> (https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar)
>> just for the Arabic language, or for all languages written with the Arabic
>> script?
>>
>> Thank you,
>> Mete
>>
>>> On May 20, 2021, at 4:35 PM, Michael Wechner <michael.wech...@wyona.com>
>>> wrote:
>>>
>>> Hi Mete
>>>
>>> You might also want to try the java-u...@lucene.apache.org mailing list
>>>
>>> https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
>>>
>>> Regarding languages other than English, you might find more information at
>>>
>>> https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ#LuceneFAQ-CanIuseLucenetoindextextinChinese,Japanese,Korean,andothermulti-bytecharactersets?
>>>
>>> though I just realized that the following link does not work anymore
>>>
>>> https://lucene.apache.org/core/lucene-sandbox/
>>>
>>> Are these analyzers now inside
>>>
>>> https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis
>>> https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar
>>>
>>> ?
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>>
>>> On 20.05.21 at 14:48, Mete Kural wrote:
>>>> Hello Lucene Community,
>>>>
>>>> I hope this finds you all well. I want to ask you if this would be the
>>>> right medium to discuss some matters surrounding text search in relation
>>>> to variant Unicode encodings of words in Arabic and Arabic-script
>>>> languages. This is not a great example, but the said matters are similar to
>>>> matters around Latin-script searches where the letter “İ” needs to be
>>>> substituted with “I” in searches and so forth. Would this mailing list be
>>>> the best medium to discuss such matters? If not, would you mind
>>>> recommending a medium for discussion on this?
>>>>
>>>> Kind regards,
>>>> Mete Kural
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
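
For anyone reading this thread later, here is a rough sketch of what the compatibility normalization Walter recommends does to variant encodings. It uses the JDK's own java.text.Normalizer rather than ICU, so it is only an approximation of the Solr charFilter: NFKC followed by locale-independent lowercasing stands in for ICU's case-folding variant. The class name NfkcDemo and the method normalizeForSearch are made up for illustration and are not part of Lucene or Solr.

```java
import java.text.Normalizer;
import java.util.Locale;

final class NfkcDemo {

    // Hypothetical helper: applies NFKC compatibility normalization and then
    // lowercases with Locale.ROOT. This approximates, but is not identical to,
    // what an ICU-based normalizing filter would do.
    static String normalizeForSearch(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFKC);
        return composed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+FEFB is the Arabic presentation-form ligature LAM WITH ALEF.
        // NFKC maps it to the two ordinary letters U+0644 (lam) + U+0627 (alef).
        String ligature = "\uFEFB";
        String plain = "\u0644\u0627";
        System.out.println(normalizeForSearch(ligature).equals(plain));

        // The same transform also folds Latin compatibility characters,
        // e.g. the "fi" ligature U+FB01 becomes the letters "f" and "i".
        System.out.println(normalizeForSearch("\uFB01le"));
    }
}
```

Applying the same function to both the indexed text and the query text is what lets the two encodings of the same word match; in Solr that is the reason for putting the charFilter first in both analysis chains.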