Re: Text search in Arabic

Mete Kural Sun, 23 May 2021 09:14:37 -0700

Thank you for all this information, Uwe and Walter!

Let me digest this information, educate myself on these matters, and figure 
out a way forward.


Have a great one,
Mete


> On May 20, 2021, at 6:43 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> I recommend normalizing all characters with a compatibility transformation, 
> whether they are Arabic or not. 
> 
> We use this charFilter as the first step in every query and indexing analysis 
> chain.
> 
>         <charFilter class="solr.ICUNormalizer2CharFilterFactory"/>
> 
> You’ll also need to include the ICU library, which should be included by 
> default. Actually, the compatibility normalization should be done by default, 
> too. That transform was designed specifically for string matching and search.
> 
> We have this in every solrconfig.xml.
> 
>   <!-- extras for ICU-based Unicode normalization -->
>   <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib/" regex=".*\.jar" />
>   <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 20, 2021, at 9:38 AM, Mete Kural <meteku...@icloud.com.INVALID> wrote:
>> 
>> Hello Michael,
>> 
>> Thank you very much for this information.
>> 
>> I will also try java-u...@lucene.apache.org.
>> 
>> By the way, is the Arabic analyzer referenced here 
>> (https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar)
>>  just for the Arabic language or all languages written with the Arabic 
>> script?
>> 
>> Thank you,
>> Mete
>> 
>> 
>>> On May 20, 2021, at 4:35 PM, Michael Wechner <michael.wech...@wyona.com> 
>>> wrote:
>>> 
>>> Hi Mete
>>> 
>>> You might also want to try the java-u...@lucene.apache.org mailing list
>>> 
>>> https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
>>> 
>>> Regarding languages other than English, you might find more information at
>>> 
>>> https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ#LuceneFAQ-CanIuseLucenetoindextextinChinese,Japanese,Korean,andothermulti-bytecharactersets?
>>> 
>>> although I just realized that the following link no longer works
>>> 
>>> https://lucene.apache.org/core/lucene-sandbox/
>>> 
>>> Are these analyzers now inside
>>> 
>>> https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis
>>> https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar
>>> 
>>> ?
>>> 
>>> Thanks
>>> 
>>> Michael
>>> 
>>> 
>>> On 20.05.21 at 14:48, Mete Kural wrote:
>>>> Hello Lucene Community,
>>>> 
>>>> I hope this finds you all well. I want to ask whether this would be the 
>>>> right medium to discuss some matters surrounding text search in relation 
>>>> to variant Unicode encodings of words in Arabic and other Arabic-script 
>>>> languages. This is not a great example, but these matters are similar to 
>>>> those around Latin-script searches where the letter “İ” needs to be 
>>>> substituted with “I” in searches and so forth. Would this mailing list be 
>>>> the best medium to discuss such matters? If not, would you mind 
>>>> recommending a medium for discussion on this?
>>>> 
>>>> Kind regards,
>>>> Mete Kural
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
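
To illustrate the compatibility normalization Walter recommends above, here 
is a minimal sketch in Python. It is not the Solr/ICU code path: it uses the 
standard library's unicodedata module plus str.casefold() as a rough stand-in 
for the NFKC-with-case-folding form (nfkc_cf), which, as far as I know, is 
what ICUNormalizer2CharFilterFactory applies by default. The helper name 
compat_normalize is made up for this example.

```python
import unicodedata


def compat_normalize(text: str) -> str:
    """Hypothetical helper: NFKC-normalize, then case-fold.

    This only approximates ICU's nfkc_cf; it is an illustration,
    not what Solr runs internally.
    """
    return unicodedata.normalize("NFKC", text).casefold()


# U+FEFB is the Arabic presentation-form ligature LAM WITH ALEF (isolated).
# Compatibility normalization maps it to the plain letters LAM + ALEF,
# so indexed text and query text end up with the same code points.
ligature = "\uFEFB"
plain = "\u0644\u0627"
assert compat_normalize(ligature) == compat_normalize(plain)

# The same transform also folds Latin compatibility characters,
# e.g. the "fi" ligature (U+FB01) and fullwidth "A" (U+FF21).
assert compat_normalize("\uFB01") == "fi"
assert compat_normalize("\uFF21") == "a"

# Letters that Unicode treats as distinct characters stay distinct:
# ARABIC LETTER YEH (U+064A) vs. ARABIC LETTER FARSI YEH (U+06CC).
assert compat_normalize("\u064A") != compat_normalize("\u06CC")
```

The last assertion suggests that normalization alone may not cover every 
variant spelling in Arabic-script languages; such cases would need extra 
mapping in the analysis chain.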
