Re: Arabic analyser

David Murgatroyd Wed, 11 Nov 2015 03:38:42 -0800

> So BasisTech works for the latest version of Solr?

Yes, our latest Arabic analyzer supports Solr versions up through 5.3.x. But
since the examples you give are names, it sounds like you might instead (or
also) want our fuzzy name matcher, which will find "عبد الله" not only with
"عبدالله" but also with typos like "عبالله" or even transliterations into
English like "abdollah". You can visit http://www.basistech.com/solutions/search/solr/
and fill out the form there to learn more (mentioning this thread). See
also http://www.slideshare.net/dmurga/simple-fuzzy-name-matching-in-solr
for a talk I gave at the San Francisco Solr Meet-up in April on how it
plugs in to Solr by creating a special field type you can query just like
any other; this was also presented at Lucene/Solr Revolution last month (
http://lucenerevolution.org/sessions/simple-fuzzy-name-matching-in-solr/).


Best,
David Murgatroyd
(VP, Engineering, Basis Technology)

On Wed, Nov 11, 2015 at 4:31 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
wrote:

> Thanks Alex,
>
> So BasisTech works for the latest version of Solr?
>
> Sincerely,
> Mahmoud
>
> On Tue, Nov 10, 2015 at 5:28 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
> > If this is for a significant project and you are ready to pay for it,
> > BasisTech has commercial solutions in this area I believe.
> >
> > Regards,
> >    Alex.
> > ----
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 10 November 2015 at 08:46, Mahmoud Almokadem <prog.mahm...@gmail.com>
> > wrote:
> > > Thanks Paul,
> > >
> > > The Arabic analyser applies its normalisation and stemming filters only
> > > to the single terms that come out of the standard tokenizer.
> > > Gathering all the synonyms will be hard work. Should I customise my
> > > tokenizer to handle this case?
> > >
> > > Sincerely,
> > > Mahmoud
> > >
> > >
> > > On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net>
> > > wrote:
> > >
> > >> Mahmoud,
> > >>
> > >> There is an Arabic analyzer:
> > >>   https://wiki.apache.org/solr/LanguageAnalysis#Arabic
> > >> Doesn't it do what you describe?
> > >> Synonyms probably work there too.
> > >>
> > >> Paul
> > >>
> > >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > >> > 9 November 2015 17:47
> > >> > Thanks Jack,
> > >> >
> > >> > This is a good solution, but we have more combinations that I think
> > >> > can’t be handled as synonyms, such as every name that starts with ‘عبد’
> > >> > ‘Abd’ or ‘أبو’ ‘Abo’. When using the Standard tokenizer on ‘أبو بكر’ ‘Abo
> > >> > Bakr’, it’ll be tokenised to ‘أبو’ and ‘بكر’ and the filters will be
> > >> > applied to each term separately.
> > >> >
> > >> > Is there a tokeniser available that tokenises ‘أبو *’ or ‘عبد *’ as a
> > >> > single term?
> > >> >
> > >> > Thanks,
> > >> > Mahmoud
> > >> >
> > >> >
> > >> >
> > >> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
> > >> > 9 November 2015 16:47
> > >> > Use an index-time (but not query-time) synonym filter with a rule like:
> > >> >
> > >> > Abd Allah,Abdallah
> > >> >
> > >> > This will index the combined word in addition to the separate words.
> > >> >
> > >> > -- Jack Krupansky
> > >> >
> > >> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
> > >> >
> > >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > >> > 9 November 2015 10:48
> > >> > Hello,
> > >> >
> > >> > We are indexing Arabic content and facing a problem tokenizing
> > >> > multi-term phrases like 'عبد الله' 'Abd Allah': users will search for
> > >> > 'عبدالله' 'Abdallah' without a space and need to get the results for
> > >> > 'عبد الله' with a space. We are using StandardTokenizer.
> > >> >
> > >> >
> > >> > Is there any configuration to handle this case?
> > >> >
> > >> > Thank you,
> > >> > Mahmoud
> > >> >
> > >>
> > >>
> >
>
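
The index-time synonym rule suggested earlier in the thread ("Abd Allah,Abdallah")
can be illustrated with a small Python sketch. This is only a toy model of the
idea, not Solr's synonym filter: the function names parse_rule and
expand_at_index_time are hypothetical, token positions are ignored, and the
"index" is just a set of terms.

```python
# Toy model of index-time synonym expansion (an assumption-laden sketch,
# not how Solr implements its synonym filter).

def parse_rule(rule):
    """Split a comma-separated rule into a list of equivalent token tuples."""
    return [tuple(variant.split()) for variant in rule.split(",")]


def expand_at_index_time(tokens, variants):
    """Return the set of terms to index for one document.

    Whenever any variant of the rule occurs in the token stream, the tokens
    of every other variant are indexed as well, so the combined word is
    stored in addition to the separate words.
    """
    terms = set(tokens)
    for variant in variants:
        n = len(variant)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == variant:
                for other in variants:
                    terms.update(other)
    return terms


variants = parse_rule("Abd Allah,Abdallah")
indexed = expand_at_index_time(["Abd", "Allah", "Street"], variants)
# A query for the single token "Abdallah" now finds the document,
# without any synonym expansion at query time.
print("Abdallah" in indexed)
```

The same rule shape would apply to the Arabic forms; the cost raised in the
thread remains that every such pair has to be listed explicitly.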

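The question in the thread about treating 'أبو *' or 'عبد *' as a single term
can also be sketched outside Solr. The snippet below is a hypothetical
post-tokenization pass in Python, not an existing Solr tokenizer or filter: it
assumes a hand-maintained set of name prefixes (NAME_PREFIXES) and simply joins
each prefix with the token that follows it.

```python
# Hypothetical pass over already-tokenized text; the prefix set and the
# function name are assumptions made for illustration only.
NAME_PREFIXES = {"عبد", "أبو"}


def merge_name_prefixes(tokens, prefixes=NAME_PREFIXES):
    """Join each known name prefix with the token that follows it."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in prefixes and i + 1 < len(tokens):
            # Emit prefix and following token as one term, without a space.
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# The spaced form collapses to one token; an unspaced form passes through
# unchanged, so both spellings can end up as the same indexed term.
print(merge_name_prefixes(["عبد", "الله"]))
print(merge_name_prefixes(["أبو", "بكر"]))
```

Applied at both index and query time, a pass like this would avoid listing
every name as a synonym, at the price of also merging a prefix with a
following word that is not part of a name.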