[ 
https://issues.apache.org/jira/browse/LUCENE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929389#action_12929389
 ] 

Robert Muir commented on LUCENE-2745:
-------------------------------------

Steven, check out the link at the bottom of that article,
especially the top of it... it explains the character's use in the
language, particularly to block cursive joining for prefixes, suffixes,
and compounds. We split on it, and the affixes are in the stoplist.

This is how the whole analyzer works; there are more examples in
the tests. I can give you more refs later, when I have
better bandwidth, but it's specific to this language.
We shouldn't split on it in general. Also, a real
space is often used instead, so this approach is the simplest
for the language.
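
The approach described above can be sketched in plain Java. This is an illustration only, not Lucene's actual code: it assumes the character under discussion is the zero-width non-joiner (U+200C), and the class name, method name, and stopword entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "split on the joining-blocker, then drop affixes
// via the stoplist". Not taken from Lucene.
class JoinerSplitSketch {
    // Assumption: the character in question is U+200C (zero-width non-joiner).
    static final char ZWNJ = '\u200C';

    // Splits on whitespace and on ZWNJ, then removes tokens found in the stoplist.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c) || c == ZWNJ) {
                flush(current, tokens, stopwords);
            } else {
                current.append(c);
            }
        }
        flush(current, tokens, stopwords);
        return tokens;
    }

    // Emits the pending token unless it is empty or a stopword.
    private static void flush(StringBuilder current, List<String> tokens,
                              Set<String> stopwords) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stopwords.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }
}
```

With a stoplist containing the placeholder affix "pre", both `analyze("pre\u200Cword", stopwords)` and `analyze("pre word", stopwords)` keep only the token "word", which is why treating the character like a space is the simplest option when writers use the two interchangeably.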

> ArabicAnalyzer - the ability to recognise email addresses host names and so on
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2745
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2745
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>         Environment: All
>            Reporter: M Alexander
>
> The ArabicAnalyzer does not recognise email addresses, hostnames and so on. 
> For example,
> a...@hotmail.com
> will be tokenised to [adam] [hotmail] [com]
> It would be great if the ArabicAnalyzer could tokenise this to 
> [a...@hotmail.com]. The same applies to hostnames and so on.
> Can this be resolved? I hope so.
> Thanks
> MAA

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
