[ https://issues.apache.org/jira/browse/LUCENE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929367#action_12929367 ]
M Alexander commented on LUCENE-2745:
-------------------------------------

Yes Robert, I have run into the diacritics problem. I am trying to build an Analyzer that does not break on diacritics and that also recognises email addresses, hostnames and so on (which Arabic text may contain). That is why I asked whether there is a way to get full Arabic analysis (including diacritics) together with recognition of email addresses, hostnames, etc. in the same Analyzer.

I will try your suggestions and share the output.

Thanks for your help, Robert.

> ArabicAnalyzer - the ability to recognise email addresses host names and so on
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2745
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2745
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>         Environment: All
>            Reporter: M Alexander
>
> The ArabicAnalyzer does not recognise email addresses, hostnames and so on.
> For example,
> a...@hotmail.com
> will be tokenised to [adam] [hotmail] [com]
> It would be great if the ArabicAnalyzer could tokenise this to
> [a...@hotmail.com]. The same applies to hostnames and so on.
> Can this be resolved? I hope so
> Thanks
> MAA
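
To make the requested behaviour concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not use the ArabicAnalyzer API; the class name, the regular expressions and the example address user@example.com are all assumptions made for illustration. It keeps email addresses and hostnames as single tokens and strips the Arabic diacritics U+064B to U+0652 from each token instead of splitting on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: plain java.util.regex, not a Lucene Tokenizer.
final class ArabicAwareTokenizerSketch {

    // Arabic short-vowel marks, tanween, shadda and sukun (U+064B..U+0652).
    private static final Pattern DIACRITICS = Pattern.compile("[\\u064B-\\u0652]");

    // Alternation order matters: try email first, then hostname, then a plain word.
    // The email and hostname patterns are deliberately simplified assumptions.
    private static final Pattern TOKEN = Pattern.compile(
          "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"  // simplified email
        + "|[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"                     // simplified hostname
        + "|[\\p{L}\\p{M}\\p{Nd}]+");                               // word, combining marks included

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Diacritics stay inside the matched word (via \p{M}) and are removed afterwards,
            // so they never act as token boundaries.
            String token = DIACRITICS.matcher(m.group()).replaceAll("");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F" is an Arabic word written with diacritics.
        String text = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F user@example.com www.example.org";
        System.out.println(tokenize(text));
    }
}
```

Inside Lucene the same idea would have to live in a Tokenizer and a TokenFilter; the sketch only shows the intended token boundaries, and the simplified hostname pattern will also treat dotted strings such as version numbers as single tokens.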