[ 
https://issues.apache.org/jira/browse/LUCENE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930857#action_12930857
 ] 

Steven Rowe commented on LUCENE-2745:
-------------------------------------

bq. how difficult is it to make the new StandardTokenizer (branch_3X) with its 
new capabilities (including properly tokenizing Arabic as well as identifying 
email addresses, hostnames, etc) to work with version 2.9.2? 

You wouldn't be able to just drop the files in and compile, but backporting to 
2.9.X would definitely be possible.

Here are the things I found looking through CHANGES.txt on branch_3x that would 
require attention if you were to backport to 2.9.2:

* LUCENE-2302: TermAttribute -> CharTermAttribute
* LUCENE-2074: Java4 -> Java5 regeneration of StandardTokenizerImpl* from 
.jflex source; support for different behavior based on Lucene Version

There are probably other things as well, but I'm not sure what they are.

Likely LUCENE-2302 is the biggest issue (it will block compilation), but if I 
remember correctly, the change is fairly simple.
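
As a rough illustration of why the LUCENE-2302 part should be mostly 
mechanical, here is a minimal sketch of the kind of call that changes. It 
uses small stand-in interfaces (hypothetical, trimmed to two methods each) 
rather than the real Lucene classes so that it compiles on its own. The method 
names mirror the ones I believe the real attributes expose: 
TermAttribute.setTermBuffer()/term() in 2.9.X versus 
CharTermAttribute.copyBuffer()/toString() on branch_3x. The getText29/getText3x 
helpers and the Simple* classes are made up for this example.

```java
// Stand-in types only: the two interfaces below are hypothetical, trimmed
// copies of Lucene's TermAttribute (2.9.X) and CharTermAttribute (branch_3x).
final class BackportSketch {

    /** 2.9.X-style attribute (stand-in for Lucene's TermAttribute). */
    interface TermAttribute {
        void setTermBuffer(char[] buffer, int offset, int length);
        String term();
    }

    /** branch_3x-style attribute (stand-in for Lucene's CharTermAttribute). */
    interface CharTermAttribute extends CharSequence {
        void copyBuffer(char[] buffer, int offset, int length);
    }

    /** Toy implementation of the 2.9.X-style attribute. */
    static final class SimpleTermAttribute implements TermAttribute {
        private String term = "";
        public void setTermBuffer(char[] buffer, int offset, int length) {
            term = new String(buffer, offset, length);
        }
        public String term() { return term; }
    }

    /** Toy implementation of the branch_3x-style attribute. */
    static final class SimpleCharTermAttribute implements CharTermAttribute {
        private String term = "";
        public void copyBuffer(char[] buffer, int offset, int length) {
            term = new String(buffer, offset, length);
        }
        public int length() { return term.length(); }
        public char charAt(int index) { return term.charAt(index); }
        public CharSequence subSequence(int start, int end) {
            return term.subSequence(start, end);
        }
        @Override public String toString() { return term; }
    }

    /** branch_3x form: copy the matched region into a CharTermAttribute. */
    static void getText3x(char[] zzBuffer, int zzStartRead, int zzMarkedPos,
                          CharTermAttribute t) {
        t.copyBuffer(zzBuffer, zzStartRead, zzMarkedPos - zzStartRead);
    }

    /** Backported form: same region, written through the 2.9.X attribute. */
    static void getText29(char[] zzBuffer, int zzStartRead, int zzMarkedPos,
                          TermAttribute t) {
        t.setTermBuffer(zzBuffer, zzStartRead, zzMarkedPos - zzStartRead);
    }

    public static void main(String[] args) {
        char[] buf = "mail me at once".toCharArray();

        SimpleCharTermAttribute newAttr = new SimpleCharTermAttribute();
        getText3x(buf, 5, 7, newAttr);

        SimpleTermAttribute oldAttr = new SimpleTermAttribute();
        getText29(buf, 5, 7, oldAttr);

        // Both attributes should now hold the same token text.
        System.out.println(oldAttr.term() + " / " + newAttr);
    }
}
```

In an actual backport the stand-ins disappear and only the call sites change; 
any code that relies on CharTermAttribute being a CharSequence would need 
term() (or the term buffer) instead.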

> ArabicAnalyzer - the ability to recognise email addresses host names and so on
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2745
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2745
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>         Environment: All
>            Reporter: M Alexander
>
> The ArabicAnalyzer does not recognise email addresses, hostnames and so on. 
> For example,
> a...@hotmail.com
> will be tokenised to [adam] [hotmail] [com]
> It would be great if the ArabicAnalyzer could tokenise this to 
> [a...@hotmail.com]. The same applies to hostnames and so on.
> Can this be resolved? I hope so
> Thanks
> MAA

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
