[jira] [Commented] (LUCENE-5927) 4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2

Robert Muir (JIRA) Mon, 08 Sep 2014 14:49:33 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126163#comment-14126163 ]


Robert Muir commented on LUCENE-5927:
-------------------------------------

{quote}
Or just stop doing version-specific implementations (as will be the case in 
5.x)?
{quote}

In my opinion, that's unrelated to this issue. (Again, for this particular 
issue, I think simulating the old bug is overkill because it just would not be 
useful.)

As for the 4.6 Unicode changes, the API complexity is out of the way in 5.x. 
Analyzers have getVersion/setVersion, and if we want to add 
Lucene40StandardTokenizer and have them make use of this to emulate the 4.0 (as 
opposed to 4.6+) grammar, that's fine. With the API Ryan has, it won't cause 
users "pain" and it keeps the back compat.
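
To make the getVersion/setVersion idea concrete, here is a minimal sketch of an 
analyzer choosing a tokenizer grammar from its version. It is not Lucene code: 
every name in it (VersionedTokenizerSketch, SketchVersion, SketchAnalyzer, the 
two tokenizer classes) is hypothetical, and the two "grammars" are toy 
splitting rules that stand in for the real 4.0 and 4.6+ grammars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are Lucene classes.
final class VersionedTokenizerSketch {

    // Stand-in for a library version constant.
    enum SketchVersion { V4_0, V4_6 }

    interface SketchTokenizer {
        List<String> tokenize(String text);
    }

    // Toy "old grammar": breaks on whitespace and on hyphens.
    static final class OldGrammarTokenizer implements SketchTokenizer {
        public List<String> tokenize(String text) {
            return split(text, "[\\s-]+");
        }
    }

    // Toy "new grammar": breaks on whitespace only.
    static final class NewGrammarTokenizer implements SketchTokenizer {
        public List<String> tokenize(String text) {
            return split(text, "\\s+");
        }
    }

    // The analyzer carries a version; callers never pass it to the tokenizer.
    static final class SketchAnalyzer {
        private SketchVersion version = SketchVersion.V4_6; // default: latest

        SketchVersion getVersion() { return version; }

        void setVersion(SketchVersion version) { this.version = version; }

        List<String> tokenize(String text) {
            SketchTokenizer tokenizer = (version == SketchVersion.V4_0)
                    ? new OldGrammarTokenizer()
                    : new NewGrammarTokenizer();
            return tokenizer.tokenize(text);
        }
    }

    // Splits on the given regex and drops empty pieces.
    static List<String> split(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        SketchAnalyzer analyzer = new SketchAnalyzer();
        System.out.println(analyzer.tokenize("foo-bar baz")); // new grammar
        analyzer.setVersion(SketchVersion.V4_0);
        System.out.println(analyzer.tokenize("foo-bar baz")); // old grammar
    }
}
```

The point of the design is that only users who need the old behavior call 
setVersion; everyone else gets the current grammar without touching a version 
argument.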

> 4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2
> ----------------------------------------------------------
>
>                 Key: LUCENE-5927
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5927
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Ryan Ernst
>
> In 4.9, this string was broken into 2 tokens by StandardTokenizer:
> "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72" = "\u1aa2", "\u1a7f\u1a6f\u1a6f\u1a61\u1a72"
> However, in 4.10, that has changed so it is now a single token returned.


