[
https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482923
]
Hoss Man commented on LUCENE-841:
---------------------------------
There are lots of OSes and editors where changing the file encoding is somewhat
hard, particularly if other files need to stay in ASCII to work with other
systems.
It's a trade-off: people with UTF-8-capable environments would probably rather
see the real character, while people still using ASCII would probably rather
see \uXXXX. I would think the \uXXXX approach is the most universally
functional, since anyone can look up a character from its character code, but
people looking at funky control characters can't always tell what character
code it is.
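To make the trade-off concrete, here is a minimal sketch, not taken from any
patch on this issue. The class UnicodeEscapeDemo and the helper toEscaped are
hypothetical names made up for illustration. It shows that an escaped char and
its integer value are the same thing to the compiler, and how a non-ASCII
string could be rewritten into pure-ASCII escapes:

```java
public class UnicodeEscapeDemo {
    // Hypothetical helper: rewrite every non-ASCII char as a backslash-u
    // escape with four hex digits, leaving plain ASCII untouched.
    static String toEscaped(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape form and integer form of u-umlaut; both are ASCII-only
        // in the source file and denote the same char.
        char escaped = '\u00FC';
        char numeric = (char) 0xFC;
        System.out.println(escaped == numeric);

        // Prints the ASCII-only spelling of a string containing u-umlaut.
        System.out.println(toEscaped("f\u00FCr"));
    }
}
```

Either spelling keeps the source file ASCII-only, so it compiles the same way
regardless of the encoding the editor or compiler assumes.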
(I wonder if there is a fast/easy way to get a char from a Unicode character
name?)
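On the character-name question: a minimal sketch, assuming a Java 9 or later
runtime, where Character.codePointOf(String) does this lookup
(Character.getName(int) goes the other way and needs Java 7 or later). The
class UnicodeNameLookup and the helper escapeForName are hypothetical names,
and the helper assumes the character lies in the Basic Multilingual Plane:

```java
public class UnicodeNameLookup {
    // Hypothetical helper: look up a code point by its Unicode name and
    // format it as a backslash-u escape. Assumes a BMP character; a
    // supplementary code point would need a surrogate pair instead.
    static String escapeForName(String unicodeName) {
        int codePoint = Character.codePointOf(unicodeName); // Java 9+
        return String.format("\\u%04X", codePoint);
    }

    public static void main(String[] args) {
        // Name to escape.
        System.out.println(escapeForName("LATIN SMALL LETTER U WITH DIAERESIS"));

        // Code point back to name, useful as a comment next to an escape.
        System.out.println(Character.getName(0x00DF));
    }
}
```

Putting the name in a comment next to each escape would give ASCII-only
readers something more helpful than the bare code.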
> Replace UTF8 characters in stemmer code with integer values.
> ------------------------------------------------------------
>
> Key: LUCENE-841
> URL: https://issues.apache.org/jira/browse/LUCENE-841
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Karl Wettin
> Priority: Critical
>
> BrazilianStemmer, GermanStemmer, FrenchStemmer and DutchStemmer all contain
> UTF-8 characters in the Java code. Not all environments handle that. They
> really ought to be integer values instead.
> I'll come up with a patch sooner or later.