OpenNLP must work independently of the platform locale
------------------------------------------------------
Key: OPENNLP-72
URL: https://issues.apache.org/jira/browse/OPENNLP-72
Project: OpenNLP
Issue Type: Bug
Reporter: Jörn Kottmann
The OpenNLP feature generation must produce exactly the same results
regardless of the platform locale.
The feature generation code frequently calls String.toLowerCase(), which
uses the default locale and can therefore produce different results from
one machine to another, e.g. when run under a Turkish locale. That must
not happen, since the lowercased string will then fail to match the
feature stored in the statistical model, which might not have been
generated on a machine with a Turkish locale.
Instead, the method Character.toLowerCase(...) should be used, because it
relies only on the UnicodeData file and cannot perform locale-sensitive
mappings.
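
The difference can be illustrated with a minimal sketch. The class name
LocaleLowerCaseDemo and the helper toLowerCase(String) are hypothetical
names chosen for this example and are not part of OpenNLP; the sketch only
assumes the standard java.lang.Character and java.util.Locale APIs.

```java
import java.util.Locale;

// Hypothetical demo class, not part of OpenNLP.
final class LocaleLowerCaseDemo {

    // Lowercases a string one char at a time with Character.toLowerCase(char),
    // which never consults the default locale.
    static String toLowerCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String token = "TITLE";

        // Locale-sensitive: 'I' maps to dotless i (U+0131) under Turkish rules.
        String localeSensitive = token.toLowerCase(turkish);
        System.out.println(localeSensitive.equals("t\u0131tle")); // true

        // Locale-independent: 'I' always maps to 'i' (U+0069).
        Locale.setDefault(turkish);
        System.out.println(toLowerCase(token).equals("title")); // true
    }
}
```

A per-char loop like this does not handle characters outside the Basic
Multilingual Plane; a variant working on code points with
Character.toLowerCase(int) would cover those as well.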