Joern Kottmann created OPENNLP-1261:
---------------------------------------

             Summary: Lang Detect fails to predict language on long input texts
                 Key: OPENNLP-1261
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1261
             Project: OpenNLP
          Issue Type: Improvement
            Reporter: Joern Kottmann


If the input text is very long (e.g. 100k characters), the language detector
fails to detect the language correctly, even though the text is written in
only one language.

This issue was tracked down to the context generator, which ignores the
n-gram counts.
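
To illustrate why ignoring the counts can matter, here is a minimal sketch
in plain Java. It is not OpenNLP's actual context generator: the class
NgramCountSketch and the methods countNgrams and distinctNgrams are
hypothetical names made up for this example. It only shows that a
count-blind view keeps which n-grams occur and drops how often they occur,
so an n-gram seen once looks the same as one seen thousands of times.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the OpenNLP API.
public class NgramCountSketch {

    // Count every character n-gram of length n in the text.
    static Map<String, Integer> countNgrams(String text, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Count-blind view: records only which n-grams occur at least once.
    static Set<String> distinctNgrams(String text, int n) {
        return countNgrams(text, n).keySet();
    }

    public static void main(String[] args) {
        String text = "banana banana banana bandana";

        // With counts, frequent n-grams such as "ana" stand out.
        System.out.println("with counts:    " + countNgrams(text, 3));

        // Without counts, every n-gram that occurs carries the same weight.
        System.out.println("without counts: " + distinctNgrams(text, 3));
    }
}
```

On a long input, most n-grams that can occur in the text will occur at least
once, so the count-blind view presumably carries less and less information
about which n-grams are typical for the language.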



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
