[
https://issues.apache.org/jira/browse/OPENNLP-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Wiesner closed OPENNLP-327.
----------------------------------
Resolution: Delivered
> Doccat's bag-of-words feature generator should not use numbers as features
> --------------------------------------------------------------------------
>
> Key: OPENNLP-327
> URL: https://issues.apache.org/jira/browse/OPENNLP-327
> Project: OpenNLP
> Issue Type: Improvement
> Components: Doccat
> Reporter: Jörn Kottmann
> Assignee: Jörn Kottmann
> Priority: Minor
>
> It turned out that Doccat's bag-of-words feature generator can be very
> sensitive to numbers when used for language identification. Therefore, numbers
> should not be included in the bag of words.
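
The idea in the description can be illustrated with a minimal, self-contained sketch. This is not the OpenNLP implementation: the class name NumberFreeBagOfWords, the helper looksNumeric, and the "bow=" feature prefix are all assumptions made for illustration. The sketch treats a token as a number when it contains at least one digit and no letters, which is only one possible definition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the OpenNLP Doccat code.
final class NumberFreeBagOfWords {

    // Assumed rule: a token "looks numeric" if it has at least one digit
    // and no letters (so "2011" and "1.5.2" match, "42nd" and "," do not).
    static boolean looksNumeric(String token) {
        boolean hasDigit = false;
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            if (Character.isLetter(cp)) {
                return false;
            }
            if (Character.isDigit(cp)) {
                hasDigit = true;
            }
            i += Character.charCount(cp);
        }
        return hasDigit;
    }

    // Emits one feature per token, skipping tokens that look numeric.
    // The "bow=" prefix is an assumed naming convention for the features.
    static List<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            if (!looksNumeric(token)) {
                features.add("bow=" + token);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Released", "in", "2011", ",", "version", "1.5.2"};
        System.out.println(extractFeatures(tokens));
    }
}
```

Dropping numeric tokens keeps years, version strings, and counts from becoming features that say little about the language of a document. Whether mixed tokens such as "42nd" should also be filtered is a design choice this sketch leaves open.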
--
This message was sent by Atlassian Jira
(v8.20.10#820010)