[ https://issues.apache.org/jira/browse/OPENNLP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635535#comment-17635535 ]

ASF GitHub Bot commented on OPENNLP-1385:
-----------------------------------------

jzonthemtn commented on PR #428:
URL: https://github.com/apache/opennlp/pull/428#issuecomment-1319178029

   Thanks @atarora - will get this merged shortly!


> Fix discrepancy in tokenizer documentation
> ------------------------------------------
>
>                 Key: OPENNLP-1385
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1385
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Documentation, Tokenizer
>    Affects Versions: 1.9.4, 2.0.0
>            Reporter: Jeff Zemerick
>            Assignee: Atita Arora
>            Priority: Major
>
> In the tokenizer documentation in the user guide, the usage of the tool shows
> a cutoff option:
>   -cutoff num
>       minimal number of times a feature must be seen, ignored if
>       -params is used.
> However, this option is not present in the usage when running the CLI:
> {quote}Arguments description:
>   -factory factoryName
>       A sub-class of TokenizerFactory where to get implementation
>       and resources.
>   -abbDict path
>       abbreviation dictionary in XML format.
>   -alphaNumOpt isAlphaNumOpt
>       Optimization flag to skip alpha numeric tokens for further
>       tokenization
>   -params paramsFile
>       training parameters file.
>   -lang language
>       language which is being processed.
>   -model modelFile
>       output model file.
>   -data sampleData
>       data to be used, usually a file name.
>   -encoding charsetName
>       encoding for reading and writing text, if absent the system
>       default is used.
> {quote}
> The CLI does not recognize cutoff as an option so it is likely the
> documentation is incorrect but a review of the code should probably be done
> first to be sure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)