[jira] [Commented] (OPENNLP-1385) Fix discrepancy in tokenizer documentation

ASF GitHub Bot (Jira) Thu, 17 Nov 2022 12:44:04 -0800


    [ 
https://issues.apache.org/jira/browse/OPENNLP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635535#comment-17635535
 ]


ASF GitHub Bot commented on OPENNLP-1385:
-----------------------------------------

jzonthemtn commented on PR #428:
URL: https://github.com/apache/opennlp/pull/428#issuecomment-1319178029

   Thanks @atarora - will get this merged shortly!




> Fix discrepancy in tokenizer documentation
> ------------------------------------------
>
>                 Key: OPENNLP-1385
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1385
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Documentation, Tokenizer
>    Affects Versions: 1.9.4, 2.0.0
>            Reporter: Jeff Zemerick
>            Assignee: Atita Arora
>            Priority: Major
>
> In the tokenizer documentation in the user guide, the usage of the tool shows a cutoff option:
>         -cutoff num
>                 minimal number of times a feature must be seen, ignored if -params is used.
> However, this option is not present in the usage when running the CLI:
> {quote}Arguments description:
>         -factory factoryName
>                 A sub-class of TokenizerFactory where to get implementation and resources.
>         -abbDict path
>                 abbreviation dictionary in XML format.
>         -alphaNumOpt isAlphaNumOpt
>                 Optimization flag to skip alpha numeric tokens for further tokenization
>         -params paramsFile
>                 training parameters file.
>         -lang language
>                 language which is being processed.
>         -model modelFile
>                 output model file.
>         -data sampleData
>                 data to be used, usually a file name.
>         -encoding charsetName
>                 encoding for reading and writing text, if absent the system default is used.
> {quote}
> The CLI does not recognize cutoff as an option, so the documentation is
> likely incorrect, but the code should be reviewed first to be sure.
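
The comparison described in the issue can be sketched as a small script that extracts option names from two usage texts and reports what is documented but not accepted by the CLI. This is only an illustration, not part of OpenNLP: the helper `option_names` and both constants are hypothetical, the documented excerpt contains only the `-cutoff` entry quoted in the issue, and the CLI text is copied from the usage output above.

```python
import re

# Excerpt of the user guide usage as quoted in the issue (only the entry in question).
DOCUMENTED_EXCERPT = """\
        -cutoff num
                minimal number of times a feature must be seen, ignored if -params is used.
"""

# Usage text printed by the CLI, as quoted in the issue.
CLI_USAGE = """\
        -factory factoryName
                A sub-class of TokenizerFactory where to get implementation and resources.
        -abbDict path
                abbreviation dictionary in XML format.
        -alphaNumOpt isAlphaNumOpt
                Optimization flag to skip alpha numeric tokens for further tokenization
        -params paramsFile
                training parameters file.
        -lang language
                language which is being processed.
        -model modelFile
                output model file.
        -data sampleData
                data to be used, usually a file name.
        -encoding charsetName
                encoding for reading and writing text, if absent the system default is used.
"""

# An option is assumed to be a dash-prefixed word that starts a line
# (after optional indentation); dashes in the middle of a line are ignored.
OPTION_PATTERN = re.compile(r"^[ \t]*(-[A-Za-z][A-Za-z0-9]*)\b", re.MULTILINE)


def option_names(usage_text):
    """Return the set of option names found at the start of lines in usage_text."""
    return set(OPTION_PATTERN.findall(usage_text))


if __name__ == "__main__":
    # Options that appear in the documentation excerpt but not in the CLI usage.
    missing_from_cli = option_names(DOCUMENTED_EXCERPT) - option_names(CLI_USAGE)
    print(sorted(missing_from_cli))  # prints ['-cutoff']
```

A check like this only compares the two texts. Whether the option should be removed from the documentation or restored in the CLI still depends on the code review the reporter asks for.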



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
