[ https://issues.apache.org/jira/browse/OPENNLP-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629913#comment-15629913 ]
Joern Kottmann edited comment on OPENNLP-857 at 11/2/16 6:20 PM:
-----------------------------------------------------------------

Thanks, that is really nice work. We can apply it as it is. I removed one if statement and just initialize the tokenizer variable to the whitespace tokenizer.

was (Author: joern):
Thanks that is really nice work. We can apply that like it is. I remove one if statement and just initialize the tokenizer variable to the white space tokenizer.

> ParserTool should take use Tokenizer instance. It should not use java.util.StringTokenizer
> ------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-857
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-857
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Parser
>    Affects Versions: 1.6.0
>            Reporter: Tristan Nixon
>            Assignee: Joern Kottmann
>             Fix For: 1.6.1
>
>         Attachments: ParserToolTokenize.patch
>
>
> It would be nice if the ParserTool made use of a real tokenizer. In
> addition to being the "right" thing to do, it would obviate issues like
> OPENNLP-240 when using the parser tool.
> While I realize that java.util.StringTokenizer effectively does the same work
> as WhitespaceTokenizer, it seems odd to use the former when the latter exists.
> To this end, I'm attaching a patch that adds an additional method
>     public static Parse[] parseLine(String line, Parser parser, Tokenizer tokenizer, int numParses)
> I've left the existing method
>     public static Parse[] parseLine(String line, Parser parser, int numParses)
> in for convenience and backwards compatibility. It simply calls the new
> method with WhitespaceTokenizer.INSTANCE.
> For good measure, I've added a new command-line argument -tk, which takes the
> name of a tokenizer model. If none is specified, it will fall back on the
> current behavior of using the whitespace tokenizer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
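
For readers skimming the thread, the delegation pattern described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the nested `Tokenizer` interface and the `WHITESPACE` constant are hypothetical stand-ins for OpenNLP's `Tokenizer` and `WhitespaceTokenizer.INSTANCE`, and `tokensForParsing` is an invented name that covers only the tokenization step, not the parsing itself.

```java
import java.util.Arrays;

public class ParseLineSketch {

    /** Hypothetical stand-in for OpenNLP's Tokenizer; only the one method used here. */
    public interface Tokenizer {
        String[] tokenize(String s);
    }

    /** Hypothetical stand-in for WhitespaceTokenizer.INSTANCE: splits on runs of whitespace. */
    public static final Tokenizer WHITESPACE = s -> {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    };

    /** New-style overload: the caller decides which tokenizer is used. */
    public static String[] tokensForParsing(String line, Tokenizer tokenizer) {
        return tokenizer.tokenize(line);
    }

    /** Old-style overload, kept for compatibility: delegates with the whitespace default. */
    public static String[] tokensForParsing(String line) {
        return tokensForParsing(line, WHITESPACE);
    }

    public static void main(String[] args) {
        // Default behavior: whitespace tokenization, as before the change.
        System.out.println(Arrays.toString(tokensForParsing("The quick  brown fox .")));

        // Assumed custom tokenizer for illustration: split on commas instead.
        Tokenizer commaTokenizer = s -> s.split(",");
        System.out.println(Arrays.toString(tokensForParsing("a,b,c", commaTokenizer)));
    }
}
```

Keeping the shorter overload as a thin wrapper means existing callers see no change, while the default lives in exactly one place.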