[ https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761328#comment-17761328 ]
Martin Wiesner commented on OPENNLP-1190:
-----------------------------------------

As of 2023, [https://www.lsi.upc.es/~nlp/tools/nerc/nerc.html] returns a 404, so the resource mentioned on the mailing list in 2014 is no longer available at that address.

> CONLL02 format
> --------------
>
>                 Key: OPENNLP-1190
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1190
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Luca
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> According to the documentation, the following should work:
> bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt
> However, it currently fails with an error message, because it expects 3 columns per line instead of the 2 that are in the dataset.
> This is a bug, introduced at line 130 of opennlp.tools.formats.Conll02NameSampleStream.java, where a length of 3 is imposed.
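
To make the column-count problem from the issue description concrete, here is a minimal Java sketch of a line parser that tolerates both layouts. It is not the code in Conll02NameSampleStream: the class Conll02LineSketch, the TokenTag holder and the parseLine method are hypothetical names used only for illustration. The sketch assumes whitespace-separated fields with the token in the first column and the entity tag in the last one, and it assumes that accepting both 2 and 3 columns is the desired behavior. The sample lines and the middle "NP" column are made up.

```java
public class Conll02LineSketch {

    /** Hypothetical holder for one token and its named-entity tag. */
    static final class TokenTag {
        final String token;
        final String tag;

        TokenTag(String token, String tag) {
            this.token = token;
            this.tag = tag;
        }
    }

    /**
     * Parses one data line. Accepts "token tag" (2 columns) as well as
     * "token pos tag" (3 columns); the tag is always taken from the last column.
     */
    static TokenTag parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length == 2 || fields.length == 3) {
            return new TokenTag(fields[0], fields[fields.length - 1]);
        }
        // Any other column count is rejected with the offending line in the message.
        throw new IllegalArgumentException(
            "Expected two or three fields per line, got " + fields.length
                + " for line '" + line + "'");
    }

    public static void main(String[] args) {
        TokenTag two = parseLine("Melbourne B-LOC");      // 2-column layout
        TokenTag three = parseLine("Melbourne NP B-LOC"); // 3-column layout, illustrative POS column
        System.out.println(two.token + " " + two.tag);
        System.out.println(three.token + " " + three.tag);
    }
}
```

A strict check for exactly 3 fields would reject the first sample line, which matches the failure reported above. Whether the converter should accept both layouts or select one via an option is a design decision for the project.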