[jira] [Commented] (OPENNLP-1190) CONLL02 format

Martin Wiesner (Jira) Fri, 01 Sep 2023 07:39:42 -0700


    [ 
https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761328#comment-17761328
 ]


Martin Wiesner commented on OPENNLP-1190:
-----------------------------------------

As of 2023, [https://www.lsi.upc.es/~nlp/tools/nerc/nerc.html] returns a 404, 
so the resource mentioned on the mailing list in 2014 is no longer available 
at that address.

> CONLL02 format
> --------------
>
>                 Key: OPENNLP-1190
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1190
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Luca
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> According to the documentation, the following should work:
>  bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt
> However, it currently fails with an error message because it expects 3 
> columns per line instead of the 2 that are in the dataset.
> This is a bug introduced at line 130 of 
> opennlp.tools.formats.Conll02NameSampleStream.java, where a length of 3 is 
> imposed.
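
For illustration only, here is a minimal sketch of a line parser that accepts 
both 2-column and 3-column input. It is not the actual OpenNLP code: the class 
name Conll02LineSketch and the method parse are hypothetical, the sample lines 
are made up, and it assumes that a 3-column line carries an extra middle column 
(such as a part-of-speech tag) that can be skipped.

```java
// Hypothetical helper for illustration; NOT the OpenNLP implementation.
class Conll02LineSketch {

    /**
     * Splits one whitespace-separated line into {token, tag}.
     * Accepts "token tag" (2 columns) and "token middle tag" (3 columns).
     */
    static String[] parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length == 2) {
            return new String[] {fields[0], fields[1]};
        }
        if (fields.length == 3) {
            // Assumption: the middle column is not needed for name finding.
            return new String[] {fields[0], fields[2]};
        }
        throw new IllegalArgumentException(
            "Expected 2 or 3 fields per line, got " + fields.length
                + " for line '" + line + "'");
    }

    public static void main(String[] args) {
        // Made-up sample lines, one per supported layout.
        String[] twoCols = parse("Melbourne B-LOC");
        String[] threeCols = parse("Melbourne NP B-LOC");
        System.out.println(twoCols[0] + " -> " + twoCols[1]);
        System.out.println(threeCols[0] + " -> " + threeCols[1]);
    }
}
```

Accepting either column count, rather than requiring exactly 3, is one 
possible way to handle the 2-column dataset described above. Whether that is 
the right fix for Conll02NameSampleStream is left open here.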



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
