Hi, Jairo,

I think you will have to perform two conversions:

1) From CONLL02 to the NameFinder format:

bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > esp_nf.train

2) From NameFinder format to SentenceDetector format:

bin/opennlp SentenceDetectorConverter -data esp_nf.train -encoding <your sys encoding> -detokenizer es-detokenizer.xml

You will have to create a detokenizer dictionary. Maybe the English one
will work for you:
http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/lang/en/tokenizer/en-detokenizer.xml?view=markup
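
In case it helps, here is a minimal sketch that chains the two steps in one shell function. It only repeats the two commands above. The function name, the OPENNLP variable, and the ".nf" intermediate file name are my own choices for illustration, and redirecting the second step to a file assumes that converter writes to standard output like the first one does. I have not verified it against a real run:

```shell
#!/bin/sh
# Sketch only: chains the two conversions described in this mail.
# OPENNLP, conll02_to_sentdetect and the ".nf" suffix are illustrative names,
# not part of OpenNLP itself.
OPENNLP="${OPENNLP:-bin/opennlp}"

# Usage: conll02_to_sentdetect <conll02 file> <encoding> <detokenizer xml> <output file>
conll02_to_sentdetect() {
  conll_file=$1
  encoding=$2
  detokenizer=$3
  out_file=$4
  nf_file="${conll_file}.nf"   # intermediate NameFinder-format file

  # Step 1: CONLL02 -> NameFinder format (stop here if the converter fails)
  "$OPENNLP" TokenNameFinderConverter conll02 -data "$conll_file" \
    -lang es -types per > "$nf_file" || return 1

  # Step 2: NameFinder format -> SentenceDetector format
  # (assumption: the converter prints its result to standard output)
  "$OPENNLP" SentenceDetectorConverter -data "$nf_file" \
    -encoding "$encoding" -detokenizer "$detokenizer" > "$out_file"
}
```

You would call it along the lines of "conll02_to_sentdetect esp.train <your sys encoding> es-detokenizer.xml esp_sent.train", where esp_sent.train is just a made-up output name.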

*NOTE:*
While trying this with OpenNLP 1.5.2, I got the following error:

$ bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per
IO error while reading training data or indexing data: Expected three fields per line in training data!

Is it a bug, or am I doing something wrong?
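
Since the message complains about the number of fields, one way to narrow this down might be to count how many whitespace-separated fields the non-empty lines of esp.train actually have. A small sketch (the function name is made up for illustration, and it assumes fields are separated by spaces or tabs):

```shell
#!/bin/sh
# Sketch only: field_histogram is an illustrative helper, not an OpenNLP tool.
# For each field count found in the file, print "<fields> <number of lines>".
# Empty lines (sentence separators) are skipped.
# Usage: field_histogram esp.train
field_histogram() {
  awk 'NF > 0 { count[NF]++ } END { for (n in count) print n, count[n] }' "$1" |
    sort -n
}
```

If every non-empty line reports the same field count, comparing that number with the "three fields" in the error message should show whether the data or the converter is at odds with the expected format.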



On Wed, Feb 8, 2012 at 2:18 PM, Jairo Sarabia <[email protected]> wrote:

> Hello,
>
> Forgive my ignorance, but how is it done?
>
> Thank you!
>
> Jairo
>
> 2012/2/7 Joern Kottmann <[email protected]>
>
> > Hello,
> >
> > Sorry, we don't currently offer a model. But with the new tooling
> > it should be fairly easy to train one on the CONLL02 data.
> >
> > Hope that helps,
> > Jörn
> >
> > On Mon, Feb 6, 2012 at 5:55 PM, Jairo Sarabia <[email protected]> wrote:
> >
> > > Hello all!,
> > >
> > > I'm interested in extracting data from the Spanish DBpedia dumps and
> > > need a Spanish sentence detector. I have seen that there is no model
> > > for opennlp 1.5. How can I obtain a model for this?
> > > It is important that it is for version 1.5.
> > >
> > > Thanks in advance!
> > >
> > > Jairo Sarabia
> > >
> >
>
