Nice that it works now.

I forgot to mention that you should remove the SPLIT tags in order
to train a sentence detector.
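For reference, a sentence-detector training file is just one sentence per line with no <SPLIT> markers, so the tokenizer data can be reused once the tags are stripped. A minimal sketch (hypothetical lines, format per the OpenNLP manual):

```
Nous recherchons un responsable des ressources humaines.
Le poste est basé à Paris, en France.
```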

Jörn

On 5/13/11 11:56 AM, Jean-Claude Dauphin wrote:
Thanks a lot Jörn, it works now. I don't know why I typed SKIP instead of
SPLIT and I was focused on the error message.

Sorry for taking your time.

Best wishes,

Jean-Claude


On Fri, May 13, 2011 at 11:47 AM, Jörn Kottmann <[email protected]> wrote:

On 5/13/11 11:33 AM, Jean-Claude Dauphin wrote:

Hi,

I tried to train models for French from a set of French human-resources position descriptions that has been split into sentences, using it as the sample training data stream.
It works fine for the sentence detector model using *SentenceDetectorME.train*.

However, if I use the same sample as tokenizer training content with
*opennlp.tools.tokenize.TokenizerME.train*, I get the following error:

The maxent model is not compatible!

The error message sounds a bit strange; what it means is that you only
trained with NO_SPLIT events (I guess). The produced model would not be able
to split any tokens.

We should fix the model validation code, or output a more meaningful error
message.

Anyway, to solve your problem, rename your <SKIP> tags to <SPLIT>.
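For example, tokenizer training data is one sentence per line, with a <SPLIT> tag inserted wherever a token boundary is not already marked by whitespace (typically before punctuation). A hypothetical French sample in that format:

```
Nous recherchons un responsable des ressources humaines<SPLIT>.
Le poste est basé à Paris<SPLIT>, en France<SPLIT>.
```

Without any <SPLIT> tags, every event is NO_SPLIT, which triggers the error above.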

Have a look at our documentation here:

http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.tokenizer.cmdline.training

Hope that helps,
Jörn



