On 5/13/11 11:33 AM, Jean-Claude Dauphin wrote:
Hi,

I tried to train models for French from a set of French human
resource position data which is split into sentences, using it as the sample
training data stream.
It works fine for the sentence detector model using *
SentenceDetectorME.train*.

However, if I use the same sample as tokenizer training content with *
opennlp.tools.tokenize.TokenizerME.train*, I get the following error:

The maxent model is not compatible!

The error message sounds a bit strange; what it means is that you only trained
with NO_SPLIT events (I guess). The produced model would not be able to split any tokens.

We should fix the model validation code, or at least emit a more meaningful error
message.

Anyway, to solve your problem, rename your <SKIP> tags to <SPLIT>.
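For context: the tokenizer trainer expects one sentence per line, with whitespace already counting as a token boundary and a <SPLIT> tag inserted wherever two tokens touch without whitespace (e.g. before punctuation). A rough sketch of how such an annotated line maps to tokens (this is just an illustration of the data format, not OpenNLP code; the sample sentence is made up):

```python
def annotated_line_to_tokens(line):
    """Turn a <SPLIT>-annotated training line into its token list.

    Whitespace separates tokens; <SPLIT> marks a boundary where
    no whitespace is present in the surface text.
    """
    tokens = []
    for chunk in line.split():
        tokens.extend(chunk.split("<SPLIT>"))
    return tokens

print(annotated_line_to_tokens("He works at Acme<SPLIT>, in Paris<SPLIT>."))
# → ['He', 'works', 'at', 'Acme', ',', 'in', 'Paris', '.']
```

If your data only contains <SKIP> (which the trainer does not recognize), every event ends up NO_SPLIT, which is exactly the situation the error complains about.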

Have a look at our documentation here:
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.tokenizer.cmdline.training

Hope that helps,
Jörn
