On 5/13/11 11:33 AM, Jean-Claude Dauphin wrote:
Hi,

I tried to train models for French from a set of French human
resource position data which is split into sentences, using it as the sample
training data stream.
It works fine for the sentence detector model using *
SentenceDetectorME.train*.

However, if I use the same sample as tokenizer training content with *
opennlp.tools.tokenize.TokenizerME.train*, I get the following error:

The maxent model is not compatible!

The error message sounds a bit strange; what it means is that you only trained
with NO_SPLIT events (I guess). The produced model would not be able to split any tokens.

We should fix the model validation code, or at least emit a more meaningful error
message.

Anyway, to solve your problem, rename your <SKIP> tags to <SPLIT>.
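For context: the tokenizer trainer expects one sentence per line, with whitespace already counting as a token boundary and a <SPLIT> tag inserted wherever two tokens touch without whitespace (e.g. before punctuation). A rough sketch of how such an annotated line maps to tokens (this is just an illustration of the data format, not OpenNLP code; the sample sentence is made up):

```python
def annotated_line_to_tokens(line):
    """Turn a <SPLIT>-annotated training line into its token list.

    Whitespace separates tokens; <SPLIT> marks a boundary where
    no whitespace is present in the surface text.
    """
    tokens = []
    for chunk in line.split():
        tokens.extend(chunk.split("<SPLIT>"))
    return tokens

print(annotated_line_to_tokens("He works at Acme<SPLIT>, in Paris<SPLIT>."))
# → ['He', 'works', 'at', 'Acme', ',', 'in', 'Paris', '.']
```

If your data only contains <SKIP> (which the trainer does not recognize), every event ends up NO_SPLIT, which is exactly the situation the error complains about.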

Have a look at our documentation here:
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.tokenizer.cmdline.training

Hope that helps,
Jörn
