Hello,

your training data should also contain tokens that are not job titles,
otherwise training cannot work. That said, the exception you are getting
should not be thrown. We already fixed a bug related to that. Can you tell
us which version you are using?
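
As a rough sanity check for the first point, a small stand-alone helper
can count how many tokens in a training file lie outside the
<START:...> ... <END> spans. This is only a sketch and not part of
OpenNLP: the class and method names are made up for illustration, and it
assumes the tags are separated from the surrounding tokens by whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
final class OutsideTokenCounter {

    // Counts whitespace-separated tokens that are outside any
    // <START...> ... <END> span across all given lines.
    static int countOutsideTokens(List<String> lines) {
        int outside = 0;
        for (String line : lines) {
            boolean inSpan = false;
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line
                }
                if (token.startsWith("<START")) {
                    inSpan = true;   // entering a name span
                } else if (token.equals("<END>")) {
                    inSpan = false;  // leaving a name span
                } else if (!inSpan) {
                    outside++;       // ordinary context token
                }
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        // Made-up example lines, not taken from a real corpus.
        List<String> lines = Arrays.asList(
            "<START:designation> COO <END>",
            "She is the <START:designation> COO <END> of Acme .");
        System.out.println(countOutsideTokens(lines));
    }
}
```

A result of 0 means the file contains nothing but job titles, which is
the situation described above.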

The training data format is actually:
<START:designation> COO <END>

Note the two extra spaces: one after the start tag and one before <END>.
There is no reason to write each name 5 times into your training
file; you can just train with a cutoff of 0.
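
If the existing file is large, the missing spaces could be added
mechanically. The following is a minimal sketch using only the Java
standard library; the class and method names are hypothetical, and it
assumes the tags look like <START:type> or <START> and <END>.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
final class TrainingLineFixer {

    // An opening tag such as <START:designation> or a bare <START>,
    // together with any whitespace around it.
    private static final Pattern START =
        Pattern.compile("\\s*(<START(?::[^>\\s]+)?>)\\s*");

    // The closing tag, together with any whitespace around it.
    private static final Pattern END = Pattern.compile("\\s*<END>\\s*");

    // Puts exactly one space on each side of every tag and trims the line.
    static String fix(String line) {
        String s = START.matcher(line).replaceAll(" $1 ");
        s = END.matcher(s).replaceAll(" <END> ");
        return s.trim();
    }

    public static void main(String[] args) {
        // prints: <START:designation> Chief Operating Officer <END>
        System.out.println(fix("<START:designation>Chief Operating Officer<END>"));
    }
}
```

Lines that are already in the correct format pass through unchanged, so
it should be safe to run this over a file more than once.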

If you are not using a new snapshot version already,
then give our 1.5.1 release candidate a try and see if the
bug is fixed there:
http://people.apache.org/~joern/releases/opennlp-1.5.1-incubating/rc2/

Hope that helps,
Jörn

On 3/4/11 5:32 AM, Ninaad Joshi wrote:
Hi,

I am trying to train a job title model with a training data file. But when
I try to train, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
     at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
     at App.Train(App.java:49)
     at App.main(App.java:33)

My training code looks like this:

     public void Train() throws IOException
     {
         String trainFilename = "training\\Designation-train.txt";
         String modelFile = "models\\en-ner-designation.bin";

         ObjectStream<String> lineStream =
             new PlainTextByLineStream(new FileInputStream(trainFilename), "UTF-8");
         ObjectStream<NameSample> sampleStream =
             new NameSampleDataStream(lineStream);

         TokenNameFinderModel model = NameFinderME.train("en", "designation",
             sampleStream, Collections.<String, Object>emptyMap(), 10, 5);
         BufferedOutputStream modelOut = null;

         try {
           modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
           model.serialize(modelOut);
         } finally {
           if (modelOut != null)
              modelOut.close();
         }
     }


My training file looks like this. I have set the cutoff to 5 and am therefore
including each token 5 times.

<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>

I tried putting some text before and after the tokens, but got the same
result. I even tried adding more tokens, with the same result. I would
appreciate it if someone could help me out. Thanks in advance for your help.

regards,
Ninaad

