Re: Multiple Token Name Finder Models

2017-06-07 Thread Manoj B. Narayanan
Hi,

This is regarding a bug posted on SourceForge a long time ago,
"Multiple Models in CMD". I have two questions about it.

1) The command line tool currently supports loading multiple models and
gives a single result. Is this possible with the API?

2) How does the system resolve conflicts and produce a single result?

Thanks,
Manoj
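
For question 2, one plausible way to reduce the output of several models to a
single result is to pool every detected span, sort by start position, and drop
any span that overlaps one already kept. The sketch below illustrates that idea
in plain Java only. NameSpan and merge are hypothetical names, and this is an
assumption about a reasonable strategy, not a statement of what the command
line tool actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanMergeSketch {

    /** Hypothetical stand-in for a detected name: token range [start, end) plus a type. */
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        /** True when the two half-open ranges share at least one token. */
        boolean overlaps(NameSpan other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /**
     * Pools the spans of all models, sorts them by start (longer span first on ties)
     * and keeps a span only if it does not overlap the previously kept one.
     */
    static List<NameSpan> merge(List<List<NameSpan>> perModelSpans) {
        List<NameSpan> all = new ArrayList<>();
        for (List<NameSpan> spans : perModelSpans) {
            all.addAll(spans);
        }
        all.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));

        List<NameSpan> kept = new ArrayList<>();
        for (NameSpan candidate : all) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).overlaps(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two hypothetical models disagree about token 1.
        List<NameSpan> personModel = Arrays.asList(new NameSpan(0, 2, "person"));
        List<NameSpan> locationModel = Arrays.asList(
                new NameSpan(1, 2, "location"), new NameSpan(4, 5, "location"));
        System.out.println(merge(Arrays.asList(personModel, locationModel)));
    }
}
```

With this rule the earlier (or longer) span wins a conflict regardless of which
model produced it; a probability-based tie-break would be another option.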



Re: Missing serializer for postagger.bin

2017-06-07 Thread Damiano Porta
Hello Jorn,
I confirm the error. Please take a look at the code below. It is a working
example; you only need to edit the constants GENERATORS, POSTAGGER and
SERIALIZED.


*TEST FILE:*

package com.damiano.trainer;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import opennlp.tools.ml.perceptron.PerceptronTrainer;
import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.ObjectStreamUtils;
import opennlp.tools.util.TrainingParameters;
import org.apache.commons.io.IOUtils;

public class Test {

    // Edit these three paths to reproduce the error.
    private final String GENERATORS = "/home/damiano/test.xml";
    private final String POSTAGGER = "/home/damiano/postagger.bin";
    private final String SERIALIZED = "/home/damiano/serialized.bin";

    public static void main(String[] args) throws IOException {
        Test test = new Test();
    }

    public Test() throws IOException {

        // Tiny training set, one annotated name per sentence
        List<NameSample> labelled = new ArrayList<>();

        labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));

        TokenNameFinderFactory factory;

        try (ObjectStream<NameSample> samples =
                ObjectStreamUtils.createObjectStream(labelled)) {

            try (InputStream in = new FileInputStream(GENERATORS)) {

                // Resources referenced by the feature generator descriptor
                Map<String, Object> map = new HashMap<>();

                // Pos Tagger
                map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));

                // Factory
                factory = new TokenNameFinderFactory(
                        IOUtils.toByteArray(in),
                        map,
                        new BioCodec()
                );

                try {

                    TrainingParameters mlParams = new TrainingParameters();
                    mlParams.put(TrainingParameters.ALGORITHM_PARAM,
                            PerceptronTrainer.PERCEPTRON_VALUE);
                    mlParams.put(TrainingParameters.ITERATIONS_PARAM,
                            Integer.toString(300));
                    mlParams.put(TrainingParameters.CUTOFF_PARAM,
                            Integer.toString(0));

                    TokenNameFinderModel model = NameFinderME.train("it",
                            "person", samples, mlParams, factory);

                    // The IllegalStateException is thrown by serialize()
                    try (BufferedOutputStream modelOut = new
                            BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
                        model.serialize(modelOut);
                    }

                } catch (Exception ex) {
                    ex.printStackTrace();
                }

            }
        }
    }

    // Loads the POS model from disk; returns null if loading fails.
    public static POSModel loadPosTagger(String modelName) {

        try (InputStream modelIn = new FileInputStream(modelName)) {
            POSModel model = new POSModel(modelIn);
            return model;
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        return null;
    }
}

*GENERATORS:*
























*OUTPUT (with error):*


Indexing events using cutoff of 0
Computing event counts...  done. 30 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...  done.
Number of Event Tokens: 30
Number of Outcomes: 2
Number of Predicates: 144
Computing model parameters...
Performing 300 iterations.
  1:  . (27/30) 0.9
  2:  . (30/30) 1.0
  3:  . (30/30) 1.0
  4:  . (30/30) 1.0
  5:  . (30/30) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (30/30) 1.0
...done.
Compressed 144 parameters to 621 outcome patterns
java.lang.IllegalStateException: Missing serializer for postagger.bin
at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
at com.damiano.trainer.Test.<init>(Test.java:75)
at com.damiano.trainer.Test.main(Test.java:31)
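
The message in the trace is consistent with a model container that picks a
serializer for each artifact by the extension of its entry name and fails when
none is registered. The sketch below models only that lookup in plain Java. The
class, the Serializer interface and the registered extension are hypothetical,
and it is an assumption that OpenNLP's BaseModel behaves this way.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SerializerLookupSketch {

    /** Hypothetical stand-in for an artifact serializer; only the lookup matters here. */
    interface Serializer {
        String describe();
    }

    private final Map<String, Serializer> serializersByExtension = new HashMap<>();
    private final Map<String, Object> artifacts = new LinkedHashMap<>();

    void registerSerializer(String extension, Serializer serializer) {
        serializersByExtension.put(extension, serializer);
    }

    void addArtifact(String name, Object artifact) {
        artifacts.put(name, artifact);
    }

    /** Returns the text after the last dot of an entry name. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        if (dot == -1) {
            throw new IllegalStateException("Missing extension in " + name);
        }
        return name.substring(dot + 1);
    }

    /** Checks every artifact and fails on the first one without a serializer. */
    void serialize() {
        for (String name : artifacts.keySet()) {
            Serializer serializer = serializersByExtension.get(extensionOf(name));
            if (serializer == null) {
                throw new IllegalStateException("Missing serializer for " + name);
            }
        }
    }

    public static void main(String[] args) {
        SerializerLookupSketch model = new SerializerLookupSketch();
        // Only one (hypothetical) extension is registered in this sketch.
        model.registerSerializer("model", () -> "model serializer");
        model.addArtifact("nameFinder.model", new Object());
        model.addArtifact("postagger.bin", new Object());
        try {
            model.serialize();
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Under that assumption, the error would depend on the key used in the resources
map rather than on the file name on disk.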


Re: Missing serializer for postagger.bin

2017-06-07 Thread Damiano Porta
Hmm, let me try again. Yes, I copied it badly. I think the names are
correct; I will give you a working example.



Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
OK, but are you sure you used matching names? The exception mentions
it-pos-maxent.bin. Which object did you map to it?

Jörn



Re: Missing serializer for postagger.bin

2017-06-07 Thread Damiano Porta
Hi Jorn! Yes

<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>1.8.0</version>
</dependency>

Do I need other dependencies too?





Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
This should be working. Did you test with 1.8.0?

Jörn

On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta wrote:

> Hello,
> I am using the POSTaggerFeatureGenerator via generators.xml
>
> 
>
> During training I add this model to the resources like this:
>
> HashMap<String, Object> map = new HashMap<>();
> map.put("postagger.bin", myPostaggerModel);
>
>
> factory = new TokenNameFinderFactory(
>     IOUtils.toByteArray(in),
>     map,
>     new BioCodec()
> );
>
> I get this error:
>
> java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>
> Do I have to change the extension of the file?
>
> Thanks
>