Hi,

You can do all those tasks by using the create method in the
TokenNameFinderFactory:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/TokenNameFinderFactory.java?revision=1712553&view=markup#l100

For that you need to:

1. Provide the name of the factory class you are using; it can be
the default factory class itself: TokenNameFinderFactory.class.getName()
2. Create an XML feature descriptor and pass it as a byte[] array
3. Load the resources (e.g., clusters) into a resources map that pairs
the id of each resource with its serializer.
4. Choose the sequenceCodec: BIO or BILOU.
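
To make the difference between the two codecs in item 4 concrete, here is a minimal standalone sketch of how one entity span is labelled under each scheme. It does not use OpenNLP: the class and method names are hypothetical, and the B-/I-/L-/U-/O label strings are the generic convention, not necessarily the outcome names OpenNLP stores in a model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of OpenNLP.
public class SequenceCodecSketch {

    /** BIO: first token of the span is B-, the remaining span tokens are I-, all others O. */
    public static List<String> encodeBio(int length, int start, int end, String type) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            if (i < start || i >= end) {
                tags.add("O");
            } else if (i == start) {
                tags.add("B-" + type);
            } else {
                tags.add("I-" + type);
            }
        }
        return tags;
    }

    /** BILOU: like BIO, plus L- for the last token of a span and U- for a one-token span. */
    public static List<String> encodeBilou(int length, int start, int end, String type) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            if (i < start || i >= end) {
                tags.add("O");
            } else if (end - start == 1) {
                tags.add("U-" + type);
            } else if (i == start) {
                tags.add("B-" + type);
            } else if (i == end - 1) {
                tags.add("L-" + type);
            } else {
                tags.add("I-" + type);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "Fitzgerald", "Kennedy", "visited", "Berlin"};
        System.out.println(Arrays.toString(tokens));
        // Span [0, 3) covers "John Fitzgerald Kennedy"; span [4, 5) covers "Berlin".
        System.out.println(encodeBio(tokens.length, 0, 3, "person"));
        System.out.println(encodeBilou(tokens.length, 0, 3, "person"));
        System.out.println(encodeBilou(tokens.length, 4, 5, "location"));
    }
}
```

BILOU gives the model explicit end-of-span and single-token outcomes, at the cost of more outcome classes to learn from the same data.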

The NameFinder documentation has already been updated:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml?view=markup

There is sample code to do that in the CLI class:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/namefind/TokenNameFinderTrainerTool.java?revision=1674262&view=markup

and to run it from the CLI:

1. Create an XML feature descriptor, e.g., brown.xml

<generators>
  <cache>
    <generators>
      <window prevLength = "2" nextLength = "2">
        <token/>
      </window>
      <window prevLength = "2" nextLength = "2">
        <brownclustertoken dict="brownBllipClusters" />
      </window>
    </generators>
  </cache>
</generators>

2. Put your clustering lexicon(s) in a directory, e.g., clusters
3. Train:

bin/opennlp TokenNameFinderTrainer -featuregen brown.xml \
  -resources clusters/ -params lang/ml/PerceptronTrainerParams.txt \
  -lang en -model brown.bin \
  -data ~/experiments/nerc/opennlp/en/conll03/en-testb.opennlp \
  -encoding UTF-8
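
When the same descriptor is used programmatically instead of from the CLI, it has to be handed over as a byte[] array. The sketch below uses only the JDK: it turns the descriptor from step 1 into bytes, checks that it is well-formed, and lists the dict ids it references. The class and method names are hypothetical, and the assumption that each dict id must match a resource you supply (as with the files in the clusters directory) is mine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; uses only the JDK, not OpenNLP.
public class DescriptorBytesSketch {

    // The feature descriptor from step 1, inlined for the example.
    static final String DESCRIPTOR = String.join("\n",
        "<generators>",
        "  <cache>",
        "    <generators>",
        "      <window prevLength=\"2\" nextLength=\"2\">",
        "        <token/>",
        "      </window>",
        "      <window prevLength=\"2\" nextLength=\"2\">",
        "        <brownclustertoken dict=\"brownBllipClusters\"/>",
        "      </window>",
        "    </generators>",
        "  </cache>",
        "</generators>");

    /** Returns the descriptor as UTF-8 bytes, the form a byte[] parameter expects. */
    public static byte[] descriptorBytes() {
        return DESCRIPTOR.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the bytes and collects the dict attribute of every brownclustertoken element. */
    public static List<String> dictIds(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
            NodeList nodes = doc.getElementsByTagName("brownclustertoken");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("dict"));
            }
            return ids;
        } catch (Exception e) {
            // Malformed XML (or a parser setup problem) ends up here.
            throw new IllegalArgumentException("descriptor is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = descriptorBytes();
        System.out.println(bytes.length + " bytes, dict ids: " + dictIds(bytes));
    }
}
```

Checking the descriptor up front surfaces XML typos before a long training run starts.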

If you open the brown.bin model you will see the cluster lexicon
serialized inside the model.
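
OpenNLP model files are zip archives, so one way to check this is to list the archive entries. A minimal sketch using only the JDK follows; the class name is hypothetical, and the entry names you will see depend on your descriptor and resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; treats the model file as a plain zip archive.
public class ModelEntriesSketch {

    /** Returns the names of all entries in the given zip-packaged model file, in archive order. */
    public static List<String> listEntries(Path modelFile) {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(modelFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ModelEntriesSketch <model file, e.g. brown.bin>");
            return;
        }
        listEntries(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

Running it against brown.bin should show an entry for the cluster lexicon next to the model itself.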

You can now use it like any other model: the TokenNameFinderFactory
will read all the required resources again when the model is loaded
in the TokenNameFinderME class.

HTH,

R

On Mon, Feb 15, 2016 at 7:59 AM, Cohan Sujay Carlos <co...@aiaioo.com> wrote:
> Hi,
>
> I noticed that in the OpenNLP SVN 'trunk', the formerly deprecated
> constructors for the class *NameFinderME*:
>
> *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> generator, int beamSize, SequenceValidator<String> sequenceValidator);*
>
> and
>
>
> *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> generator, int beamSize)*
>
> have been removed, along with
>
> *public NameFinderME(TokenNameFinderModel model, int beamSize)*
>
> The deprecation comments said:
>
> @deprecated the beam size is now configured during training time in the
> trainer parameter file via beamSearch.beamSize
>
> and
>
> @deprecated Use {@link #NameFinderME(TokenNameFinderModel)} instead and use
> the {@link TokenNameFinderFactory} to configure it.
>
> I wanted to point out a few potential problems:
>
> 1.  The corresponding train methods have not been removed.  So, it is
> possible to train a NameFinderME using a *custom* AdaptiveFeatureGenerator
> class to do feature engineering, but once a model has been so trained,
> there is no way to load and use the stored model with the same
> AdaptiveFeatureGenerator.
>
> 2.  There is still no documentation on the TokenNameFinderFactory which is
> supposed to replace the constructor with the AdaptiveFeatureGenerator.
>
> 3.  I went over the code of TokenNameFinderFactory and a few places where
> it is used and it seemed to be designed for working with an XML
> specification of feature combinations.  I have also in the references
> included a mailing list conversation that says this class should be used
> with an XML file.  However, it turns out that custom feature sets for
> sequential classification are often important, so might we be dropping
> valuable feature engineering support?
>
> Finally, in light of the above, could we keep the deprecated constructors
> around until the alternative constructor (using TokenNameFinderFactory)
> enters into production, and examples and documentation for it become widely
> available?
>
> References:
>
> On the TokenNameFinderFactory using XML:
> https://mail-archives.apache.org/mod_mbox/opennlp-dev/201410.mbox/%3CCAKvDkVDfAx5BMvwVOrbvpZm7xV9erRQzrzbCDpfd+Cq6m=x...@mail.gmail.com%3E
>
> Relevant JIRA issues:
> https://issues.apache.org/jira/browse/OPENNLP-718
> https://issues.apache.org/jira/browse/OPENNLP-717
>
> Thank you,
>
> Cohan Sujay Carlos
