[ https://issues.apache.org/jira/browse/OPENNLP-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256750#comment-16256750 ]
Koji Sekiguchi commented on OPENNLP-1154:
-----------------------------------------

As Joern suggested, this should be used not only for NameFinder but also for POS Tagger, so I added "POS Tagger" to the title.

> change the XML format for feature generator config in NameFinder and POS Tagger
> -------------------------------------------------------------------------------
>
>                 Key: OPENNLP-1154
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1154
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: 1.8.3
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>
> NameFinder provides many kinds of feature generators (factories). Users can
> define their config via XML, which looks like:
> {code:xml}
> <generators>
>   <cache>
>     <generators>
>       <window prevLength = "2" nextLength = "2">
>         <tokenclass/>
>       </window>
>       <window prevLength = "2" nextLength = "2">
>         <token/>
>       </window>
>       <definition/>
>       <prevmap/>
>       <bigram/>
>       <sentence begin="true" end="false"/>
>     </generators>
>   </cache>
> </generators>
> {code}
> If a user wants to implement their own feature generator, they can use
> <custom .../>. But if they want to use two or more custom feature generators
> at once, they would have to provide a wrapper feature generator that wraps
> the ones they actually want, which is not a good solution.
> I'd like to suggest that we make the config format more flexible, like below:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <args>
>     <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>       <args>
>         <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>           <args>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>           </args>
>         </generator>
>       </args>
>     </generator>
>   </args>
> </generator>
> {code}
> If <args>...</args> is too noisy, I'm also considering another format:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>     <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>       </generator>
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>       </generator>
>     </generator>
>   </generator>
> </generator>
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
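
For illustration only, here is how the use case from the issue description (two or more user-defined feature generators at once) might look in the first proposed format, with no wrapper class needed. This is a sketch under assumptions: the two com.example factory classes and the "length" parameter are hypothetical and not part of OpenNLP, and the format itself is only the proposal under discussion, not a confirmed implementation.
{code:xml}
<!-- Hypothetical: the com.example.* factories are user-defined, not OpenNLP classes -->
<generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
  <args>
    <!-- first user-defined generator, no parameters -->
    <generator class="com.example.GazetteerFeatureGeneratorFactory"/>
    <!-- second user-defined generator, with an assumed int parameter -->
    <generator class="com.example.SuffixFeatureGeneratorFactory">
      <args>
        <int name="length">3</int>
      </args>
    </generator>
  </args>
</generator>
{code}
Because every element is a <generator> identified by its class attribute, user-defined factories would sit next to the built-in ones without a dedicated element name for each.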