Re: Hardcoded length in prefix and suffix feature generators
Looks good! Thanks for the unit tests. Please open a Jira issue, squash your commits, and open the PR.

2017-02-09 19:55 GMT-02:00 Jeffrey Zemerick:

> Hi,
>
> I noticed that the length is hardcoded to 4 in the PrefixFeatureGenerator
> and the SuffixFeatureGenerator. I made this value configurable in the XML
> for each feature generator. I also added a check on the length to keep
> duplicate prefixes or suffixes from being returned. (If the token is "yes"
> with a length of 4 there would be two "yes" features returned.) If a value
> is not provided in the XML it uses the default value of 4.
>
> You can preview the changes here:
> https://github.com/apache/opennlp/compare/master...jzonthemtn:prefixsuffix?expand=1
>
> If this is a change that's desired by the group I can make a JIRA and a
> pull request.
>
> Thanks,
> Jeff
Hardcoded length in prefix and suffix feature generators
Hi,

I noticed that the length is hardcoded to 4 in the PrefixFeatureGenerator and the SuffixFeatureGenerator. I made this value configurable in the XML for each feature generator. I also added a check on the length to keep duplicate prefixes or suffixes from being returned. (If the token is "yes" with a length of 4 there would be two "yes" features returned.) If a value is not provided in the XML it uses the default value of 4.

You can preview the changes here:
https://github.com/apache/opennlp/compare/master...jzonthemtn:prefixsuffix?expand=1

If this is a change that's desired by the group I can make a JIRA and a pull request.

Thanks,
Jeff
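
For readers who want to see the length check in isolation, here is a minimal standalone sketch in Java. It is not the code from the linked branch: the class name AffixSketch, the method names, and the DEFAULT_LENGTH constant are all made up for illustration. It only shows the idea described above, which is to cap the configured length at the token length so that a short token such as "yes" does not yield the same prefix or suffix twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the OpenNLP implementation.
class AffixSketch {

    // Assumed default, mirroring the value of 4 mentioned in the message.
    static final int DEFAULT_LENGTH = 4;

    // Returns the prefixes of length 1..maxLength, capped at the token length
    // so that no prefix is emitted more than once.
    public static List<String> prefixes(String token, int maxLength) {
        int limit = Math.min(maxLength, token.length());
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= limit; i++) {
            result.add(token.substring(0, i));
        }
        return result;
    }

    // Returns the suffixes of length 1..maxLength, capped the same way.
    public static List<String> suffixes(String token, int maxLength) {
        int limit = Math.min(maxLength, token.length());
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= limit; i++) {
            result.add(token.substring(token.length() - i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("yes", DEFAULT_LENGTH)); // [y, ye, yes]
        System.out.println(suffixes("yes", DEFAULT_LENGTH)); // [s, es, yes]
        System.out.println(prefixes("token", 2));            // [t, to]
    }
}
```

Without the cap, a length of 4 applied to "yes" would produce "yes" twice, which is the duplicate described in the message.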
Re: Generators
Hello William!

Thanks! Yes, I know that I can use DictionaryNameFinder, but I was studying how the generators work and how they affect name recognition.

If we use DictionaryFeatureGenerator, each token will be labeled with a specific code (I read: *w:dic*) and the *token* itself ( https://github.com/apache/opennlp/blob/164331477b1cea0942dcf6f07714fd50d8e2687e/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/InSpanGenerator.java#L72-L73 )

In this case we only label the tokens with specific "codes", as the other features do; we are not saying THOSE tokens are entities. Right?

So, for example:

*sentence:* I am John

*training:*
I feature1 feature2
am feature1 featureN
John featureX + w:dic

So in this case the algorithm understands that "John", labeled with featureX + w:dic, is an entity. We *must* include the dictionary entries in the training data, otherwise the machine learning will not "associate" *w:dic* with the entity. Right?

The more features we add, the easier the classification will be.

I wrote it badly, but I hope it makes sense :)

Damiano

On 17/Aug/2016 14:13, "William Colen" <william.co...@gmail.com> wrote:

> Features do not guarantee that the token will be marked as an NE. It is
> like telling the model that, according to the dictionary, the token can be
> an NE, but of course it will be evaluated together with the other features.
> Remember, it is machine learning. You can skip the machine learning by
> using a DictionaryNameFinder.
>
> http://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html
>
> Regards
> William
>
> 2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Hello,
> >
> > Sorry for all these questions, but I am trying to study OpenNLP in depth.
> > I wrote some simple code; you can see it here:
> > https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
> >
> > I am trying to understand what the generators are and what their job is.
> > I know they add features to the token list, but what does that mean in
> > simple words? (Do they just add simple codes to each token?) For example,
> > I tried the DictionaryFeatureGenerator with a simple list of names, but
> > the names are not recognized when I use NameFinderME (see the link on
> > Jira).
> >
> > How can I read those features after the find()?
> >
> > Thank you so much!
> > Damiano
Re: Generators
Features do not guarantee that the token will be marked as an NE. It is like telling the model that, according to the dictionary, the token can be an NE, but of course it will be evaluated together with the other features.
Remember, it is machine learning. You can skip the machine learning by using a DictionaryNameFinder.

http://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html

Regards
William

2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:

> Hello,
>
> Sorry for all these questions, but I am trying to study OpenNLP in depth.
> I wrote some simple code; you can see it here:
> https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
>
> I am trying to understand what the generators are and what their job is.
> I know they add features to the token list, but what does that mean in
> simple words? (Do they just add simple codes to each token?) For example,
> I tried the DictionaryFeatureGenerator with a simple list of names, but
> the names are not recognized when I use NameFinderME (see the link on
> Jira).
>
> How can I read those features after the find()?
>
> Thank you so much!
> Damiano
Generators
Hello,

Sorry for all these questions, but I am trying to study OpenNLP in depth.
I wrote some simple code; you can see it here:
https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP

I am trying to understand what the generators are and what their job is.
I know they add features to the token list, but what does that mean in simple words? (Do they just add simple codes to each token?) For example, I tried the DictionaryFeatureGenerator with a simple list of names, but the names are not recognized when I use NameFinderME (see the link on Jira).

How can I read those features after the find()?

Thank you so much!
Damiano
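
As a rough illustration of the "simple codes on each token" idea, the following standalone Java sketch builds a list of feature strings per token. It does not use the OpenNLP API: the class FeatureSketch, the method featuresFor, and the feature strings "w=", "initcap" and "indict" are all invented for this example and are not the strings OpenNLP emits. The point is only that a dictionary lookup contributes one more string to the list; it does not by itself mark the token as a name, and a trained model still has to weigh that string together with the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; feature names are made up.
class FeatureSketch {

    // Collects feature strings for the token at the given index.
    public static List<String> featuresFor(String[] tokens, int index, Set<String> dictionary) {
        String token = tokens[index];
        List<String> features = new ArrayList<>();

        // Feature 1: the lowercased word itself.
        features.add("w=" + token.toLowerCase(Locale.ROOT));

        // Feature 2: whether the token starts with an uppercase letter.
        if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
            features.add("initcap");
        }

        // Feature 3: whether the token appears in the dictionary.
        // This is a hint for the model, not a decision.
        if (dictionary.contains(token)) {
            features.add("indict");
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "am", "John"};
        Set<String> names = Set.of("John");
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " -> " + featuresFor(tokens, i, names));
        }
    }
}
```

In this sketch both "I" and "John" receive "initcap", and only "John" receives "indict". Whether "indict" ends up mattering depends on how often it co-occurs with name labels in whatever data a model is trained on.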