Re: Hardcoded length in prefix and suffix feature generators

2017-02-09 Thread William Colen
Looks good! Thanks for the unit tests.
Please open a Jira, squash your commits and open the PR.

2017-02-09 19:55 GMT-02:00 Jeffrey Zemerick :



Hardcoded length in prefix and suffix feature generators

2017-02-09 Thread Jeffrey Zemerick
Hi,

I noticed that the length is hardcoded to 4 in the PrefixFeatureGenerator
and the SuffixFeatureGenerator. I made this value configurable in the XML
for each feature generator. I also added a check on the length to keep
duplicate prefixes or suffixes from being returned. (If the token is "yes"
and the length is 4, two "yes" features would be returned.) If no value is
provided in the XML, the default value of 4 is used.
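A minimal sketch of the behavior described above, written as standalone Java rather than OpenNLP's actual classes. The class name AffixFeatures, the "pre="/"suf=" feature strings and the `length` XML attribute are assumptions for illustration, not necessarily what the linked branch uses. The loop stops at min(length, token length), which is what keeps "yes" from being emitted twice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical standalone sketch; not OpenNLP's PrefixFeatureGenerator/SuffixFeatureGenerator.
public class AffixFeatures {

    // Used when the XML element has no length attribute (4, the old hardcoded value).
    static final int DEFAULT_LENGTH = 4;

    // Reads an assumed "length" attribute from an element such as <prefix length="3"/>.
    static int lengthFromXml(String xml) throws Exception {
        Element element = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        String value = element.getAttribute("length"); // "" when the attribute is absent
        return value.isEmpty() ? DEFAULT_LENGTH : Integer.parseInt(value);
    }

    // Prefixes of length 1..min(length, token length): stopping at the token
    // length avoids returning "yes" twice for a 3-letter token.
    static List<String> prefixes(String token, int length) {
        List<String> features = new ArrayList<>();
        for (int i = 1; i <= Math.min(length, token.length()); i++) {
            features.add("pre=" + token.substring(0, i));
        }
        return features;
    }

    // Suffixes of length 1..min(length, token length), built the same way.
    static List<String> suffixes(String token, int length) {
        List<String> features = new ArrayList<>();
        for (int i = 1; i <= Math.min(length, token.length()); i++) {
            features.add("suf=" + token.substring(token.length() - i));
        }
        return features;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prefixes("yes", lengthFromXml("<prefix/>")));
        System.out.println(suffixes("yes", lengthFromXml("<suffix length=\"2\"/>")));
    }
}
```

With no attribute the default of 4 applies, so "yes" yields pre=y, pre=ye, pre=yes; with length="2" the suffixes are suf=s and suf=es.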

You can preview the changes here:
https://github.com/apache/opennlp/compare/master...jzonthemtn:prefixsuffix?expand=1

If this is a change the group wants, I can open a JIRA issue and a
pull request.

Thanks,
Jeff


Re: Generators

2016-08-17 Thread Damiano Porta
Hello William! Thanks!

Yes, I know that I can use the DictionaryNameFinder, but I was studying how
the generators work and how they affect name recognition.

If we use the DictionaryFeatureGenerator, each token will be labeled with a
specific code. From what I read, that is *w:dic* plus the *token* (see
https://github.com/apache/opennlp/blob/164331477b1cea0942dcf6f07714fd50d8e2687e/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/InSpanGenerator.java#L72-L73
).
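A rough sketch of the in-span marking described above: tokens covered by a dictionary entry get extra feature strings, and nothing is decided about entities. This is standalone Java, not the InSpanGenerator source; the class name and the exact "w:dic" / "w:dic=<token>" strings are hypothetical, modeled on the names quoted in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch of dictionary ("in span") features.
public class DictionaryFeatures {

    // Marks every token covered by some dictionary entry (entries are token sequences).
    static boolean[] markInDictionary(String[] tokens, List<String[]> dictionary) {
        boolean[] inSpan = new boolean[tokens.length];
        for (int start = 0; start < tokens.length; start++) {
            for (String[] entry : dictionary) {
                int end = start + entry.length;
                if (entry.length > 0 && end <= tokens.length
                        && Arrays.equals(Arrays.copyOfRange(tokens, start, end), entry)) {
                    Arrays.fill(inSpan, start, end, true);
                }
            }
        }
        return inSpan;
    }

    // Only adds extra strings for covered tokens; it does not label anything as an entity.
    static List<String> features(String[] tokens, int index, boolean[] inSpan) {
        List<String> features = new ArrayList<>();
        if (inSpan[index]) {
            features.add("w:dic");
            features.add("w:dic=" + tokens[index]);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "am", "John", "Smith"};
        List<String[]> dictionary = new ArrayList<>();
        dictionary.add(new String[] {"John", "Smith"});
        boolean[] inSpan = markInDictionary(tokens, dictionary);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + features(tokens, i, inSpan));
        }
    }
}
```

"I" and "am" get no dictionary features, while "John" and "Smith" each get the two extra strings; the model still has to learn what those strings mean.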

In this case we only label the tokens with specific "codes", as other
features do; we are not saying THOSE tokens are entities. Right? So, for
example:

*sentence:*

I am  John 

*training:*

I feature1 feature2

am feature1 featureN

John featureX + w:dic

So in this case the algorithm understands that "John", labeled with
featureX + w:dic, is an entity. We *must* include the dictionary entries
in the training data; otherwise the model will not "associate" *w:dic*
with the entity. Right?
The more features we add, the easier the classification will be.
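To make the "association" concrete, here is a sketch of how a log-linear (maximum-entropy style) classifier combines features: each outcome's score is the sum of the weights of the active features, and the scores are normalized with a softmax. The weights, feature names and the two outcomes are invented for illustration; a real model learns its weights from the training data. In this sketch a (feature, outcome) pair with no weight contributes nothing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of log-linear scoring; the weights are made up, not learned.
public class FeatureScoring {

    static final String[] OUTCOMES = {"person", "other"};

    // Weight for each (feature, outcome) pair; a missing pair counts as 0.
    static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        WEIGHTS.put("w:dic|person", 1.5);
        WEIGHTS.put("prev=am|person", 0.5);
        WEIGHTS.put("prev=the|other", 2.5);
    }

    // Sum of the weights of the active features for one outcome.
    static double score(List<String> features, String outcome) {
        double total = 0.0;
        for (String feature : features) {
            total += WEIGHTS.getOrDefault(feature + "|" + outcome, 0.0);
        }
        return total;
    }

    // Softmax: P(outcome | features) = exp(score) / sum over all outcomes of exp(score).
    static double probability(List<String> features, String outcome) {
        double denominator = 0.0;
        for (String o : OUTCOMES) {
            denominator += Math.exp(score(features, o));
        }
        return Math.exp(score(features, outcome)) / denominator;
    }

    public static void main(String[] args) {
        List<String> afterAm = Arrays.asList("w:dic", "prev=am");
        List<String> afterThe = Arrays.asList("w:dic", "prev=the");
        System.out.printf("P(person | w:dic, prev=am)  = %.3f%n", probability(afterAm, "person"));
        System.out.printf("P(person | w:dic, prev=the) = %.3f%n", probability(afterThe, "person"));
    }
}
```

With these made-up weights it prints about 0.881 and 0.269: the same w:dic feature pushes toward "person" in one context and is outweighed in the other, so a dictionary hit is one vote among many.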

I wrote it badly, but I hope it makes sense :)

Damiano


On 17/Aug/2016 14:13, "William Colen" <william.co...@gmail.com> wrote:



Re: Generators

2016-08-17 Thread William Colen
Features do not guarantee that the token will be marked as an NE. It is
like telling the model that, according to the dictionary, the token can be
an NE, but of course it will be evaluated together with the other features.
Remember, it is machine learning. You can skip the machine learning by
using a DictionaryNameFinder.

http://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html
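For contrast with the learned approach, a dictionary finder with no machine learning can be as simple as an exact, longest-match lookup over the tokens. This is a standalone sketch with hypothetical names, not DictionaryNameFinder's API or implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of dictionary lookup with no learning; not DictionaryNameFinder itself.
public class DictionaryLookup {

    // Returns [start, end) token spans that exactly match a dictionary entry,
    // preferring the longest entry at each position and continuing after a match.
    static List<int[]> find(String[] tokens, List<String[]> dictionary) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = start; // no match yet
            for (String[] entry : dictionary) {
                int end = start + entry.length;
                if (end > bestEnd && end <= tokens.length
                        && Arrays.equals(Arrays.copyOfRange(tokens, start, end), entry)) {
                    bestEnd = end;
                }
            }
            if (bestEnd > start) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd;
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "met", "John", "Smith", "and", "John"};
        List<String[]> dictionary = new ArrayList<>();
        dictionary.add(new String[] {"John"});
        dictionary.add(new String[] {"John", "Smith"});
        for (int[] span : find(tokens, dictionary)) {
            System.out.println(span[0] + ".." + span[1] + " "
                    + String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
        }
    }
}
```

Every exact match is returned as a name regardless of context, which is exactly what the learned model does not do.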

Regards
William

2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:



Generators

2016-08-16 Thread Damiano Porta
Hello,

Sorry, guys, for all these questions, but I am trying to study OpenNLP
in depth. I wrote some simple code; you can see it here:
https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
I am trying to understand what the generators are and what their job is.
I know they add features to the token list, but what does that mean in
simple words? (Just adding simple codes to each token?) I ask because, for
example, I tried the DictionaryFeatureGenerator with a simple list of names,
but the names are not recognized when I use the NameFinderME (see the link
on Jira).
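In simple words, a feature generator maps a token, in the context of its sentence, to a list of plain strings, and the model only ever sees those strings. The sketch below is loosely modeled on the createFeatures(features, tokens, index, ...) shape of OpenNLP's AdaptiveFeatureGenerator, but the interface, generator names and feature strings (w=, p1w=, n1w=, wc=ic) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: generators append plain strings describing one token.
public class WindowFeatures {

    interface Generator {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    // Current token and its immediate neighbours, lower-cased.
    static final Generator WINDOW = (features, tokens, index) -> {
        features.add("w=" + tokens[index].toLowerCase());
        if (index > 0) features.add("p1w=" + tokens[index - 1].toLowerCase());
        if (index + 1 < tokens.length) features.add("n1w=" + tokens[index + 1].toLowerCase());
    };

    // A coarse token class: initial capital or not.
    static final Generator TOKEN_CLASS = (features, tokens, index) -> {
        String token = tokens[index];
        boolean initialCap = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        features.add(initialCap ? "wc=ic" : "wc=other");
    };

    // Runs every generator in order and collects all strings for one token.
    static List<String> allFeatures(String[] tokens, int index, List<Generator> generators) {
        List<String> features = new ArrayList<>();
        for (Generator generator : generators) {
            generator.createFeatures(features, tokens, index);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "am", "John"};
        System.out.println(allFeatures(tokens, 2, Arrays.asList(WINDOW, TOKEN_CLASS)));
    }
}
```

For "John" this prints [w=john, p1w=am, wc=ic]; a dictionary generator would simply append more strings to the same list.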

How can I read those features after calling find()?

Thank you so much!
Damiano