Re: Chunker - proposal to change API (break compatibility)

2016-11-10 Thread William Colen
I tried that, but we have an issue with the factories we created. To
customize, we extend the factory, but the method we need to override does
not allow a different generic type.

  public SequenceValidator<String> getSequenceValidator() {
    return new DefaultChunkerSequenceValidator();
  }

I tried to change String to ?, but it breaks a lot of code. I am not sure
if it is a simple change anymore.

Thank you
William
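
One possible reading of the problem above: Java generics are invariant, so an override cannot change a `SequenceValidator<String>` return type to another element type, and a `SequenceValidator<?>` return type cannot be called with a `String[]` without a cast. The sketch below shows one alternative, a factory parameterized on the sequence element type. Every name in it (`GenericValidator`, `GenericChunkerFactory`, `WordTag`, both subclasses, and the placeholder rule) is hypothetical and not part of OpenNLP.

```java
// Hypothetical stand-in for a generic sequence validator interface.
interface GenericValidator<T> {
  boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
}

// Hypothetical wrapper holding a word and its POS tag.
final class WordTag {
  final String word;
  final String tag;

  WordTag(String word, String tag) {
    this.word = word;
    this.tag = tag;
  }
}

// Hypothetical factory: the element type is a class-level type parameter,
// so each subclass fixes it once instead of narrowing a return type.
abstract class GenericChunkerFactory<T> {
  abstract GenericValidator<T> getSequenceValidator();
}

// Tag-only variant: the sequence elements are plain POS tag strings.
final class TagOnlyFactory extends GenericChunkerFactory<String> {
  @Override
  GenericValidator<String> getSequenceValidator() {
    // Accepts every outcome; a real validator would check chunk consistency.
    return (i, tags, outcomes, outcome) -> true;
  }
}

// Token-aware variant: the validator can inspect both word and tag.
final class WordTagFactory extends GenericChunkerFactory<WordTag> {
  @Override
  GenericValidator<WordTag> getSequenceValidator() {
    // Placeholder rule for illustration only: the word "x" never opens an NP.
    return (i, input, outcomes, outcome) ->
        !("x".equals(input[i].word) && "B-NP".equals(outcome));
  }
}
```

The cost of this design is that every caller of the factory also has to carry the type parameter, which may be where "it breaks a lot of code" comes from.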

2016-11-10 10:23 GMT-02:00 Joern Kottmann:

> The sequence we have today is usually of type String, but it is generic, so
> it could also be a wrapper object that holds the token and tag, e.g.
> TokenWithPos.
> With such a sequence we should be able to use most of the existing
> interfaces without too much change, right?
>
> Jörn
>
> On Thu, Nov 10, 2016 at 10:33 AM, William Colen wrote:
>
> > Hi,
> >
> > Today the Chunker sequence is the sentence's POS tags.
> >
> > Although we use both the tokens and tags in the context generator, in the
> > current API we cannot use the token in the sequence validator, because we
> > do not have access to it.
> >
> > In Portuguese, I know that some combinations of word + tag never occur
> > in a specific kind of phrase. Today I cannot add a rule for this filter
> > to the sequence validator.
> >
> > I know maybe it is better to train the model so it will learn, but the
> > hack of adding this rule to the sequence validator is helpful.
> >
> > Do you think we can change it for the release 1.7.0? I already tried this
> > change in a local branch for a personal project and it works (although it
> > was OpenNLP 1.5.3).
> >
> > This would break API backward compatibility, but the existing models would
> > not be affected.
> >
> > Thank you
> > William
> >
>


Re: Chunker - proposal to change API (break compatibility)

2016-11-10 Thread Joern Kottmann
The sequence we have today is usually of type String, but it is generic, so
it could also be a wrapper object that holds the token and tag, e.g.
TokenWithPos.
With such a sequence we should be able to use most of the existing
interfaces without too much change, right?

Jörn
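
A minimal sketch of the wrapper idea, assuming the validator keeps the shape of OpenNLP's `SequenceValidator<T>` (redeclared locally here so the example compiles on its own). `TokenWithPos` is only named in this thread, so its fields are an assumption, and `TokenAwareChunkerValidator` with its "de"/"prp" rule is an invented placeholder, not a claim about Portuguese. The BIO check is a simplification and not the actual `DefaultChunkerSequenceValidator` logic.

```java
// Same shape as OpenNLP's opennlp.tools.util.SequenceValidator<T>,
// redeclared locally so the sketch compiles without the library.
interface SequenceValidator<T> {
  boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
}

// Hypothetical wrapper pairing a token with its POS tag.
final class TokenWithPos {
  final String token;
  final String pos;

  TokenWithPos(String token, String pos) {
    this.token = token;
    this.pos = pos;
  }
}

// Hypothetical validator: a simplified BIO check plus one token-aware rule.
final class TokenAwareChunkerValidator implements SequenceValidator<TokenWithPos> {

  @Override
  public boolean validSequence(int i, TokenWithPos[] inputSequence,
      String[] outcomesSequence, String outcome) {

    TokenWithPos current = inputSequence[i];

    // Placeholder rule, invented for illustration only: the token "de"
    // tagged "prp" may not open an NP chunk.
    if ("de".equalsIgnoreCase(current.token) && "prp".equals(current.pos)
        && "B-NP".equals(outcome)) {
      return false;
    }

    // Simplified BIO consistency: I-X must follow B-X or I-X.
    if (outcome.startsWith("I-")) {
      if (outcomesSequence.length == 0) {
        return false;
      }
      String previous = outcomesSequence[outcomesSequence.length - 1];
      return previous.length() > 2
          && previous.substring(2).equals(outcome.substring(2));
    }
    return true;
  }
}
```

Because the rule reads `current.token`, it expresses exactly the kind of word + tag filter that a validator over plain tag strings cannot see.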

On Thu, Nov 10, 2016 at 10:33 AM, William Colen wrote:

> Hi,
>
> Today the Chunker sequence is the sentence's POS tags.
>
> Although we use both the tokens and tags in the context generator, in the
> current API we cannot use the token in the sequence validator, because we
> do not have access to it.
>
> In Portuguese, I know that some combinations of word + tag never occur
> in a specific kind of phrase. Today I cannot add a rule for this filter
> to the sequence validator.
>
> I know maybe it is better to train the model so it will learn, but the hack
> of adding this rule to the sequence validator is helpful.
>
> Do you think we can change it for the release 1.7.0? I already tried this
> change in a local branch for a personal project and it works (although it
> was OpenNLP 1.5.3).
>
> This would break API backward compatibility, but the existing models would
> not be affected.
>
> Thank you
> William
>


Chunker - proposal to change API (break compatibility)

2016-11-10 Thread William Colen
Hi,

Today the Chunker sequence is the sentence's POS tags.

Although we use both the tokens and tags in the context generator, in the
current API we cannot use the token in the sequence validator, because we
do not have access to it.

In Portuguese, I know that some combinations of word + tag never occur
in a specific kind of phrase. Today I cannot add a rule for this filter
to the sequence validator.

I know maybe it is better to train the model so it will learn, but the hack
of adding this rule to the sequence validator is helpful.

Do you think we can change it for the release 1.7.0? I already tried this
change in a local branch for a personal project and it works (although it
was OpenNLP 1.5.3).

This would break API backward compatibility, but the existing models would
not be affected.

Thank you
William