Re: Chunker - proposal to change API (break compatibility)

Joern Kottmann Thu, 10 Nov 2016 04:23:54 -0800

The sequence we have today is usually of type String, but it is generic, so
it could also be a wrapper object that holds both the token and the tag,
e.g. TokenWithPos.
With such a sequence we should be able to use most of the existing interfaces
without too much change, right?
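
A minimal sketch of that idea, under stated assumptions: TokenWithPos is a
hypothetical wrapper, the SequenceValidator interface below is declared
locally as a stand-in shaped like a generic validator (it is not taken from
the OpenNLP sources), and the forbidden word/tag/outcome values are
placeholders rather than a real Portuguese rule.

```java
// Hypothetical wrapper pairing a token with its POS tag.
final class TokenWithPos {
    private final String token;
    private final String posTag;

    TokenWithPos(String token, String posTag) {
        this.token = token;
        this.posTag = posTag;
    }

    String getToken() { return token; }

    String getPosTag() { return posTag; }
}

// Local stand-in for a generic sequence validator (assumed shape).
interface SequenceValidator<T> {
    // Returns true if 'outcome' is allowed at position i of the input sequence.
    boolean validSequence(int i, T[] inputSequence,
                          String[] outcomesSequence, String outcome);
}

// Rejects one configured word + tag combination for one chunk outcome.
final class ForbiddenWordTagValidator implements SequenceValidator<TokenWithPos> {
    private final String word;
    private final String tag;
    private final String forbiddenOutcome;

    ForbiddenWordTagValidator(String word, String tag, String forbiddenOutcome) {
        this.word = word;
        this.tag = tag;
        this.forbiddenOutcome = forbiddenOutcome;
    }

    @Override
    public boolean validSequence(int i, TokenWithPos[] inputSequence,
                                 String[] outcomesSequence, String outcome) {
        TokenWithPos current = inputSequence[i];
        // The rule fires only when token, tag, and outcome all match.
        boolean forbidden = forbiddenOutcome.equals(outcome)
                && word.equalsIgnoreCase(current.getToken())
                && tag.equals(current.getPosTag());
        return !forbidden;
    }
}
```

Because the validator receives the wrapper objects rather than bare tag
strings, a rule can look at the token and the tag together, which is what a
String-only tag sequence does not allow.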


Jörn

On Thu, Nov 10, 2016 at 10:33 AM, William Colen <[email protected]>
wrote:

> Hi,
>
> Today the Chunker sequence is the sentence's POS tags.
>
> Although we use both the tokens and tags in the context generator, in the
> current API we cannot use the token in the sequence validator, because we
> do not have access to it.
>
> In Portuguese, I know that certain combinations of word + tag never occur
> in a specific kind of phrase. Today I cannot add a rule with this filter
> to the sequence validator.
>
> I know maybe it is better to train the model so it will learn, but the hack
> of adding this rule to the sequence validator is helpful.
>
> Do you think we can change it for release 1.7.0? I already tried this
> change in a local branch for a personal project and it works (although
> that was with OpenNLP 1.5.3).
>
> This would break API backward compatibility, but the existing models would
> not be affected.
>
> Thank you
> William
>
