The sequence we have today is usually of type String, but it is generic, so it could also be a wrapper object that holds both the token and the tag, e.g. TokenWithPos. With such a sequence we should be able to use most of the existing interfaces without much change, right?
Jörn

On Thu, Nov 10, 2016 at 10:33 AM, William Colen <[email protected]> wrote:
> Hi,
>
> Today the Chunker sequence is the sentence's POS tags.
>
> Although we use both the tokens and the tags in the context generator, in the
> current API we cannot use the token in the sequence validator, because we
> do not have access to it.
>
> In Portuguese, I know certain word + tag combinations will never occur
> in a specific kind of phrase. Today I cannot add a rule with this filter
> to the sequence validator.
>
> I know it may be better to train the model so it learns this, but the hack
> of adding this rule to the sequence validator is helpful.
>
> Do you think we can change it for the 1.7.0 release? I already tried this
> change in a local branch for a personal project and it works (although it
> was OpenNLP 1.5.3).
>
> This would break API backward compatibility, but the existing models would
> not be affected.
>
> Thank you
> William
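A minimal sketch of how William's word + tag rule could look if the chunker sequence were a wrapper like the TokenWithPos Jörn mentions. The SequenceValidator interface here is a local stand-in modelled on OpenNLP's generic SequenceValidator<T>, declared so the example compiles on its own. The class names, the forbidden pair "que/conj-s", and the assumption that outcomesSequence holds only the outcomes chosen so far are all hypothetical, not part of the 1.7.0 API.

```java
import java.util.Set;

public class ChunkerValidatorSketch {

    /** Hypothetical wrapper pairing a token with its POS tag. */
    public static final class TokenWithPos {
        public final String token;
        public final String pos;

        public TokenWithPos(String token, String pos) {
            this.token = token;
            this.pos = pos;
        }
    }

    /** Local stand-in modelled on OpenNLP's generic SequenceValidator<T>. */
    public interface SequenceValidator<T> {
        boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
    }

    /** Rejects NP chunk tags for word + tag pairs that should never appear inside a noun phrase. */
    public static final class WordTagChunkValidator implements SequenceValidator<TokenWithPos> {
        // Hypothetical blacklist of "token/POS" keys, e.g. a subordinating "que".
        private final Set<String> forbiddenInNp;

        public WordTagChunkValidator(Set<String> forbiddenInNp) {
            this.forbiddenInNp = forbiddenInNp;
        }

        private static String key(TokenWithPos t) {
            return t.token.toLowerCase() + "/" + t.pos;
        }

        @Override
        public boolean validSequence(int i, TokenWithPos[] input, String[] outcomes, String outcome) {
            // The token is available here because the sequence carries it, not just the tag.
            boolean isNpTag = outcome.equals("B-NP") || outcome.equals("I-NP");
            if (isNpTag && forbiddenInNp.contains(key(input[i]))) {
                return false;
            }
            // Basic BIO check, assuming outcomes holds only the tags chosen so far:
            // I-X must directly follow B-X or I-X.
            if (outcome.startsWith("I-")) {
                if (outcomes.length == 0) {
                    return false;
                }
                String prev = outcomes[outcomes.length - 1];
                String type = outcome.substring(2);
                return prev.equals("B-" + type) || prev.equals("I-" + type);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        TokenWithPos[] sentence = {
            new TokenWithPos("Ele", "pron-pers"),
            new TokenWithPos("disse", "v-fin"),
            new TokenWithPos("que", "conj-s"),
        };
        SequenceValidator<TokenWithPos> v = new WordTagChunkValidator(Set.of("que/conj-s"));
        String[] history = {"B-NP", "B-VP"};
        System.out.println(v.validSequence(2, sentence, history, "B-NP")); // false
        System.out.println(v.validSequence(2, sentence, history, "O"));    // true
    }
}
```

Because the type parameter only changes from String to the wrapper, a validator written this way still fits the existing generic signature; what would break is code that assumes the chunker sequence is a String[] of tags.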
