Re: Chunker - proposal to change API (break compatibility)
I tried that, but we have an issue with the factories we created. To customize the chunker we extend the factory, but the method we need to override does not allow a different generic type:

    public SequenceValidator<String> getSequenceValidator() {
        return new DefaultChunkerSequenceValidator();
    }

I tried to change String to ?, but it breaks a lot of code. I am no longer sure this is a simple change.

Thank you
William

2016-11-10 10:23 GMT-02:00 Joern Kottmann:

> The sequence we have today is usually of type String, but it is generic so
> it could also be about a wrapper object which has the token and tag, e.g.
> TokenWithPos.
> On such a sequence we should be able to use most of the existing interfaces
> without too much change, right?
>
> Jörn
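The override problem described in this message can be reproduced in a few lines. The sketch below is only an illustration: every type in it is a local stand-in written for this example (they are not the OpenNLP classes), and `GenericFactory` and `TokenWithPosFactory` are hypothetical names for one possible way around the problem, not something proposed in the thread.

```java
// Sketch only: local stand-ins, not the OpenNLP classes with similar names.
final class FactoryGenericsSketch {

    /** Stand-in for a validator that is generic in the sequence element type. */
    interface SequenceValidator<T> {
        boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
    }

    /** Hypothetical wrapper pairing a token with its POS tag. */
    static final class TokenWithPos {
        final String token;
        final String pos;

        TokenWithPos(String token, String pos) {
            this.token = token;
            this.pos = pos;
        }
    }

    /** Factory whose return type pins the element type to String. */
    static class StringFactory {
        public SequenceValidator<String> getSequenceValidator() {
            return (i, in, prior, outcome) -> true;
        }
    }

    // Does not compile if uncommented: SequenceValidator<TokenWithPos> is not a
    // subtype of SequenceValidator<String>, so it is not a legal covariant return.
    //
    // static class BrokenFactory extends StringFactory {
    //     @Override
    //     public SequenceValidator<TokenWithPos> getSequenceValidator() { ... }
    // }
    //
    // Declaring the return type as SequenceValidator<?> hides the element type from
    // callers, so a call such as v.validSequence(0, new String[] {"DT"}, ...) no
    // longer type-checks. That is one likely reason the wildcard breaks existing code.

    /** One possible alternative (an assumption of this sketch): a generic factory. */
    abstract static class GenericFactory<T> {
        public abstract SequenceValidator<T> getSequenceValidator();
    }

    static final class TokenWithPosFactory extends GenericFactory<TokenWithPos> {
        @Override
        public SequenceValidator<TokenWithPos> getSequenceValidator() {
            // Trivial rule, used only to show that the token is reachable here.
            return (i, in, prior, outcome) -> !in[i].token.isEmpty();
        }
    }

    public static void main(String[] args) {
        TokenWithPos[] sentence = {new TokenWithPos("a", "DT")};
        SequenceValidator<TokenWithPos> validator =
                new TokenWithPosFactory().getSequenceValidator();
        System.out.println(validator.validSequence(0, sentence, new String[0], "B-NP"));
    }
}
```

Making the factory generic moves the break from every call site to the factory declaration, which may or may not be cheaper than the wildcard in a real code base.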
Re: Chunker - proposal to change API (break compatibility)
The sequence we have today is usually of type String, but the interface is generic, so the sequence could also consist of wrapper objects that hold both the token and the tag, e.g. TokenWithPos. On such a sequence we should be able to use most of the existing interfaces without too much change, right?

Jörn

On Thu, Nov 10, 2016 at 10:33 AM, William Colen wrote:

> Hi,
>
> Today the Chunker sequence is the sentence's POS tags.
>
> Although we use both the tokens and the tags in the context generator, the
> current API does not let us use the tokens in the sequence validator,
> because the validator has no access to them.
>
> In Portuguese, I know that some word + tag combinations never occur in a
> specific kind of phrase. Today I cannot add a rule to the sequence
> validator that filters them out.
>
> I know it may be better to train the model so that it learns this, but the
> hack of adding such a rule to the sequence validator is helpful.
>
> Do you think we can change this for release 1.7.0? I already tried this
> change in a local branch for a personal project and it works (although
> that was on OpenNLP 1.5.3).
>
> This would break API backward compatibility, but the existing models would
> not be affected.
>
> Thank you
> William
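To make the suggestion in this message concrete, here is a minimal sketch of a sequence whose elements are wrapper objects. All types are local stand-ins written for this example, not the OpenNLP API; `TokenWithPos` follows the name used in the message, while `zip` and `bioContinuation` are hypothetical helpers. The continuation check is a simplified rule assumed for illustration. Because it never inspects the input elements, it can be written once for any element type, which is the sense in which existing logic might carry over without much change.

```java
// Sketch only: local stand-ins modelled on the thread, not the OpenNLP API.
final class WrapperSequenceSketch {

    /** Stand-in for a validator that is generic in the sequence element type. */
    interface SequenceValidator<T> {
        boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
    }

    /** Hypothetical wrapper pairing a token with its POS tag. */
    static final class TokenWithPos {
        final String token;
        final String pos;

        TokenWithPos(String token, String pos) {
            this.token = token;
            this.pos = pos;
        }
    }

    /** Zips parallel token and tag arrays into one wrapper sequence. */
    static TokenWithPos[] zip(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        TokenWithPos[] out = new TokenWithPos[tokens.length];
        for (int k = 0; k < tokens.length; k++) {
            out[k] = new TokenWithPos(tokens[k], tags[k]);
        }
        return out;
    }

    /**
     * Simplified continuation rule: an "I-X" outcome is valid only directly after
     * "B-X" or "I-X". It ignores the input elements, so it works for any T.
     */
    static <T> SequenceValidator<T> bioContinuation() {
        return (i, in, prior, outcome) -> {
            if (!outcome.startsWith("I-")) {
                return true;
            }
            if (prior.length == 0) {
                return false;
            }
            String prev = prior[prior.length - 1];
            boolean prevInChunk = prev.startsWith("B-") || prev.startsWith("I-");
            return prevInChunk && prev.substring(2).equals(outcome.substring(2));
        };
    }

    public static void main(String[] args) {
        TokenWithPos[] sentence = zip(new String[] {"the", "dog"}, new String[] {"DT", "NN"});
        SequenceValidator<TokenWithPos> validator = bioContinuation();
        System.out.println(validator.validSequence(1, sentence, new String[] {"B-NP"}, "I-NP"));
        System.out.println(validator.validSequence(1, sentence, new String[] {"B-NP"}, "I-VP"));
    }
}
```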
Chunker - proposal to change API (break compatibility)
Hi,

Today the Chunker sequence is the sentence's POS tags.

Although we use both the tokens and the tags in the context generator, the current API does not let us use the tokens in the sequence validator, because the validator has no access to them.

In Portuguese, I know that some word + tag combinations never occur in a specific kind of phrase. Today I cannot add a rule to the sequence validator that filters them out.

I know it may be better to train the model so that it learns this, but the hack of adding such a rule to the sequence validator is helpful.

Do you think we can change this for release 1.7.0? I already tried this change in a local branch for a personal project and it works (although that was on OpenNLP 1.5.3).

This would break API backward compatibility, but the existing models would not be affected.

Thank you
William
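The kind of rule this message asks for could look roughly like the sketch below. Everything in it is an assumption made for illustration: `TokenAwareValidator` is a hypothetical interface (not the OpenNLP one), and the forbidden word, tag and phrase type are invented placeholders rather than a real rule about Portuguese.

```java
// Sketch only: these types are local stand-ins, not the OpenNLP API.
final class TokenAwareValidatorSketch {

    /** Hypothetical validator that sees the tokens as well as the POS tags. */
    interface TokenAwareValidator {
        boolean validSequence(int i, String[] tokens, String[] tags,
                              String[] priorOutcomes, String outcome);
    }

    // Placeholder values invented for this example; not a real linguistic rule.
    static final String FORBIDDEN_WORD = "foo";
    static final String FORBIDDEN_TAG = "TAG_X";
    static final String FORBIDDEN_PHRASE = "NP";

    /** Rejects outcomes that place the forbidden word + tag pair in the forbidden phrase type. */
    static final TokenAwareValidator RULE = (i, tokens, tags, priorOutcomes, outcome) -> {
        boolean pairMatches = FORBIDDEN_WORD.equalsIgnoreCase(tokens[i])
                && FORBIDDEN_TAG.equals(tags[i]);
        boolean inForbiddenPhrase = outcome.endsWith("-" + FORBIDDEN_PHRASE);
        return !(pairMatches && inForbiddenPhrase);
    };

    public static void main(String[] args) {
        String[] tokens = {"foo", "bar"};
        String[] tags = {"TAG_X", "TAG_Y"};
        System.out.println(RULE.validSequence(0, tokens, tags, new String[0], "B-NP")); // false
        System.out.println(RULE.validSequence(0, tokens, tags, new String[0], "B-VP")); // true
    }
}
```

With a validator that receives only the tags, `pairMatches` cannot be computed at all, which is the limitation the message describes.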