Re: Chunker - proposal to change API (break compatibility)

2016-11-10 Thread William Colen
I tried that, but we have an issue with the factories we created. To
customize, we extend the factory, but the method we need to override does
not allow a generic type parameter.

  public SequenceValidator<String> getSequenceValidator() {
    return new DefaultChunkerSequenceValidator();
  }

I tried to change String to ?, but it breaks a lot of code. I am not sure
if it is a simple change anymore.

Thank you
William
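
A minimal sketch of the idea discussed in this thread, not the actual
OpenNLP change: a validator typed on a token + tag wrapper instead of
String. TokenWithPos is the name suggested in the quoted reply below;
TokenAwareChunkerValidator, its key() helper and the set of forbidden
combinations are hypothetical names introduced only for illustration. The
SequenceValidator interface is re-declared locally, mirroring the shape of
the generic OpenNLP interface, so that the example compiles without the
library.

```java
import java.util.Set;

// Hypothetical wrapper holding a token together with its POS tag.
final class TokenWithPos {
    final String token;
    final String pos;

    TokenWithPos(String token, String pos) {
        this.token = token;
        this.pos = pos;
    }
}

// Local stand-in mirroring the shape of OpenNLP's generic SequenceValidator<T>;
// declared here only so the sketch is self-contained.
interface SequenceValidator<T> {
    boolean validSequence(int i, T[] inputSequence, String[] outcomesSequence, String outcome);
}

// Hypothetical validator: the usual BIO consistency check plus a
// configurable list of forbidden token + tag + outcome combinations.
final class TokenAwareChunkerValidator implements SequenceValidator<TokenWithPos> {
    private final Set<String> forbidden;

    TokenAwareChunkerValidator(Set<String> forbidden) {
        this.forbidden = forbidden;
    }

    // Builds the lookup key for one token + tag + outcome combination.
    static String key(String token, String pos, String outcome) {
        return token + "\t" + pos + "\t" + outcome;
    }

    @Override
    public boolean validSequence(int i, TokenWithPos[] inputSequence,
                                 String[] outcomesSequence, String outcome) {
        // An I-X outcome may only continue a chunk of the same type X.
        if (outcome.startsWith("I-")) {
            if (outcomesSequence.length == 0) {
                return false;
            }
            String prev = outcomesSequence[outcomesSequence.length - 1];
            if (prev.equals("O") || !prev.substring(2).equals(outcome.substring(2))) {
                return false;
            }
        }
        // Token-aware rule: reject combinations listed as forbidden.
        TokenWithPos current = inputSequence[i];
        return !forbidden.contains(key(current.token, current.pos, outcome));
    }
}
```

With a validator like this, a language-specific rule becomes one entry in
the forbidden set. The factory problem described above still applies: a
factory method declared to return SequenceValidator<String> cannot hand out
this type without changing its signature.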

2016-11-10 10:23 GMT-02:00 Joern Kottmann :

> The sequence we have today is usually of type String, but it is generic, so
> it could also be a wrapper object that holds the token and tag, e.g.
> TokenWithPos.
> On such a sequence we should be able to use most of the existing interfaces
> without too much change, right?
>
> Jörn
>
> On Thu, Nov 10, 2016 at 10:33 AM, William Colen wrote:
>
> > Hi,
> >
> > Today the Chunker sequence is the sentence's POS tags.
> >
> > Although we use both the tokens and tags in the context generator, in the
> > current API we cannot use the token in the sequence validator, because we
> > do not have access to it.
> >
> > In Portuguese, I know that some combinations of word + tag never occur
> > in a specific kind of phrase. Today I cannot add a rule with this filter
> > to the sequence validator.
> >
> > I know maybe it is better to train the model so it will learn, but the
> > hack of adding this rule to the sequence validator is helpful.
> >
> > Do you think we can change it for the release 1.7.0? I already tried this
> > change in a local branch for a personal project and it works (although it
> > was OpenNLP 1.5.3).
> >
> > This would break API backward compatibility, but the existing models
> > would not be affected.
> >
> > Thank you
> > William
> >
>


Re: Next release

2016-11-10 Thread William Colen
Cool. There are a lot of PlainTextByLineStream references in deprecated
methods, especially main methods. I will ignore them and you can remove the
main method when you go through each tool.
I will focus on PlainTextByLineStream references that are not inside
deprecated methods.


2016-11-10 6:39 GMT-02:00 Joern Kottmann :

> Ok, I created a couple of issues and will go through them rather quickly.
>
> Jörn
>
> On Thu, Nov 10, 2016 at 3:36 AM, William Colen wrote:
>
> > Jörn, I can help remove deprecated code. I started with
> > PlainTextByLineStream. It is used everywhere so there is a lot to change.
> >
> >
> > 2016-11-08 9:08 GMT-02:00 Joern Kottmann :
> >
> > > I suggest we remove more deprecated code; there is still a lot which
> > > could be removed and is really old.
> > > It is a bit of a boring task; if anyone has some spare cycles, help
> > > would be welcome.
> > >
> > > Jörn
> > >
> > > On Tue, Nov 8, 2016 at 9:59 AM, Aliaksandr Autayeu <aliaksa...@autayeu.com>
> > > wrote:
> > >
> > > > +1 for 1.7 (also due to lemmatizer changes and removal of deprecated
> > > > code).
> > > >
> > > > On 8 November 2016 at 09:48, Rodrigo Agerri wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > +1 1.7.0 in next release and +1 for a yearly release
> > > > >
> > > > > Just to provide some info, the main changes in the lemmatizer have
> > > > > been:
> > > > >
> > > > > 1. Added a supervised statistical lemmatizer, usable from the CLI
> > > > > and API. The supervised lemmatizer now provides much better
> > > > > coverage for unknown words than the previously existing
> > > > > dictionary-based one.
> > > > > 2. The lemmatizer component has been rewritten and the API has
> > > > > therefore changed substantially. Thus, the changes in the
> > > > > Dictionary-based lemmatizer are not backward compatible. In any
> > > > > case, I do not think that many people were using it, and the
> > > > > change needed to use the new API is minor.
> > > > >
> > > > > The new statistical lemmatizer can support the Dictionary-based
> > > > > lemmatizers often used to provide features for components such as
> > > > > Word Sense Disambiguation, Opinion Mining/Sentiment Analysis, etc.
> > > > > In this regard, it would be nice to aim at working on the
> > > > > development of those two components for their release. Maybe the
> > > > > next release is too close, but definitely for the one after.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Rodrigo
> > > > >
> > > > > On Mon, Nov 7, 2016 at 7:01 PM, Russ, Daniel (NIH/CIT) [E] wrote:
> > > > > > Also the lemmatizer has significantly changed.  I vote 1.7
> > > > > >
> > > > > > On 11/7/16, 12:59 PM, "Joern Kottmann" wrote:
> > > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > since our last release it has been a while and we received quite
> > > > > > a few changes which would be nice to get released.
> > > > > > >
> > > > > > There are still some open Jira issues, but mostly smaller things
> > > > > > that can be wrapped up rather quickly.
> > > > > > >
> > > > > > Is there anything important missing which should go into the
> > > > > > next release? Otherwise I think we should also aim for more
> > > > > > frequent releases and just make one again early next year, with
> > > > > > all the stuff we might miss out now.
> > > > > > >
> > > > > > We took in a patch - as part of OPENNLP-830 - to replace our
> > > > > > self-made hash table with the java.util.HashMap. This change is
> > > > > > not backward compatible for folks who extend AbstractModel.
> > > > > > >
> > > > > > Should we go with 1.6.1 as a next version or should we make
> > > > > > 1.7.0 to reflect that?
> > > > > > >
> > > > > > Previously we only had backward incompatible changes in versions
> > > > > > which bumped the second number. Maybe that is the better choice.
> > > > > > It will probably break some people's code when they update.
> > > > > > >
> > > > > > We also have lots of deprecated API still in OpenNLP, should we
> > > > > > try to remove as much as possible of it now?
> > > > > >
> > > > > > Jörn
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>