Re: Tackling the "legacy dilemma"

Ted Dunning Tue, 15 Apr 2014 19:59:22 -0700

Manoj,

Sounds like a fair trade there.


Hopefully, you would consider upgrading if we get Andy's code ported to the
DSL or if we incorporate the h2o random forest implementation.




On Tue, Apr 15, 2014 at 7:51 PM, Manoj Awasthi <awasthi.ma...@gmail.com> wrote:

> >  * remove Random Forest as we cannot even answer questions about the
> > implementation on the mailing list
> >
>      -1 to removing the present Random Forests. I think it is being used - we
> (at Adobe) are playing around with it a bit.  If the reason for removal is
> that there is no active maintainer, that can be resolved by the people using
> it getting more active on this - a community action. FWIW, I vote against
> throwing away this code.
>
>
>
> On Tue, Apr 15, 2014 at 2:38 PM, Sebastian Schelter <s...@apache.org>
> wrote:
>
> > On 04/15/2014 11:07 AM, Suneel Marthi wrote:
> >
> >> On Tue, Apr 15, 2014 at 12:57 AM, Sebastian Schelter <s...@apache.org>
> >> wrote:
> >>
> >>  Hi,
> >>>
> >>>  From reading the thread, I have the impression that we agree on the
> >>> following actions:
> >>>
> >>>
> >>>   * reject any future MR algorithm contributions, prominently state
> this
> >>> on the website and in talks
> >>>   * make all existing algorithm code compatible with Hadoop 2, if there
> >>> is
> >>> no one willing to make an existing algorithm compatible, remove the
> >>> algorithm
> >>>   * deprecate Canopy clustering
> >>>   * email the original FPM and random forest authors to ask for
> >>> maintenance
> >>> of the algorithms
> >>>   * rename core to "mr-legacy" (and  gradually pull items we really
> need
> >>> out of that later)
> >>>
> >>> I will create jira tickets for those action points. I think the biggest
> >>> challenge here is the Hadoop 2 compatibility, is someone volunteering
> to
> >>> drive that? Would be awesome.
> >>>
> >>>
> >> With things settling down at work for me, I have time now to dedicate
> back
> >> to Mahout. I can drive this effort.
> >>
> >
> > That is great news!
> >
> >
> >
> >>
> >>> Best,
> >>> Sebastian
> >>>
> >>>
> >>> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
> >>>
> >>>  This is a good summary of how I feel too.
> >>>>
> >>>>   On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <s...@apache.org>
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> Unfortunately, it's not that easy to get enough voluntary work. I
> issued
> >>>>> the third call for working on the documentation today as there are
> >>>>> still
> >>>>> lots of open issues. That's why I'm trying to suggest a move that
> >>>>> involves
> >>>>> as little work as possible.
> >>>>>
> >>>>> We should get the MR codebase into a state that we all can live with
> >>>>> and
> >>>>> then focus on new stuff like the Scala DSL.
> >>>>>
> >>>>> --sebastian
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>   On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
> >>>>>
> >>>>>> The best thing would be to make a plan and see how much effort you need
> >>>>>> for this. Then find volunteers to accomplish the task. I am quite sure
> >>>>>> that there are a lot of people out there who are willing to help out.
> >>>>>>
> >>>>>> BR,
> >>>>>> deneb.
> >>>>>>
> >>>>>>
> >>>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <s...@apache.org>:
> >>>>>>
> >>>>>>
> >>>>>>   Hi,
> >>>>>>
> >>>>>>>
> >>>>>>> I took some days to let the latest discussion about the state and
> >>>>>>> future
> >>>>>>> of Mahout go through my head. I think the most important thing to
> >>>>>>> address
> >>>>>>> right now is the MapReduce "legacy" codebase. A lot of the MR
> >>>>>>> algorithms
> >>>>>>> are currently unmaintained, documentation is outdated and the
> >>>>>>> original
> >>>>>>> authors have abandoned Mahout. For some algorithms it is hard to
> get
> >>>>>>> even
> >>>>>>> questions answered on the mailinglist (e.g. RandomForest). I agree
> >>>>>>> with
> >>>>>>> Sean's comments that letting the code linger around is no option
> and
> >>>>>>> will
> >>>>>>> continue to harm Mahout.
> >>>>>>>
> >>>>>>> In the previous discussion, I suggested to make a radical move and
> >>>>>>> aim
> >>>>>>> to
> >>>>>>> delete this codebase, but there were serious objections from
> >>>>>>> committers and
> >>>>>>> users that convinced me that there is still usage of and interest
> >>>>>>> in
> >>>>>>> that
> >>>>>>> codebase.
> >>>>>>>
> >>>>>>> That puts us into a "legacy dilemma". We cannot delete the code
> >>>>>>> without
> >>>>>>> harming our userbase. On the other hand, I don't see anyone willing
> >>>>>>> to
> >>>>>>> rework the codebase. Further, the code cannot linger around anymore
> >>>>>>> as
> >>>>>>> it
> >>>>>>> is doing now, especially when we fail to answer questions or don't
> >>>>>>> provide
> >>>>>>> documentation.
> >>>>>>>
> >>>>>>> *We have to make a move*!
> >>>>>>>
> >>>>>>> I suggest the following actions with regard to the MR codebase. I
> >>>>>>> hope
> >>>>>>> that they find consent. If there are objections, please give
> >>>>>>> alternatives,
> >>>>>>> *keeping everything as-is is not an option*:
> >>>>>>>
> >>>>>>>    * reject any future MR algorithm contributions, prominently
> state
> >>>>>>> this on
> >>>>>>> the website and in talks
> >>>>>>>    * make all existing algorithm code compatible with Hadoop 2, if
> >>>>>>> there is
> >>>>>>> no one willing to make an existing algorithm compatible, remove the
> >>>>>>> algorithm
> >>>>>>>    * deprecate the existing MR algorithms, yet still take bug fix
> >>>>>>> contributions
> >>>>>>>    * remove Random Forest as we cannot even answer questions about the
> >>>>>>> implementation on the mailing list
> >>>>>>>
> >>>>>>> There are two more actions that I would like to see, but would be
> willing
> >>>>>>> to
> >>>>>>> give up if there are objections:
> >>>>>>>
> >>>>>>>    * move the MR algorithms into a separate maven module
> >>>>>>>    * remove Frequent Pattern Mining again (we already aimed for
> that
> >>>>>>> in
> >>>>>>> 0.9
> >>>>>>> but had one user who shouted but never returned to us)
> >>>>>>>
> >>>>>>> Let me know what you think.
> >>>>>>>
> >>>>>>> --sebastian
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >
>
