Re: Board Report

Dmitriy Lyubimov Mon, 07 Apr 2014 12:16:09 -0700

Actually, my vision is very close to that of Julia
(http://julialang.org/blog/2012/02/why-we-created-julia/), except that

(1) I don't want to be in the business of creating yet another language;
(2) I don't want to be in the business of creating yet another
distributed computation engine.


On Mon, Apr 7, 2014 at 11:52 AM, Dmitriy Lyubimov <[email protected]> wrote:

>
>> Do you suggest we should leave the blackbox stuff to MLBase/Oryx and
>> solely focus on providing high level ML constructs?
>>
> No. I suggest nothing of the kind. I would still expect a healthy stream of
> blackbox contributions to be important. The ML environment provided by
> generalist contributors
>
> (1) helps to attract blackbox contributions from specialist contributors
> by making it easier;
> (2) lets user devs assemble and customize blackbox building
> blocks into applications with ease (as we know, 90% of application effort is
> feature prep and vectorization, along with custom business rules and
> metrics -- and we just want to give the end user the means to do it
> quickly and painlessly. Blackbox building bricks will never cover the multitude
> of those entirely);
> (3) reliably differentiates Mahout from other similar
> attempts.
>
> I am just saying the report fails to mention it and leaves the impression that we
> have no difference in vision from mllib, let alone MLI. This is simply not
> true, at least in my case.
>
>
>> @Sean How much can you agree on the vision I suggested? It meets your
>> demand of having a plan to solve the problems with the MR codebase (by
>> getting rid of it in the near future) and provides a direction for Spark as
>> the new underlying execution system, with optional support for Stratosphere
>> and H2O, if those communities manage to convince us that it is worth
>> integrating.
>>
>> --sebastian
>>
>>
>>
>>
>> 2014-04-07 19:29 GMT+02:00 Pat Ferrel <[email protected]>:
>> The document does not mention the state of the existing Spark work in the
>> snapshot codebase. Shouldn't this be noted?
>>
>> On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> I think we should mention the redesign/rework of the website and the
>> completion of the move from the old wiki to Apache CMS.
>>
>> --sebastian
>>
>> On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
>> > Here is my proposed report.  For the most part, I think the only right
>> thing to do vis-a-vis the Board is to report that we are in the midst of a
>> healthy (yes, I believe it is, for the most part healthy and normal)
>> discussion on where to go next.
>> >
>> > PMC Members: this is checked into SVN at
>> https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
>>  It is due on Wednesday.  If you object to this approach of reporting,
>> please let me know ASAP and suggest alternatives.
>> >
>> > === Apache Mahout Status Report: April 2014 ===
>> >
>> > -----
>> >
>> > Apache Mahout has implementations of a wide range of machine learning and
>> > data mining algorithms: clustering, classification, collaborative filtering
>> > and frequent pattern mining
>> >
>> > Project Status
>> > --------------
>> >
>> > The project continues to have a large and active user base.  While
>> > the developer base has continued to grow, there is a very active
>> > and healthy debate going on about where Mahout goes next.  Please
>> > see the Issues section below for more details.
>> >
>> > Community
>> > ---------
>> >
>> > * Andrew Musselman was voted in as new committer.
>> > * No changes to the PMC in the reporting period.
>> >
>> > * The main issue concerning the community right now is the addition
>> > of new contributions from 0xData and the integration of Mahout with Spark.
>> >
>> > Community Objectives
>> > --------------------
>> >
>> > Our goal is to build scalable machine learning libraries. See the Issues
>> > section below for the debate in the community about our objectives.
>> >
>> >
>> > Releases
>> > --------
>> >
>> > In addition to an ongoing debate on Mahout's future, the community is actively
>> > working on integrating Mahout with Scala/Spark, updating
>> > documentation, and bringing in new code and committers to update the core project.
>> >
>> >
>> > Issues
>> > ------
>> > The Mahout community is at a crossroads in terms of where
>> > to go next.  While the project has a broad number of users and interested
>> > parties, most committers are trying to maintain the code base on a purely
>> > part time basis, when the amount of work to sustain these users
>> > clearly points to it needing to be full time.  Furthermore, much of our
>> > original code base is written for Hadoop MapReduce 1.0, which many in the
>> > community have come to realize is not well-suited for solving the kinds of
>> > problems that Mahout has set out to solve.  There have been several lengthy
>> > discussions and prototypes going on to work out next directions along the
>> > lines of the Spark and 0xData contributions (there are numerous threads on
>> > the [email protected] mailing list.)
>> >
>> > The PMC does not think this requires Board intervention at this time
>> > as the debate is, as far as we can tell, healthy.  We do, however,
>> > expect that this debate will take some time to resolve and may mean we
>> > won't be shipping a 1.0 release any time soon.  We will keep the Board
>> > apprised of our next steps as we work through the process.
>> >
>> >
>> >
>> >
>> > On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <[email protected]> wrote:
>> >
>> >> To Sean's point, if Mahout were "my company", I would do the
>> following pragmatic, albeit not so pleasant, thing, assuming, of course,
>> I had the $$$ to do so:
>> >>
>> >> 1. Clean up existing code with a laser focus on a few key areas
>> (Sebastian's list makes sense) using a part of the team and call it 1.0 and
>> ship it, as it has a number of users and they deserve to not have the rug
>> pulled out from under them.
>> >>
>> >> 2. Spin out a subset of the team to explore and prototype 2.0 based on
>> two very positive and re-energizing looking ideas:
>> >>      a. Scala DSL (and maybe Spark)
>> >>      b. 0xData
>> >>
>> >>      All of the work for #2 would be done in a clean repo and would
>> only bring in legacy code where it was truly beneficial (back compat. can
>> come later, if at all).
>> >>      It would then benchmark those two approaches as well as look at
>> where they overlap and are mutually beneficial and then go forward with the
>> winner.
>> >>
>> >> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as
>> minimal support as possible, encouraging, nay -- actively helping -- 1.0
>> customers to upgrade as quickly as possible.
>> >>
>> >> The tricky part then becomes how do you make sure to still make your
>> sales #'s while also convincing them that your roadmap is what they are
>> really buying.
>> >>
>> >> If I didn't have the $$$ to do both of these (i.e. we need a massive
>> turn around and we have one last shot), I would be all in on #2.
>> >>
>> >> -----------------------------------
>> >>
>> >> That being said, Mahout is not "my company".  Heck, Mahout is not even
>> a "company", so we don't need to be bound by company conventions and
>> thought processes, even if that fits with all of our individual day jobs.
>>  And, thankfully, we don't have any sales numbers to make.
>> >>
>> >> We are chartered with one and only one mission: produce open source,
>> scalable machine learning libraries under the Apache license and community
>> driven principles.  We are not required by the Board or anyone else to
>> support version X for Y years or to use Hadoop or Scala or Java.  We are
>> also not required to implement any specific algorithms or deliver them on
>> specific time frames.  We are also not required to provide users upgrade
>> paths or the like.  Naturally, we _want_ to do these things for the sake of
>> the community, but let's be clear: it is not a requirement from the ASF.
>>  We are, however, required to have a sustaining community.
>> >>
>> >> ------------------------------------
>> >>
>> >> I personally think we should start clean on #2, throwing off the
>> shackles of the past and emerge 6-9 months later with Mahout 2.0 (and yes,
>> call it that, not 0.1 as Sebastian suggests, for marketing reasons) built
>> on a completely new and fresh repository, likely bringing in only the
>> Math/collections underpinnings and maybe the build system.  This new
>> repository would have only a handful of core algorithms that we know are
>> well implemented, sustainable and best in class.
>> >>
>> >> I think we should look at the lead up to 0.9 as an experiment that
>> proved out a lot of interesting ideas, including the fact that Mahout
>> proved there is vast interest in open source large scale machine learning
>> and that it is the benchmark for comparison.  Not many other ML projects
>> can say that, even if they have better technical implementations or are
>> less fragmented.  Once you realize something has outlived its usefulness
>> in software, however, there is no point in lingering.
>> >>
>> >> That being said, at least for the foreseeable future, I am not in a
>> position to contribute much code.  So, from my perspective, the ASF
>> Meritocratic approach takes over:  those who do the work make the
>> decisions.  If you want something in, then put up the patch and ask for
>> feedback.  If no one provides feedback, assume lazy consensus and move
>> forward.  Nothing convinces people better than actual, real, executing
>> code.  For my part, I am happy to continue to work the bureaucratic side of
>> things to make sure reports get filed, credentials get created, etc. and
>> the occasional patch.  I hope one day I will have time to contribute again.
>> >>
>> >> I will follow up w/ a separate email on what I am going to put in the
>> Board Report.
>> >>
>> >> On Apr 7, 2014, at 1:52 AM, Sean Owen <[email protected]> wrote:
>> >>
>> >>> No, it's about the opposite. I'm referring to the default, current
>> >>> state of play here.
>> >>>
>> >>> The issues for a vendor are demand and supportability. Do people want
>> >>> to pay for support of X? Can you honestly say you have expertise to
>> >>> support and influence X over at least a major release cycle (12-18
>> >>> months)? The latter needs a reasonably reliable roadmap and
>> >>> continuity.
>> >>>
>> >>> I'm suggesting that in the current state, demand is low and going
>> >>> down. The current code base seems de facto deprecated/unsupported
>> >>> already, and possibly to be removed or dramatically changed into
>> >>> something as-yet unclear. Nobody here seems to have taken a hard
>> >>> decision regarding a next major release, but, the trajectory of that
>> >>> decision seems clear if the current state remains the same.
>> >>>
>> >>> From my perspective, "middle-ground" new directions like adding a bit
>> >>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>> >>> worse. I can see why there may be a little renewed demand for the new
>> >>> bits, but then, why not go all in on one of them?
>> >>>
>> >>> Because a substantially all-new direction is a different story. If a
>> >>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>> >>> lot of renewed demand. And a clearer underlying roadmap sounds
>> >>> possible. It would remain to be seen, but there's nothing stopping
>> >>> those ideas from becoming part of a distro too.
>> >>>
>> >>>
>> >>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <[email protected]> wrote:
>> >>>> Please be explicit here.  It sounds like you are saying that if Mahout goes
>> >>>> in the proposed new direction that Cloudera will drop Mahout.
>> >>>>
>> >>>> Is that what you mean to say?
>> >>
>> >>
>> >
>> > --------------------------------------------
>> > Grant Ingersoll | @gsingers
>> > http://www.lucidworks.com
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>
