On Apr 8, 2014 1:15 PM, "Sebastian Schelter" <[email protected]> wrote:
>
> Hi,
>
> I'm still letting the current discussion settle in my head. I'll try to
come up with new thoughts in a few days. I think we should first identify
things that a majority can agree on and then walk through the controversial
stuff later.
>
> One thing that I would like to start now and that I think is consensus
(given that it was in the board report) is to rename the upcoming release
to 0.10.

Can we have fun just like hbase did when they jumped from 0.22 to 0.89?

Where are we really? I suggest 0.33

We already stated in the board report that we "won't be shipping a 1.0
release any time soon"
>
> Shout if you disagree, otherwise I will do the renaming end of the week.
>
> --sebastian
>
>
>
> On 04/08/2014 03:17 PM, Grant Ingersoll wrote:
>>
>>
>> On Apr 7, 2014, at 11:03 AM, Pat Ferrel <[email protected]> wrote:
>>
>>> Mahout needs a reboot. Grant has the right perspective, but I'd take it
further. His #2 (two efforts) is not and never would be reasonable in
anything but a huge company.
>>>
>>
>> FWIW, that was my view _if_ I were in a company funding it.  Further
down, my take is that for the most part we should follow the natural Apache
way and let those who do the work make the choices, which AFAICT, point at
forgetting about #1 and pursuing #2 only.
>>
>> -Grant
>>
>>
>>> I have never and would never take a team the size of Mahout (even with
some new committers) and split a reboot into two parts on two engines. No
sane project manager would allow this. Why do we think it will work here?
>>>
>>> The recent Gigaom article left me sympathetic with how confused the
readers must be, let alone potential users or contributors.
>>>
>>> Sean is not being nihilistic, two directions will not work for Mahout.
Mahout has a bad reputation already for being a poorly documented and
poorly integrated loose collection of code with a lot of technical debt.
Honestly has anyone reading this seen increasing interest in the project? A
reboot is the only thing I can imagine to re-energize it and even that must
be done with the utmost in clear communication.
>>>
>>> If you accept the above then there seem to be some ways forward:
>>> 1) reboot on Spark, let 0xdata do what they will.
>>> 2) reboot on 0xdata and let the Spark committers consider becoming MLlib
committers or other.
>>> 3) fail by issuing confusing direction statements, spending too much
time supporting and reconciling multiple significantly disparate efforts
and dividing committers. This is such a classic fail that I have a hard time
even considering it.
>>>
>>> I'd like to see #1 for what it's worth. A concerted effort by all on #1
would ensure Mahout is included in future distros. Maybe even #2 would be
included but #3? It's a non-starter.
>>>
>>> On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <[email protected]> wrote:
>>>
>>> To Sean's point, if Mahout were "my company", I would do the following,
albeit pragmatic and not so pleasant thing, assuming, of course, I had the
$$$ to do so:
>>>
>>> 1. Clean up existing code with a laser focus on a few key areas
(Sebastian's list makes sense) using a part of the team and call it 1.0 and
ship it, as it has a number of users and they deserve to not have the rug
pulled out from under them.
>>>
>>> 2. Spin out a subset of the team to explore and prototype 2.0 based on
two very positive and re-energizing looking ideas:
>>>         a. Scala DSL (and maybe Spark)
>>>         b. 0xData
>>>
>>>         All of the work for #2 would be done in a clean repo and would
only bring in legacy code where it was truly beneficial (back compat. can
come later, if at all).
>>>         It would then benchmark those two approaches as well as look at
where they overlap and are mutually beneficial and then go forward with the
winner.
>>>
>>> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as
minimal support as possible, encouraging, nay -- actively helping -- 1.0
customers upgrade as quickly as possible.
>>>
>>> The tricky part then becomes how do you make sure to still make your
sales #'s while also convincing them that your roadmap is what they are
really buying.
>>>
>>> If I didn't have the $$$ to do both of these (i.e. we need a massive
turn around and we have one last shot), I would be all in on #2.
>>>
>>> -----------------------------------
>>>
>>> That being said, Mahout is not "my company".  Heck, Mahout is not even
a "company", so we don't need to be bound by company conventions and
thought processes, even if that fits with all of our individual day jobs.
 And, thankfully, we don't have any sales numbers to make.
>>>
>>> We are chartered with one and only one mission: produce open source,
scalable machine learning libraries under the Apache license and community
driven principles.  We are not required by the Board or anyone else to
support version X for Y years or to use Hadoop or Scala or Java.  We are
also not required to implement any specific algorithms or deliver them on
specific time frames.  We are also not required to provide users upgrade
paths or the like.  Naturally, we _want_ to do these things for the sake of
the community, but let's be clear: it is not a requirement from the ASF.
 We are, however, required to have a sustaining community.
>>>
>>> ------------------------------------
>>>
>>> I personally think we should start clean on #2, throwing off the
shackles of the past and emerge 6-9 months later with Mahout 2.0 (and yes,
call it that, not 0.1 as Sebastian suggests, for marketing reasons) built
on a completely new and fresh repository, likely bringing in only the
Math/collections underpinnings and maybe the build system.  This new
repository would have only a handful of core algorithms that we know are
well implemented, sustainable and best in class.
>>>
>>> I think we should look at the lead up to 0.9 as an experiment that
proved out a lot of interesting ideas, including the fact that Mahout
proved there is vast interest in open source large scale machine learning
and that it is the benchmark for comparison.  Not many other ML projects
can say that, even if they have better technical implementations or are
less fragmented.  Once you realize something has outlived its usefulness
in software, however, there is no point in lingering.
>>>
>>> That being said, at least for the foreseeable future, I am not in a
position to contribute much code.  So, from my perspective, the ASF
Meritocratic approach takes over:  those who do the work make the
decisions.  If you want something in, then put up the patch and ask for
feedback.  If no one provides feedback, assume lazy consensus and move
forward.  Nothing convinces people better than actual, real, executing
code.  For my part, I am happy to continue to work the bureaucratic side of
things to make sure reports get filed, credentials get created, etc. and
the occasional patch.  I hope one day I will have time to contribute again.
>>>
>>> I will follow up w/ a separate email on what I am going to put in the
Board Report.
>>>
>>> On Apr 7, 2014, at 1:52 AM, Sean Owen <[email protected]> wrote:
>>>
>>>> No, it's about the opposite. I'm referring to the default, current
>>>> state of play here.
>>>>
>>>> The issues for a vendor are demand and supportability. Do people want
>>>> to pay for support of X? Can you honestly say you have expertise to
>>>> support and influence X over at least a major release cycle (12-18
>>>> months)? The latter needs a reasonably reliable roadmap and
>>>> continuity.
>>>>
>>>> I'm suggesting that in the current state, demand is low and going
>>>> down. The current code base seems de facto deprecated/unsupported
>>>> already, and possibly to be removed or dramatically changed into
>>>> something as-yet unclear. Nobody here seems to have taken a hard
>>>> decision regarding a next major release, but, the trajectory of that
>>>> decision seems clear if the current state remains the same.
>>>>
>>>>  From my perspective, "middle-ground" new directions like adding a bit
>>>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>>>> worse. I can see why there may be a little renewed demand for the new
>>>> bits, but then, why not go all in on one of them?
>>>>
>>>> Because a substantially all-new direction is a different story. If a
>>>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>>>> lot of renewed demand. And a clearer underlying roadmap sounds
>>>> possible. It would remain to be seen, but there's nothing stopping
>>>> those ideas from becoming part of a distro too.
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <[email protected]>
wrote:
>>>>>
>>>>> Please be explicit here.  It sounds like you are saying that if
Mahout goes
>>>>> in the proposed new direction that Cloudera will drop Mahout.
>>>>>
>>>>> Is that what you mean to say?
>>>
>>>
>>>
>>>
>>
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>>
>>
>>
>>
>>
>>
>
