On Fri, Feb 28, 2014 at 1:56 AM, Sean Owen <sro...@gmail.com> wrote:

> OK, your defeatism is my realism. Why has Negative Nancy intruded on
> this conversation?
>
>

> Your Reality May Vary. This seems like yellow-flag territory for an
> Apache project though, if this is representative of a wider reality.
> So a conversation about whole other projects' worth of new
> functionality feels quite disconnected -- red-flag territory.
>

Indeed it may.

(1) As far back as I can recollect from tracking the Mahout PMC, it has always
generally proclaimed that Mahout is about ML at scale. It was specifically
emphasized that it was not about running ML on Hadoop. This coupling of ML to
MR and Hadoop in particular seems to exist just in your head, and not in that
of anybody else I have talked to. Mahout has never been that dogmatic.

(2) Technology changes, and the weaknesses of Mahout for the most part stem
from relying heavily on aging approaches, and no amount of cleanup is going to
address that. The sad truth is that Java, MR in general, and Hadoop in
particular are an increasingly poor fit for modern-day ML. As a consequence,
I believe the future holds that MR-based processing will gradually decay,
as will direct Java use for ML math. Like I said, the only reason Mahout is
still being used is the uniqueness of its goods (i.e. you can't do it any
other way today for free), not necessarily its underpinnings. But hey, I can
say the same thing about, say, R any day. And I keep using both R and Mahout
for that very reason.

(3) I view any project community as an evolutionary process. The 1.0
milestones IMO are pretty ephemeral if we measure them from a maturity point
of view. I might very successfully argue (and ops in my last two companies
wholeheartedly agree with me) that e.g. CDH3 was much closer to "1.0" than
CDH4. Bottom line, it is a never-ending story. The fluff dies off, the golden
nuggets survive and evolve, even if thru other projects. Dust to dust and
so forth. No drama here whatsoever.

So is community. It ebbs and goes away. The hype is a strong motivating
force there. Look how long the EJB delusion lasted. And any reasonable computer
scientist would take, say, Ceph over CDH any time of day based on independent
benchmarks for performance and usability, yet it is not happening en masse.
Hype is a strong force.

Another thing is... Mahout is not a good source for PhD dissertations. So
no university will ever help with it. We have to get by with what we have.
On a good day somebody will bring in a uniquely viable solution, and at
the end of the day that's the only thing that keeps things moving:
differentiation in problem coverage. So no drama here either.


>
> To be constructive, here are four items that seem more important for
> something like "1.0.0" and are even a lot less work:
>
> - Use Hadoop .mapreduce API consistently
> - Standardize input output formats of all jobs
> - Remove use of deprecated code
> - Clear even a third of the open JIRA backlog
>

Like I said, I believe the future is in moving ahead, building on strengths
and finding a unique proposition. I agree with the above in the sense that
the out-of-core stuff that runs over MR could use some unification. I know you
have done a lot in that department, and I assume that since you are writing to
the dev list, you are looking to help with that going forward. Because if
not... the dev lists are not exactly created to be an open forum for just
giving lectures.
