On Sun, Apr 6, 2014 at 11:41 AM, Sebastian Schelter wrote:
> What is going on is the process of finding the next direction for mahout.
> This process has started only recently, is still going on and involves
> talking to people and projects outside of mahout to find means where
...
> collaboration from the Spark, H2O and Stratosphere community. And there has
> been a crowded room with no chairs left at the Hadoop Summit Europe last

Agree these are intriguing ideas. They seem to be received as a
roadmap promise when, as you say, this is not at all clear. The board
report is a good excuse to discuss what the plan really is and what
state things are in.


> I think there is a big misconception here. It is not the case that "someone
> wants to add Spark-based matrix stuff". Dmitriy has been working for several

I'm aware of all this and it looks cool. The question is whether it
fits into a cogent project vision that people can rely on.


> This is a point that needs to be discussed. With the latest release, we
> already deleted over 17,000 lines of code related to rarely used and
> unmaintained algorithms. If it is feasible to port the remaining distributed

Yes, there's a big *if* about whether existing code is deleted, or is
transformed into a quite different form -- or whether nothing changes.
The first two are coherent outcomes, but they imply that "Mahout as we
know it" is going away (and would certainly be worth a board report!).

That these very different possible outcomes all remain a
big *if* -- well, that's the problem. There is no reliable vision to
plan around.


> What I see is a lively, community-driven discussion ongoing that has yet to
> produce a de-facto plan. I urge you and the major ecosystem distributor to
> participate in this discussion so that we can together produce an outcome
> that matches our interests.

We have participated more than any other organization, and have argued
for and contributed to standardizing, fixing, improving or else retiring
existing code. It doesn't seem to catch on. I recognize it's always
more interesting to look past obligations, to a next thing. It's about
as popular as mom saying "you have to finish your broccoli before
dessert!" even if she's right. If the reaction is just "let's talk
about dessert" then you'll continue to see the, um, consumers of the
broccoli leave, as we've observed internally. Thanks for not shooting
the messenger, but maybe the messenger deserves a line in the board
report?

I actually think the community "style" here is just fine -- for
GitHub, not an Apache project. That is no bad thing. This area is a
swirl of rapidly-changing ideas now and needs a context where code and
ideas can coalesce, disperse, change freely.

OK, here's a straw-man for discussion, albeit extreme:

- Retire Apache Mahout 0.x to the attic. Long live MapReduce. She
served us well.
- Move Spark-related DSL to an Apache Spark contrib repo
- 0xdata / Ted proposes incubation of "Mahout2O" and drag along
whatever bits and pieces of the math module are usable.

I'd +1 that reboot of the brand!
