On Sun, Apr 6, 2014 at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
> What is going on is the process of finding the next direction for mahout.
> This process has started only recently, is still going on and involves
> talking to people and projects outside of mahout to find means where ...
> collaboration from the Spark, H2O and Stratosphere community. And there has
> been a crowded room with no chairs left at the Hadoop Summit Europe last

Agree these are intriguing ideas. They seem to be received as a roadmap promise, when as you say this is not at all clear. The board report is a good excuse to discuss what really is the plan and state of things.

> I think there is a big misconception here. It is not the case that "someone
> wants to add Spark-based matrix stuff". Dmitriy has been working for several

I'm aware of all this and it looks cool. The question is whether it fits into a cogent project vision that people can rely on.

> This is a point that needs to be discussed. With the latest release, we
> already deleted over 17,000 lines of code related to rarely used and
> unmaintained algorithms. If it is feasible to port the remaining distributed

Yes, there's a big *if* about whether existing code is deleted, or is transformed into a quite different form -- or whether nothing changes. The first two are coherent outcomes, but they imply that "Mahout as we know it" is going away (and would certainly be worth a board report!). Understanding that these very different possible outcomes are still a big *if* -- well, that's the problem. There is no reliable vision to plan around.

> What I see is a lively, community-driven discussion ongoing that has yet to
> produce a de-facto plan. I urge you and the major ecosystem distributor to
> participate in this discussion so that we can together produce an outcome
> that matches our interests.

We have participated more than any organization, and argued for and contributed to standardizing, fixing, improving or else retiring existing code. It doesn't seem to catch on. I recognize it's always more interesting to look past obligations, to a next thing. It's about as popular as mom saying "you have to finish your broccoli before dessert!" even if she's right. If the reaction is just "let's talk about dessert" then you'll continue to see the, um, consumers of the broccoli leave, as we've observed internally.

Thanks for not shooting the messenger, but maybe the messenger deserves a line in the board report? I actually think the community "style" here is just fine -- for GitHub, not an Apache project. That is no bad thing. This area is a swirl of rapidly-changing ideas now and needs a context where code and ideas can coalesce, disperse, change freely.

OK, here's a straw-man for discussion, albeit extreme:

- Retire Apache Mahout 0.x to the attic. Long live MapReduce. She served us well.
- Move Spark-related DSL to an Apache Spark contrib repo
- 0xdata / Ted proposes incubation of "Mahout2O" and drags along whatever bits and pieces of the math module are usable.

I'd +1 that reboot of the brand!
