Actually, my vision is very close to that of Julia
(http://julialang.org/blog/2012/02/why-we-created-julia/), except that
(1) I don't want to be in the business of creating yet another language, and
(2) I don't want to be in the business of creating yet another distributed
computation engine.

On Mon, Apr 7, 2014 at 11:52 AM, Dmitriy Lyubimov <[email protected]> wrote:

>> Do you suggest we should leave the blackbox stuff to MLBase/Oryx and
>> solely focus on providing high level ML constructs?
>
> No. I suggest nothing of the kind. I would still expect a healthy stream of
> blackbox contributions to be important. The ML environment provided by
> generalist contributors
>
> (1) helps to attract blackbox contributions from specialist contributors
> by making it easier;
> (2) helps user devs assemble and customize blackbox building
> blocks into applications with ease (as we know, 90% of application effort is
> feature prep and vectorization effort, along with custom business rules and
> metrics -- and we just want to hand the end user the means to do it
> quickly and painlessly. Blackbox building bricks will never cover the
> multitude of those entirely);
> (3) provides reliable differentiation of Mahout from other similar
> attempts.
>
> I am just saying the report fails to mention it and leaves an impression we
> have no difference in vision from mllib, let alone MLI. This is simply not
> true, at least in my case.
>
>> @Sean How much can you agree on the vision I suggested? It meets your
>> demand of having a plan to solve the problems with the MR codebase (by
>> getting rid of it in the near future) and provides a direction for Spark as
>> the new underlying execution system, with optional support for Stratosphere
>> and H2O, if those communities manage to convince us that it is worth
>> integrating.
>>
>> --sebastian
>>
>> 2014-04-07 19:29 GMT+02:00 Pat Ferrel <[email protected]>:
>> The document does not mention the state of the existing Spark work in the
>> snapshot codebase. Shouldn't this be noted?
>>
>> On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> I think we should mention the redesign/rework of the website and the
>> completion of the move from the old wiki to Apache CMS.
>>
>> --sebastian
>>
>> On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
>> > Here is my proposed report. For the most part, I think the only right
>> > thing to do vis-a-vis the Board is to report that we are in the midst of a
>> > healthy (yes, I believe it is, for the most part healthy and normal)
>> > discussion on where to go next.
>> >
>> > PMC Members: this is checked into SVN at
>> > https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
>> > It is due on Wednesday. If you object to this approach of reporting,
>> > please let me know ASAP and suggest alternatives.
>> >
>> > === Apache Mahout Status Report: April 2014 ===
>> >
>> > -----
>> >
>> > Apache Mahout has implementations of a wide range of machine learning and
>> > data mining algorithms: clustering, classification, collaborative filtering
>> > and frequent pattern mining
>> >
>> > Project Status
>> > --------------
>> >
>> > The project continues to have a large and active user base. While
>> > the developer base has continued to grow, there is a very active
>> > and healthy debate going on about where Mahout goes next. Please
>> > see the Issues section below for more details.
>> >
>> > Community
>> > ---------
>> >
>> > * Andrew Musselman was voted in as new committer.
>> > * No changes to the PMC in the reporting period.
>> >
>> > * The main issue concerning the community right now is the addition
>> > of new contributions from 0xData and the integration of Mahout with Spark.
>> >
>> > Community Objectives
>> > --------------------
>> >
>> > Our goal is to build scalable machine learning libraries. See the Issues
>> > section below for the debate in the community about our objectives.
>> >
>> >
>> > Releases
>> > --------
>> >
>> > In addition to an ongoing debate on Mahout's future, the community is actively
>> > working on integrating Mahout with Scala/Spark, updating
>> > documentation, and bringing in new code and committers to update the core project.
>> >
>> >
>> > Issues
>> > ------
>> > The Mahout community is at a crossroads in terms of where
>> > to go next. While the project has a broad number of users and interested
>> > parties, most committers are trying to maintain the code base on a purely
>> > part time basis, when the amount of work to sustain these users
>> > clearly points to it needing to
>> > be full time. Furthermore, much of our original code base is written
>> > for Hadoop MapReduce 1.0, which many in the community have come to realize
>> > is not well-suited for solving the kinds of problems that Mahout has set
>> > out to solve. There have been several lengthy discussions and prototypes
>> > going on to work out next directions along the lines of the Spark and
>> > 0xData contributions (there are numerous threads on the [email protected]
>> > mailing list.)
>> >
>> > The PMC does not think this requires Board intervention at this time
>> > as the debate is, as far as we can tell, healthy. We do, however,
>> > expect that this debate will take some time to resolve and may mean we
>> > won't be shipping a 1.0 release any time soon. We will keep the Board
>> > apprised of our next steps as we work through the process.
>> >
>> >
>> > On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <[email protected]> wrote:
>> >
>> >> To Sean's point, if Mahout were "my company", I would do the
>> >> following, albeit pragmatic and not so pleasant thing, assuming, of course,
>> >> I had the $$$ to do so:
>> >>
>> >> 1. Clean up existing code with a laser focus on a few key areas
>> >> (Sebastian's list makes sense) using a part of the team and call it 1.0 and
>> >> ship it, as it has a number of users and they deserve to not have the rug
>> >> pulled out from under them.
>> >>
>> >> 2. Spin out a subset of the team to explore and prototype 2.0 based on
>> >> two very positive and re-energizing looking ideas:
>> >>    a. Scala DSL (and maybe Spark)
>> >>    b. 0xData
>> >>
>> >>    All of the work for #2 would be done in a clean repo and would
>> >>    only bring in legacy code where it was truly beneficial (back compat. can
>> >>    come later, if at all).
>> >>    It would then benchmark those two approaches as well as look at
>> >>    where they overlap and are mutually beneficial and then go forward with the
>> >>    winner.
>> >>
>> >> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as
>> >> minimal support as possible, encouraging, nay -- actively helping -- 1.0
>> >> customers upgrade as quickly as possible.
>> >>
>> >> The tricky part then becomes how do you make sure to still make your
>> >> sales #'s while also convincing them that your roadmap is what they are
>> >> really buying.
>> >>
>> >> If I didn't have the $$$ to do both of these (i.e. we need a massive
>> >> turn around and we have one last shot), I would be all in on #2.
>> >>
>> >> -----------------------------------
>> >>
>> >> That being said, Mahout is not "my company". Heck, Mahout is not even
>> >> a "company", so we don't need to be bound by company conventions and
>> >> thought processes, even if that fits with all of our individual day jobs.
>> >> And, thankfully, we don't have any sales numbers to make.
>> >>
>> >> We are chartered with one and only one mission: produce open source,
>> >> scalable machine learning libraries under the Apache license and community
>> >> driven principles. We are not required by the Board or anyone else to
>> >> support version X for Y years or to use Hadoop or Scala or Java.
>> >> We are also not required to implement any specific algorithms or deliver
>> >> them on specific time frames. We are also not required to provide users upgrade
>> >> paths or the like. Naturally, we _want_ to do these things for the sake of
>> >> the community, but let's be clear: it is not a requirement from the ASF.
>> >> We are, however, required to have a sustaining community.
>> >>
>> >> ------------------------------------
>> >>
>> >> I personally think we should start clean on #2, throwing off the
>> >> shackles of the past and emerge 6-9 months later with Mahout 2.0 (and yes,
>> >> call it that, not 0.1 as Sebastian suggests, for marketing reasons) built
>> >> on a completely new and fresh repository, likely bringing in only the
>> >> Math/collections underpinnings and maybe the build system. This new
>> >> repository would have only a handful of core algorithms that we know are
>> >> well implemented, sustainable and best in class.
>> >>
>> >> I think we should look at the lead up to 0.9 as an experiment that
>> >> proved out a lot of interesting ideas, including the fact that Mahout
>> >> proved there is vast interest in open source large scale machine learning
>> >> and that it is the benchmark for comparison. Not many other ML projects
>> >> can say that, even if they have better technical implementations or are
>> >> less fragmented. Once you realize something has outlived its usefulness
>> >> in software, however, there is no point in lingering.
>> >>
>> >> That being said, at least for the foreseeable future, I am not in a
>> >> position to contribute much code. So, from my perspective, the ASF
>> >> Meritocratic approach takes over: those who do the work make the
>> >> decisions. If you want something in, then put up the patch and ask for
>> >> feedback. If no one provides feedback, assume lazy consensus and move
>> >> forward. Nothing convinces people better than actual, real, executing
>> >> code.
>> >> For my part, I am happy to continue to work the bureaucratic side of
>> >> things to make sure reports get filed, credentials get created, etc. and
>> >> the occasional patch. I hope one day I will have time to contribute again.
>> >>
>> >> I will follow up w/ a separate email on what I am going to put in the
>> >> Board Report.
>> >>
>> >> On Apr 7, 2014, at 1:52 AM, Sean Owen <[email protected]> wrote:
>> >>
>> >>> No, it's about the opposite. I'm referring to the default, current
>> >>> state of play here.
>> >>>
>> >>> The issues for a vendor are demand and supportability. Do people want
>> >>> to pay for support of X? Can you honestly say you have expertise to
>> >>> support and influence X over at least a major release cycle (12-18
>> >>> months)? The latter needs a reasonably reliable roadmap and
>> >>> continuity.
>> >>>
>> >>> I'm suggesting that in the current state, demand is low and going
>> >>> down. The current code base seems de facto deprecated/unsupported
>> >>> already, and possibly to be removed or dramatically changed into
>> >>> something as-yet unclear. Nobody here seems to have taken a hard
>> >>> decision regarding a next major release, but, the trajectory of that
>> >>> decision seems clear if the current state remains the same.
>> >>>
>> >>> From my perspective, "middle-ground" new directions like adding a bit
>> >>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>> >>> worse. I can see why there may be a little renewed demand for the new
>> >>> bits, but then, why not go all in on one of them?
>> >>>
>> >>> Because a substantially all-new direction is a different story. If a
>> >>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>> >>> lot of renewed demand. And a clearer underlying roadmap sounds
>> >>> possible. It would remain to be seen, but there's nothing stopping
>> >>> those ideas from becoming part of a distro too.
>> >>>
>> >>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <[email protected]> wrote:
>> >>>> Please be explicit here. It sounds like you are saying that if Mahout goes
>> >>>> in the proposed new direction that Cloudera will drop Mahout.
>> >>>>
>> >>>> Is that what you mean to say?
>> >>
>> >
>> > --------------------------------------------
>> > Grant Ingersoll | @gsingers
>> > http://www.lucidworks.com
