The release has got to be out by mid-April, in time for the Bigtop guys to package Mahout into their distro.
> On Mar 17, 2015, at 6:41 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
> Seeing so much stuff here and along the line that I think a 0.9.1 release
> is in order; get things in order in parallel with more complex questions.
> I can commit to working on cleanup and minor bugs over the next two months,
> and plan a release in May, for instance.
>
> On Tue, Mar 17, 2015 at 11:38 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
>> Agree DSL is a bad name; I like distributed algebra or algebraic optimizer.
>>
>>> On Tue, Mar 17, 2015 at 5:14 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>> Yeah we need a real name that brings no baggage. R-like, based on Scala,
>>> big data linear algebra yada yada. Can’t say that in a descriptive phrase
>>> so why not a name like Mahout-xyz? Of course with a more catchy,
>>> search-friendly xyz.
>>>
>>> But AP’s structure seems pretty good.
>>>
>>> I’m nervous releasing H2O with no one supporting it. Is anyone signing up
>>> for that?
>>>
>>>
>>> On Mar 17, 2015, at 8:59 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>> I don't like the term DSL.
>>>
>>> It is an algebraic optimizer, folks. Calling it a DSL brings in wrong and
>>> too trivial ideas about it.
>>>> On Mar 17, 2015 8:27 AM, "Andrew Palumbo" <ap....@outlook.com> wrote:
>>>>
>>>>
>>>>> On 03/15/2015 01:42 PM, Pat Ferrel wrote:
>>>>>
>>>>> Lots of discussion off the record about doing a release but shouldn’t
>>>>> we plan this?
>>>>>
>>>>> What has to be in a release of Mahout 0.10?
>>>>>
>>>>> Seems like we could release as-is but it would be nice to have some of
>>>>> the already completed work that isn’t committed yet:
>>>>> * mrlegacy refactored out of Scala, is it possible to get this in,
>>>>> Dmitriy?
>>>>>
>>>>> One question is how to package, with which version of Spark.
>>>>> There is a bug in Spark 1.2.1 and I think in 1.2 (this is the big
>>>>> distro build) that requires any class that uses the JavaSerializer to
>>>>> set a specific SparkConf key/value to point to the Guava jar on all
>>>>> workers. This only affects IndexedDatasets since they use Guava’s
>>>>> BiMap. Rumor has it that 1.3 fixes this but I haven’t tried it yet.
>>>>>
>>>>> So we are currently stuck on 1.1.1 but could document how to work
>>>>> around it to use 1.2 for a user who wants to build Mahout from scratch.
>>>>> A user source build on 1.3 may not require a workaround. We seem to be
>>>>> good on Hadoop 2.x, which in itself is a good reason to release since
>>>>> 0.9 was not.
>>>>>
>>>>> What else needs to be done:
>>>>> * rename module math-scala to core?
>>>>> * create the distribution build. Currently this does not publish the
>>>>> scaladocs and does not create artifacts for H2O or Scala.
>>>>
>>>> Same problem for javadocs (other than mrlegacy). Is this a question for
>>>> INFRA? We have MAHOUT-1562
>>>> <https://issues.apache.org/jira/browse/MAHOUT-1562> and MAHOUT-1585
>>>> <https://issues.apache.org/jira/browse/MAHOUT-1585> open for these.
>>>> Were javadocs for all modules ever hosted? There were links for them
>>>> which are now dead, so I removed them from the site. I'm wondering
>>>> because even once we get the scaladocs published in the build, will we
>>>> have the same problem of them not being hosted?
>>>>
>>>> * is H2O really in a form to publish?
>>>>
>>>> In terms of Scala bindings for the DRM and DSL linear algebra
>>>> operations, solvers, etc., H2O should be good to go with the exception
>>>> of one bug (MAHOUT-1638
>>>> <https://issues.apache.org/jira/browse/MAHOUT-1638>). It passes (almost)
>>>> all math-scala tests. We have no other algorithms (outside of math-scala
>>>> solvers, decompositions, etc.) for H2O.
>>>> I'm not sure if it's being used or how much real-world testing it's
>>>> had; it does serve at the very least as a proof of concept for the
>>>> engine-neutral DSL.
>>>>
>>>>
>>>>> Docs
>>>>> * IMO we should name the Mahout Spark-Scala DSL and shell. More unique
>>>>> names are easier to find in searches. Maybe Suneel can polish up his
>>>>> Sanskrit and suggest something.
>>>>> * we should be ready to do some work here to restructure the CMS since
>>>>> it is very 0.9-centric, with Scala stuff almost an afterthought.
>>>>
>>>> Agreed. What about categorizing the documentation on the site under
>>>> tabs like "Mahout-DSL", "Mahout Spark-Environment" and
>>>> "Mahout Map-Reduce"?