On 03/15/2015 01:42 PM, Pat Ferrel wrote:
Lots of discussion off the record about doing a release but shouldn’t we plan 
this?

What has to be in a release of Mahout 0.10?

Seems like we could release as-is but it would be nice to have some of the 
already completed work that isn’t committed yet:
* mrlegacy refactored out of scala. Is it possible to get this in, Dmitriy?

One question is how to package, with which version of Spark. There is a bug in 
Spark 1.2.1 and I think in 1.2 (this is the big distro build) that requires any 
class that uses the JavaSerializer to set a specific SparkConf key/value to 
point to the guava jar on all workers. This only affects IndexedDatasets since 
they use Guava’s BiMap. Rumor has it that 1.3 fixes this but I haven’t tried it 
yet.

So we are currently stuck on 1.1.1 but could document a workaround for using 
1.2 for users who want to build Mahout from source. A user source build on 
1.3 may not require a workaround. We seem to be good on Hadoop 2.x, which in 
itself is a good reason to release since 0.9 was not.

What else needs to be done:
* rename module math-scala to core?
* create the distribution build. Currently this does not publish the scaladocs 
and does not create artifacts for H2O or Scala.

Same problem for javadocs (other than mrlegacy). Is this a question for INFRA? We have MAHOUT-1562 <https://issues.apache.org/jira/browse/MAHOUT-1562> and MAHOUT-1585 <https://issues.apache.org/jira/browse/MAHOUT-1585> open for these. Were javadocs for all modules ever hosted? There were links for them which are now dead, so I removed them from the site. I'm wondering because even once we get the scaladocs published in the build, will we have the same problem of them not being hosted?

* is H2O really in a form to publish?

In terms of Scala bindings for the DRM and DSL linear algebra operations, solvers, etc., H2O should be good to go with the exception of one bug (MAHOUT-1638 <https://issues.apache.org/jira/browse/MAHOUT-1638>). It passes (almost) all math-scala tests. We have no other algorithms (outside of math-scala solvers, decompositions, etc.) for H2O. I'm not sure if it's being used or how much real-world testing it's had; it does serve at the very least as a proof of concept for the Engine Neutral DSL.


Docs
* IMO we should name the Mahout Spark-Scala DSL and shell. More unique names 
are easier to find in searches. Maybe Suneel can polish off his Sanskrit and 
suggest something.
* we should be ready to do some work here to restructure the CMS since it is 
very 0.9 centric with Scala stuff almost an afterthought.

Agreed. What about categorizing the documentation on the site under tabs like "Mahout-DSL", "Mahout Spark-Environment", and "Mahout Map-Reduce"?

