On 03/15/2015 01:42 PM, Pat Ferrel wrote:
Lots of discussion off the record about doing a release but shouldn’t we plan
this?
What has to be in a release of Mahout 0.10?
Seems like we could release as-is but it would be nice to have some of the
already completed work that isn’t committed yet:
* mrlegacy refactored out of scala. Is it possible to get this in, Dmitriy?
One question is how to package, with which version of Spark. There is a bug in
Spark 1.2.1 and I think in 1.2 (this is the big distro build) that requires any
class that uses the JavaSerializer to set a specific SparkConf key/value to
point to the guava jar on all workers. This only affects IndexedDatasets since
they use Guava’s BiMap. Rumor has it that 1.3 fixes this but I haven’t tried it
yet.
So we are currently stuck on 1.1.1 but could document a workaround for using
1.2 for a user who wants to build Mahout from scratch. A user source build on
1.3 may not require a workaround. We seem to be good on hadoop 2.x, which in
itself is a good reason to release since 0.9 was not.
What else needs to be done:
* rename module math-scala to core?
* create the distribution build. Currently this does not publish the scaladocs
and does not create artifacts for H2O or Scala.
Same problem for javadocs (other than mrlegacy). Is this a question for
INFRA? We have MAHOUT-1562
<https://issues.apache.org/jira/browse/MAHOUT-1562> and MAHOUT-1585
<https://issues.apache.org/jira/browse/MAHOUT-1585>
open for these. Were javadocs for all modules ever hosted? There were
links for them which are now dead, so I removed them from the site. I'm
wondering because even once we get the scaladocs published in the build,
will we have the same problem of them not being hosted?
* is H2O really in a form to publish?
In terms of scala bindings for the DRM and DSL Linear algebra
operations, solvers, etc., H2O should be good to go with the exception
of one bug (MAHOUT-1638
<https://issues.apache.org/jira/browse/MAHOUT-1638>). It passes
(almost) all math-scala tests. We have no other algorithms (outside of
math-scala solvers, decompositions, etc.) for H2O. I'm not sure if it's
being used or how much real-world testing it's had; it does serve at the
very least as a proof of concept for the Engine Neutral DSL.
Docs
* IMO we should name the Mahout Spark-Scala DSL and shell. More unique names
are easier to find in searches. Maybe Suneel can polish off his Sanskrit and
suggest something.
* we should be ready to do some work here to restructure the CMS since it is
very 0.9 centric with Scala stuff almost an afterthought.
Agreed. What about categorizing the Documentation on the site under
tabs like "Mahout-DSL", "Mahout Spark-Environment", and "Mahout Map-Reduce"?