On 03/17/2015 01:59 PM, Dmitriy Lyubimov wrote:
As long as tests run, I don't care about H2O. Our methods don't have a
real published benchmark either (which is what it really needs).

All algebra and decomposition tests (everything in math-scala except NB) are passing for H2O. The only issue that I know of is the minor one with setting String keys for DrmLike[String]. The only test from math-scala that fails because of this is the Naive Bayes test from the abstract implementation of NB. I may remove this (the math-scala NB implementation) because I'm not sure that it's going to be used, and it is really kind of a hack. Either way, we should have a test for setting String keys for a DRM, which would at this point fail in H2O (it is probably an easy fix).

I think we need a good name encompassing not just algebra but also
the entire scope of base capabilities in the environment, like what is
called R-base in R. This includes basic stats, too. BTW, I've already
done stat bindings for the Colt stuff twice. It is not a big
deal (especially if it is just a bridge to something else very common,
like Apache Commons Math).


On Tue, Mar 17, 2015 at 9:14 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
Yeah, we need a real name that brings no baggage. R-like, based on Scala, big
data linear algebra, yada yada. We can't say all that in a descriptive phrase, so why
not a name like Mahout-xyz? Of course, with a catchier, search-friendly xyz.

But AP’s structure seems pretty good.

I’m nervous about releasing H2O with no one supporting it. Is anyone signing up for
that?


On Mar 17, 2015, at 8:59 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

I don't like the term DSL.

It is an algebraic optimizer, folks. Calling it a DSL brings in wrong and overly
trivial ideas about it.
On Mar 17, 2015 8:27 AM, "Andrew Palumbo" <ap....@outlook.com> wrote:

On 03/15/2015 01:42 PM, Pat Ferrel wrote:

There has been lots of discussion off the record about doing a release, but shouldn’t we
plan this?

What has to be in a release of Mahout 0.10?

It seems like we could release as-is, but it would be nice to have some of
the already completed work that isn’t committed yet:
* mrlegacy refactored out of scala; is it possible to get this in, Dmitriy?

One question is how to package it, and with which version of Spark. There is a
bug in Spark 1.2.1, and I think in 1.2 (this is the big distro build), that
requires any class that uses the JavaSerializer to set a specific SparkConf
key/value to point to the Guava jar on all workers. This only affects
IndexedDatasets, since they use Guava’s BiMap. Rumor has it that 1.3 fixes
this, but I haven’t tried it yet.

So we are currently stuck on 1.1.1, but we could document the workaround
for using 1.2 for a user who wants to build Mahout from source. A user's
source build on 1.3 may not require a workaround. We seem to be good on
Hadoop 2.x, which in itself is a good reason to release, since 0.9 was not.

What else needs to be done:
* rename module math-scala to core?
* create the distribution build. Currently this does not publish the
scaladocs and does not create artifacts for H2O or Scala.

Same problem for the javadocs (other than mrlegacy). Is this a question for
INFRA? We have MAHOUT-1562 <https://issues.apache.org/jira/browse/MAHOUT-1562>
and MAHOUT-1585 <https://issues.apache.org/jira/browse/MAHOUT-1585>
open for these. Were javadocs for all modules ever hosted? There were
links to them, which are now dead, so I removed them from the site. I'm
wondering because, even once we get the scaladocs published in the build,
we may have the same problem of them not being hosted.

* is H2O really in a form to publish?
In terms of Scala bindings for the DRM and DSL linear algebra operations,
solvers, etc., H2O should be good to go, with the exception of one bug
(MAHOUT-1638 <https://issues.apache.org/jira/browse/MAHOUT-1638>). It
passes (almost) all math-scala tests. We have no other algorithms (outside
of the math-scala solvers, decompositions, etc.) for H2O. I'm not sure if it's
being used or how much real-world testing it's had; at the
very least, it serves as a proof of concept for the engine-neutral DSL.


Docs
* IMO we should give the Mahout Spark-Scala DSL and shell a name. More unique
names are easier to find in searches. Maybe Suneel can brush up on his
Sanskrit and suggest something.
* we should be ready to do some work here to restructure the CMS, since it
is very 0.9-centric, with the Scala stuff almost an afterthought.

Agreed. What about categorizing the documentation on the site under tabs
like "Mahout-DSL", "Mahout Spark-Environment", and "Mahout Map-Reduce"?