+1 to an h2o profile. Do you want to target 0.13.1 for this? I would like to 
keep officially supporting h2o as long as we can, since it highlights the 
abstraction so well (using custom H2OMatrix classes rather than Mahout 
Matrices).


>Maybe its time to drop H2O "official support" and move Flink Batch / H2O
into a "mahout/community/engines" folder.


Interesting idea re: a "mahout/community/engines" folder. I'm not sure how much 
difference it would make, but I'm open to it.


> I'd put FlinkStreaming as another community engine.
+1


> Speaking of Beam, I've heard rumblings here and there of people talking
about making a Beam engine- this might motivate people to get started (no
one person feels responsible for "boiling the ocean" and throwing down an
entire engine in one go- but instead can hack out the portions they need.


+1


> If we did that, I'd say- by convention we need a Markdown document in
mahout/community/engines that has a table of what is implemented on what.


Agreed - at least a single doc.  We have to be very careful about ending up in 
"what is Mahout?" territory.


I believe that at this juncture, though, we need to come up with a solid 
structure to avoid confusion.  Yes, streaming engines will be very useful.


Maybe someone could start a GDoc to begin outlining a streaming engine plan.  
IMO it would be good to structure streaming engines the same way we now 
structure batch - something like a "streaming-math-scala" module with a 
familiar DSL for, e.g., Flink and Spark Streaming (a rough sketch below).  
Though it's easy to see quickly how those two would differ, and in some ways it 
would not be as easy to "extend" modules as we do for batch.
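
Something along these lines - purely illustrative, none of these names 
(StreamingDrmLike, StreamingDistributedEngine, drmFromSource) exist in Mahout 
today; it just mirrors the batch DrmLike / engine split:

import org.apache.mahout.math.Matrix

// Hypothetical engine-neutral streaming abstraction that a
// "streaming-math-scala" module could define; Flink and Spark Streaming
// modules would supply the implementations.
trait StreamingDrmLike[K] {
  def ncol: Int

  // Apply a block-wise function to each incoming block of rows,
  // analogous to mapBlock on the batch side.
  def mapBlock(f: (Array[K], Matrix) => (Array[K], Matrix)): StreamingDrmLike[K]
}

trait StreamingDistributedEngine {
  // Each back end provides its own way of turning a source into a streaming DRM.
  def drmFromSource[K](source: AnyRef, ncol: Int): StreamingDrmLike[K]
}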


I would think that this should target 0.14.x, which we'd discussed long ago as 
being mainly an algorithm-focused series of releases; however, adding in 
streaming engines would be similar to adding in an algorithm which probes for 
MPI - something that we've also discussed for 0.14.x.

I would like to get JCuda into 0.13.1 or 0.13.2 (if we do a 0.13.2, and 
depending on the timeline of 0.13.1).


--andy

________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Tuesday, September 5, 2017 5:04:40 PM
To: dev@mahout.apache.org
Subject: [DISCUSS] New feature - DRM and in-core matrix sort and required test 
suites for modules.

I've found a need for sorting a DRM as well as in-core matrices - something 
like, e.g., DrmLike.sortByColumn(...). I would like to implement this at the 
engine-neutral math-scala level with pass-through functions to the underlying 
back ends (rough sketch below).
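
Just a sketch of the kind of pass-through I mean - sortByColumn and 
EngineWithSort are made-up names, not existing API:

import org.apache.mahout.math.drm.DrmLike

// Hypothetical engine hook: each back end (Spark, Flink, h2o) would implement
// drmSortByColumn, and math-scala would delegate to it.
trait EngineWithSort {
  def drmSortByColumn[K](drm: DrmLike[K], col: Int, ascending: Boolean = true): DrmLike[K]
}

object DrmSortDecorators {
  // Engine-neutral decorator in math-scala that passes through to the back end.
  implicit class DrmSortOps[K](drm: DrmLike[K]) {
    def sortByColumn(col: Int, ascending: Boolean = true)(implicit engine: EngineWithSort): DrmLike[K] =
      engine.drmSortByColumn(drm, col, ascending)
  }
}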


In-core would be engine neutral by current design (in-core matrices are all 
Mahout matrices, with the exception of h2o - which causes some concern). A 
rough in-core sketch is below.
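
For in-core, something like the following would work on any Mahout Matrix - 
just a free-standing helper to show the idea, not proposed API as-is:

import org.apache.mahout.math.{DenseMatrix, Matrix}

// Sketch: order row indices by the value in `col`, then copy rows into a new matrix.
def sortByColumnInCore(m: Matrix, col: Int): Matrix = {
  val order = (0 until m.numRows()).sortBy(r => m.get(r, col))
  val out = new DenseMatrix(m.numRows(), m.numCols())
  order.zipWithIndex.foreach { case (src, dst) => out.assignRow(dst, m.viewRow(src)) }
  out
}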


For Spark, we can use RDD.sortBy(...), roughly as sketched below.
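
Sketch only - assumes the DRM's backing RDD[(K, Vector)]; re-keying the rows 
and preserving DRM metadata would still need to be handled:

import org.apache.mahout.math.Vector
import org.apache.spark.rdd.RDD

// Sort the row tuples of a DRM-backing RDD by the value in column `col`.
def sortDrmRddByColumn[K](rdd: RDD[(K, Vector)], col: Int): RDD[(K, Vector)] =
  rdd.sortBy { case (_, row) => row.get(col) }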


For Flink, we can use DataSet.sortPartition(...).setParallelism(1), along the 
lines of the sketch below.  (There may be a better method; I'll look deeper.)
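
Sketch only - assumes the key-selector variant of sortPartition, and 
parallelism 1 is a blunt instrument (a range-partitioned sort would scale 
better):

import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.Order
import org.apache.mahout.math.Vector

// Single-partition sort of a DRM-backing DataSet by the value in column `col`.
def sortDrmDataSetByColumn[K](ds: DataSet[(K, Vector)], col: Int): DataSet[(K, Vector)] =
  ds.sortPartition(_._2.get(col), Order.ASCENDING).setParallelism(1)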


h2o has an implementation, I'm sure, but this brings me to a more important 
point: if we want to stub out a method in a back-end module, e.g. h2o, which 
test suites do we want to make requirements?


We've not set any specific rules for which test suites must pass for each 
module. We've had a soft requirement for inheriting and passing all test suites 
from math-scala.


Setting a rule for this is something that we need to do, IMO.


An easy option I'm thinking of would be to set the current core math-scala 
suites as a requirement, and then allow for an optional suite for methods which 
will be stubbed out (rough sketch below).
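
Roughly like this - SortSuiteBase is a made-up name, the point is just the 
required/optional split on top of the suites each engine already inherits:

import org.scalatest.FunSuite

// Optional suite: covers methods an engine is allowed to stub out.
// Required suites stay the existing math-scala bases every engine must inherit.
trait SortSuiteBase extends FunSuite {
  // Tests for DrmLike.sortByColumn(...) would live here.
}

// An engine that implements sorting mixes the optional suite in, e.g.
//   class SparkSortSuite extends SortSuiteBase with DistributedSparkSuite
// An engine that stubs it out simply omits the trait until it catches up.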


Thoughts?


--andy

