I've found a need for sorting a DRM as well as in-core matrices, something 
like DrmLike.sortByColumn(...). I would like to implement this at the 
engine-neutral math-scala level, with pass-through functions to the underlying 
back ends.
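
To make the shape of this concrete, here is a minimal sketch in plain Scala. 
Everything in it is hypothetical: SortEngine, InCoreEngine and the 
free-standing sortByColumn are illustrative names, not existing Mahout API, and 
a Seq of row arrays stands in for a real matrix or DRM so that the example has 
no dependencies.

```scala
// Hypothetical sketch only: none of these names exist in Mahout.
// A Seq of rows stands in for a matrix/DRM to keep the example self-contained.
object SortSketch {
  type Row = Array[Double]

  // Engine-neutral contract; each back end would supply its own implementation.
  trait SortEngine {
    def sortByColumn(rows: Seq[Row], col: Int, ascending: Boolean): Seq[Row]
  }

  // In-core stand-in implementation built on the standard library's stable sort.
  object InCoreEngine extends SortEngine {
    def sortByColumn(rows: Seq[Row], col: Int, ascending: Boolean): Seq[Row] = {
      require(col >= 0 && rows.forall(_.length > col), s"column $col out of range")
      rows.sortWith { (a, b) =>
        val cmp = java.lang.Double.compare(a(col), b(col))
        if (ascending) cmp < 0 else cmp > 0
      }
    }
  }

  // Engine-neutral entry point: it only passes through to the engine in scope.
  def sortByColumn(rows: Seq[Row], col: Int, ascending: Boolean = true)
                  (implicit engine: SortEngine): Seq[Row] =
    engine.sortByColumn(rows, col, ascending)
}
```

A Spark or Flink engine would implement the same trait on top of its own sort 
primitive. How row keys should be treated after a distributed sort is left open 
here.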


In-core sorting would be engine-neutral by the current design (in-core matrices 
are all Mahout matrices, with the exception of h2o, which causes some concern).


For Spark, we can use RDD.sortBy(...).


For Flink, we can use DataSet.sortPartition(...).setParallelism(1). (There may 
be a better method; I will look deeper.)


h2o has an implementation, I'm sure, but this brings me to a more important 
point: if we want to stub out a method in a back-end module, e.g. h2o, which 
test suites do we want to make a requirement?


We've not set any specific rules for which test suites must pass for each 
module. We've had a soft requirement for inheriting and passing all test suites 
from math-scala.


Setting a rule for this is something that we need to do, IMO.


An easy option that I'm thinking of would be to set the current core math-scala 
suites as a requirement, and then allow for an optional suite for methods which 
will be stubbed out.
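
As a rough illustration of that rule, here is a small sketch in plain Scala. It 
is not ScalaTest and not an existing Mahout suite: Check, Outcome, run and 
accepted are hypothetical names, and it assumes that a stubbed-out method 
signals itself by throwing UnsupportedOperationException, which is only one 
possible convention.

```scala
// Hypothetical sketch: plain Scala stand-ins, not ScalaTest or Mahout code.
object SuitePolicySketch {
  sealed trait Outcome
  case object Passed extends Outcome
  case object Failed extends Outcome
  case object Skipped extends Outcome // the method is stubbed out in this back end

  final case class Check(name: String, body: () => Boolean)

  // Run one check. Assumption: a stub throws UnsupportedOperationException.
  // A stub is tolerated (Skipped) only when allowStub is true.
  def run(check: Check, allowStub: Boolean): Outcome =
    try {
      if (check.body()) Passed else Failed
    } catch {
      case _: UnsupportedOperationException =>
        if (allowStub) Skipped else Failed
    }

  // A back end is accepted when every required check passes and no optional
  // check fails outright; optional checks may be skipped via a stub.
  def accepted(required: Seq[Check], optional: Seq[Check]): Boolean =
    required.forall(c => run(c, allowStub = false) == Passed) &&
      optional.forall(c => run(c, allowStub = true) != Failed)
}
```

Under this reading, a back end that stubs sortByColumn still passes as long as 
sortByColumn sits in the optional suite, but an implemented method that returns 
a wrong result fails in either suite.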


Thoughts?


--andy

