Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

Trevor Grant Tue, 05 Sep 2017 14:25:17 -0700

Ever since we moved Flink to its own profile, I have been thinking we ought
to do the same to H2O, but I haven't been too motivated because it was never
causing anyone any problems.

Maybe it's time to drop H2O "official support" and move Flink Batch / H2O
into a "mahout/community/engines" folder.

I've been doing a lot of Flink Streaming the last couple of weeks and already
bootlegged a few of the "Algorithms" into Flink.  Pretty sure we could
support those easily, and I _think_ we could do the same with the
distributed matrices (e.g. wrap a DataStream[(Key, MahoutVector)] and
implement the Operators on that).
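
To make the wrapping idea concrete, here is a rough sketch of the shape it
could take. Everything named here (KeyedRows, SeqRows, mapRows, scale) is
hypothetical and not a Mahout or Flink API. A local Seq of (key, vector)
pairs stands in for DataStream[(Key, MahoutVector)], and a plain
Vector[Double] stands in for a Mahout vector, so the sketch runs without
either library.

```scala
// Hypothetical, engine-neutral view of keyed rows. Not a Mahout or Flink API.
trait KeyedRows[K] {
  // Apply a function to every row vector, keeping the keys unchanged.
  def mapRows(f: Vector[Double] => Vector[Double]): KeyedRows[K]
  // Materialize the rows locally (for tests and small data only).
  def collect(): Seq[(K, Vector[Double])]
}

// Stand-in backend: a local Seq plays the role of DataStream[(Key, MahoutVector)].
final case class SeqRows[K](rows: Seq[(K, Vector[Double])]) extends KeyedRows[K] {
  def mapRows(f: Vector[Double] => Vector[Double]): KeyedRows[K] =
    SeqRows(rows.map { case (k, v) => (k, f(v)) })

  def collect(): Seq[(K, Vector[Double])] = rows
}

object KeyedRowsDemo {
  // An "operator" written once against the trait, so any backend that
  // implements mapRows gets it for free.
  def scale[K](m: KeyedRows[K], s: Double): KeyedRows[K] =
    m.mapRows(_.map(_ * s))
}
```

A streaming backend would supply its own KeyedRows implementation, and
operators like scale would not need to change.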

I'd put FlinkStreaming as another community engine.

If we did that, I'd say that by convention we need a Markdown document in
mahout/community/engines that has a table of what is implemented on what.

That is to say, even if we were only able to implement the "algos" on Flink
Streaming, there would still be a lot of value in that for many
applications (especially considering the state of FlinkML).  It also beats
having a half-cooked engine sitting on a feature branch.

Beam does something similar to that for their various engines.

Speaking of Beam, I've heard rumblings here and there of people talking
about making a Beam engine. This might motivate people to get started (no
one person feels responsible for "boiling the ocean" and throwing down an
entire engine in one go, but can instead hack out the portions they need).

My .02

tg

On Tue, Sep 5, 2017 at 4:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:

> I've found a need for sorting a DRM as well as in-core matrices,
> something like, e.g., DrmLike.sortByColumn(...). I would like to implement
> this at the math-scala engine-neutral level with pass-through functions to
> the underlying back ends.
>
>
> In-core would be engine neutral by current design (in-core matrices are
> all Mahout matrices with the exception of h2o, which causes some concern).
>
>
> For Spark, we can use  RDD.sortBy(...).
>
>
> For Flink, we can use DataSet.sortPartition(...).setParallelism(1).  (There
> may be a better method; I will look deeper.)
>
>
> h2o has an implementation, I'm sure, but this brings me to a more
> important point: if we want to stub out a method in a back-end module, e.g.
> h2o, which test suites do we want to make a requirement?
>
>
> We've not set any specific rules for which test suites must pass for each
> module. We've had a soft requirement for inheriting and passing all test
> suites from math-scala.
>
>
> Setting a rule for this is something that we need to do, IMO.
>
>
> An easy option that I'm thinking would be to set the current core
> math-scala suites as a requirement, and then allow for an optional suite
> for methods which will be stubbed out.
>
>
> Thoughts?
>
>
> --andy
>
>
>
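
For reference, the in-core half of the proposed sort could look roughly
like the sketch below. It is only an illustration under stated assumptions:
SortSketch and this sortByColumn signature are hypothetical, and a
Seq[Array[Double]] of rows stands in for a Mahout in-core matrix so the
code runs without Mahout on the classpath. The ascending flag is also an
assumption, since the original proposal elides the arguments.

```scala
object SortSketch {
  // Hypothetical helper, not part of Mahout: returns the rows of a dense,
  // row-major matrix reordered by the values in column `col`.
  def sortByColumn(rows: Seq[Array[Double]],
                   col: Int,
                   ascending: Boolean = true): Seq[Array[Double]] = {
    require(col >= 0 && rows.forall(_.length > col),
      s"column $col is out of range for at least one row")
    // sortWith is stable, so rows with equal keys keep their relative order.
    if (ascending) rows.sortWith((a, b) => a(col) < b(col))
    else rows.sortWith((a, b) => a(col) > b(col))
  }
}
```

Usage would be along the lines of SortSketch.sortByColumn(rows, 0), with a
distributed back end mapping the same call onto something like
RDD.sortBy(...) keyed on the chosen column.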
