[
https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984052#comment-13984052
]
Dmitriy Lyubimov commented on MAHOUT-1529:
------------------------------------------
My thoughts on this:
(1) factor out DRMLike and logical operators into math-scala module.
(2) keep spark-specific physical op translations in the spark module.
(3) create StorageLevel's verbatim analog in Mahout (this probably needs more
careful handling; it needs investigation into how it would really map onto
Stratosphere, if at all. But assuming for now that we just want to walk away
from the direct Spark dependency in the code, a simple 1:1 translation is
probably enough);
(4) For the drmParallelize() etc. set of routines, I really see two ways of
doing this:
(4a) wrap the engine-specific context into an "either-or" Mahout context.
(4b) rely on the assumption that these routines are not really used in
engine-agnostic algorithms, so each individual engine will provide semantically
identical versions of them by import. At the very least, this will be required
for the createMahoutContext() call.
I am really inclined to do (4a) so as not to lock ourselves into any
assumptions, except for createMahoutContext(), which will have to go into an
engine-specific package.
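To make option (4a) concrete, here is a minimal sketch of what an "either-or"
context could look like. Every name in it (SparkCtx, StratosphereCtx,
MahoutContext, SparkBacked, StratosphereBacked, DrmHandle, DrmOps,
SparkBindingsSketch) is hypothetical and stands in for real engine classes; it
is not the actual Mahout or Spark API, only an illustration of the dispatch
idea under those assumptions.

```scala
// Hypothetical stand-ins for the engine-native contexts
// (not real Spark or Stratosphere classes).
final case class SparkCtx(master: String)
final case class StratosphereCtx(jobManager: String)

// Engine-neutral "either-or" wrapper. Sealing it keeps this single-file sketch
// exhaustive; a multi-module build would need an open trait instead.
sealed trait MahoutContext
final case class SparkBacked(ctx: SparkCtx) extends MahoutContext
final case class StratosphereBacked(ctx: StratosphereCtx) extends MahoutContext

// Placeholder for a distributed row matrix handle; it records only what the
// sketch needs to show which backend was chosen.
final case class DrmHandle(engine: String, nrow: Int, numPartitions: Int)

object DrmOps {
  // One engine-agnostic signature; the wrapped context decides the backend.
  def drmParallelize(rows: Seq[Array[Double]], numPartitions: Int = 1)
                    (implicit mc: MahoutContext): DrmHandle = mc match {
    case SparkBacked(_)        => DrmHandle("spark", rows.size, numPartitions)
    case StratosphereBacked(_) => DrmHandle("stratosphere", rows.size, numPartitions)
  }
}

// Context creation stays in an engine-specific object, as suggested above.
object SparkBindingsSketch {
  def createMahoutContext(master: String): MahoutContext =
    SparkBacked(SparkCtx(master))
}
```

With this shape, algorithm code would only ever see MahoutContext and
drmParallelize(); the single engine-specific call left is
createMahoutContext().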
I will have to think about CheckpointedDRM and CheckpointedDRM$rdd. Maybe the
whole CheckpointedDRM also needs to be an engine-specific class.
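
For item (3) above, a simple 1:1 analog of StorageLevel might look roughly
like the sketch below. The constant names mirror Spark's predefined
StorageLevel constants; the names CacheHint and SparkCacheHints are
hypothetical, and the translation function only returns the name of the
matching Spark constant so that the sketch runs without a Spark dependency. A
real Spark module would presumably return the StorageLevel constant itself.

```scala
// Hypothetical engine-neutral mirror of Spark's StorageLevel constants.
// It would live in the shared module and know nothing about Spark.
sealed trait CacheHint
object CacheHint {
  case object NONE extends CacheHint
  case object DISK_ONLY extends CacheHint
  case object DISK_ONLY_2 extends CacheHint
  case object MEMORY_ONLY extends CacheHint
  case object MEMORY_ONLY_2 extends CacheHint
  case object MEMORY_ONLY_SER extends CacheHint
  case object MEMORY_ONLY_SER_2 extends CacheHint
  case object MEMORY_AND_DISK extends CacheHint
  case object MEMORY_AND_DISK_2 extends CacheHint
  case object MEMORY_AND_DISK_SER extends CacheHint
  case object MEMORY_AND_DISK_SER_2 extends CacheHint

  // All hints, in declaration order.
  val all: Seq[CacheHint] = Seq(
    NONE, DISK_ONLY, DISK_ONLY_2, MEMORY_ONLY, MEMORY_ONLY_2,
    MEMORY_ONLY_SER, MEMORY_ONLY_SER_2, MEMORY_AND_DISK, MEMORY_AND_DISK_2,
    MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2)
}

// Stand-in for the Spark module's side of the 1:1 translation.
object SparkCacheHints {
  // Returns the name of the Spark StorageLevel constant matching the hint.
  // Case objects print as their own name, so the mapping is the identity on names.
  def storageLevelName(hint: CacheHint): String = hint.toString
}
```

Another engine would supply its own translation from CacheHint, or ignore
hints it cannot honor.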
> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
> Key: MAHOUT-1529
> URL: https://issues.apache.org/jira/browse/MAHOUT-1529
> Project: Mahout
> Issue Type: Improvement
> Reporter: Dmitriy Lyubimov
>
> We have a few situations where the algorithm-facing API has Spark
> dependencies creeping in.
> In particular, we know of the following cases:
> (1) checkpoint() accepts Spark constant StorageLevel directly;
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" package.