[ 
https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984052#comment-13984052
 ] 

Dmitriy Lyubimov commented on MAHOUT-1529:
------------------------------------------

my thoughts on this: 

(1) factor out DRMLike and logical operators into math-scala module.
(2) keep spark-specific physical op translations in the spark module.
(3) create StorageLevel's verbatim analog in Mahout. This probably needs more 
careful handling: we need to investigate how it would really map onto 
Stratosphere, if at all. But assuming for now that we just want to walk away 
from a direct Spark dependency in the code, a simple 1:1 translation is 
probably enough.
(4) For the drmParallelize() etc. set of routines, I really see two ways of 
doing this:
(4a) wrap the engine-specific context into an "Either-or" Mahout context. 
(4b) rely on the assumption that these routines are not really used in 
engine-agnostic algorithms, so each individual engine will provide 
semantically identical versions of them by import. At the very least, this 
will be required for the createMahoutContext() call. 
I am really inclined to do (4a) so as not to lock ourselves into any 
assumptions, except for createMahoutContext(), which will have to go into an 
engine-specific package.
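
A rough sketch of how (3) and (4a) might look together. Every name here 
(CacheHint, SparkLikeContext, StratosphereLikeContext, MahoutContext, 
SparkBindingsSketch) is a hypothetical stand-in for illustration, not an 
existing Mahout, Spark or Stratosphere class, and the storage constants are 
only a subset of Spark's StorageLevel names:

```scala
// Hypothetical sketch only: none of these types exist in Mahout as written.

// (3) Engine-neutral analog of a subset of Spark's StorageLevel constants.
// The spark module would translate these 1:1 into real StorageLevel values.
object CacheHint extends Enumeration {
  val NONE, DISK_ONLY, MEMORY_ONLY, MEMORY_ONLY_SER,
      MEMORY_AND_DISK, MEMORY_AND_DISK_SER = Value
}

// Stand-ins for the real engine contexts, so the sketch has no dependencies.
final case class SparkLikeContext(master: String)
final case class StratosphereLikeContext(jobManager: String)

// (4a) "Either-or" wrapper: engine-agnostic code sees only MahoutContext.
final case class MahoutContext(
    engine: Either[SparkLikeContext, StratosphereLikeContext]) {

  // Only engine-specific modules should need to look inside the wrapper.
  def engineName: String = engine match {
    case Left(_)  => "spark"
    case Right(_) => "stratosphere"
  }
}

// createMahoutContext() stays in an engine-specific package (here: Spark).
object SparkBindingsSketch {
  def createMahoutContext(master: String): MahoutContext =
    MahoutContext(Left(SparkLikeContext(master)))
}
```

With something like this, drmParallelize() and friends could take a 
MahoutContext and dispatch on the wrapped engine, while checkpoint() could 
accept a CacheHint value instead of a Spark StorageLevel.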

I will have to think about CheckpointedDRM and CheckpointedDRM$rdd. Maybe the 
whole CheckpointedDRM also needs to be an engine-specific class. 
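
One possible shape for that split, purely as a sketch: an engine-neutral 
trait that algorithms see, plus an engine-specific class that owns the 
backing data set. The names CheckpointedDrmSketch, SparkCheckpointedDrmSketch 
and DrmOpsSketch are made up for illustration, and a plain Seq stands in for 
the Spark RDD so the example has no dependencies:

```scala
// Hypothetical sketch only: not the actual CheckpointedDRM API.

// Engine-neutral view (would live in math-scala): geometry only, no rdd.
trait CheckpointedDrmSketch {
  def nrow: Long
  def ncol: Int
}

// Engine-specific class (would live in the spark module).
// A Seq of (row key, row values) stands in for the real RDD here.
final class SparkCheckpointedDrmSketch(val rdd: Seq[(Int, Array[Double])])
    extends CheckpointedDrmSketch {
  def nrow: Long = rdd.size.toLong
  def ncol: Int = rdd.headOption.map(_._2.length).getOrElse(0)
}

// Engine-agnostic code depends on the trait only and never touches rdd.
object DrmOpsSketch {
  def describe(drm: CheckpointedDrmSketch): String =
    s"${drm.nrow} x ${drm.ncol}"
}
```

Only code inside the spark module would downcast to the engine-specific 
class to reach the rdd member.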


> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1529
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1529
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>
> We have a few situations when algorithm-facing API has Spark dependencies 
> creeping in. 
> In particular, we know of the following cases:
> (1) checkpoint() accepts Spark constant StorageLevel directly;
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" package. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
