[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984052#comment-13984052 ]
Dmitriy Lyubimov commented on MAHOUT-1529:
------------------------------------------

My thoughts on this:

(1) Factor out DRMLike and the logical operators into the math-scala module.

(2) Keep Spark-specific physical operator translations in the spark module.

(3) Create a verbatim analog of StorageLevel in Mahout. This probably needs more careful handling: we need to investigate how it would really map onto Stratosphere, if at all. But assuming that for now we just want to walk away from a direct Spark dependency in the code, a simple 1:1 translation is probably enough.

(4) For the drmParallelize() etc. set of routines, I really see two ways of doing this:

(4a) Wrap the engine-specific context into an "either-or" Mahout context.

(4b) Rely on the assumption that these routines are not really used in engine-agnostic algorithms, so each individual engine will provide semantically identical versions of them by import. At the very least, this will be required for the createMahoutContext() call.

I am really inclined to do (4a) so as not to lock ourselves into any assumptions, except for createMahoutContext(), which will have to go into an engine-specific package.

I will have to think about CheckpointedDRM and CheckpointedDRM$rdd. Maybe the whole CheckpointedDRM also needs to be an engine-specific class.

> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1529
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1529
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>
> We have a few situations where the algorithm-facing API has Spark dependencies creeping in.
> In particular, we know of the following cases:
> (1) checkpoint() accepts the Spark constant StorageLevel directly;
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" package.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
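
For illustration only, options (3) and (4a) from the comment above might look roughly like the following Scala sketch. Every name here (CacheHint, SparkCtx, StratosphereCtx, MahoutContext, EngineAgnostic, SparkBindingsSketch) is hypothetical and is not taken from the Mahout code base. The engine contexts are plain stand-ins rather than real Spark or Stratosphere classes, and drmParallelize only returns a description string instead of building a DRM.

```scala
// Hypothetical sketch; none of these types exist in Mahout under these names.

// (3) Engine-neutral analog of Spark's StorageLevel (only a few levels shown).
sealed trait CacheHint
object CacheHint {
  case object NoCache extends CacheHint
  case object MemoryOnly extends CacheHint
  case object MemoryAndDisk extends CacheHint
  case object DiskOnly extends CacheHint
}

// Stand-ins for the engine-specific contexts (assumptions, not real classes).
final case class SparkCtx(master: String)
final case class StratosphereCtx(jobManager: String)

// (4a) "Either-or" Mahout context wrapping exactly one engine context.
final case class MahoutContext(engine: Either[SparkCtx, StratosphereCtx])

// Algorithm-facing code sees only MahoutContext and CacheHint.
object EngineAgnostic {
  // Dispatches on the wrapped engine; a real version would return a DRM.
  def drmParallelize(rows: Seq[Array[Double]], numPartitions: Int)
                    (implicit ctx: MahoutContext): String =
    ctx.engine match {
      case Left(sc) =>
        s"spark@${sc.master}: ${rows.size} rows, $numPartitions partitions"
      case Right(st) =>
        s"stratosphere@${st.jobManager}: ${rows.size} rows, $numPartitions partitions"
    }
}

// Engine-specific package: context creation and the 1:1 level translation.
object SparkBindingsSketch {
  def createMahoutContext(master: String): MahoutContext =
    MahoutContext(Left(SparkCtx(master)))

  // Maps each hint to the name of the corresponding Spark StorageLevel constant.
  def toSparkLevelName(hint: CacheHint): String = hint match {
    case CacheHint.NoCache       => "NONE"
    case CacheHint.MemoryOnly    => "MEMORY_ONLY"
    case CacheHint.MemoryAndDisk => "MEMORY_AND_DISK"
    case CacheHint.DiskOnly      => "DISK_ONLY"
  }
}
```

In this sketch only createMahoutContext() and the level translation live on the engine side, which matches the stated preference for (4a). Whether the cache levels map sensibly onto Stratosphere is left open, as in the comment.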