[ 
https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987443#comment-13987443
 ] 

Anand Avati commented on MAHOUT-1529:
-------------------------------------

Some thoughts -

As an algorithm implementor, does one really care about platform-specific 
details like checkpoint(mem) vs. checkpoint(disk) vs. cache()? Would it not 
be enough to present a single generic call, like .materialize(), which would 
either trigger the computation in the physical layer or give it a hint? For 
persistence, why not just have an explicit .writeDRM() and be done? So as an 
API consumer there is:

.materialize() -- trigger the optimizer and computation, thereby avoiding 
future duplicate evaluations; translates to .checkpoint(MEM) in Spark, for 
example.
.writeDRM(filename) -- serialize the computed DRM to the persistence store 
(implies materialization if not already done).


> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1529
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1529
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> We have a few situations when algorithm-facing API has Spark dependencies 
> creeping in. 
> In particular, we know of the following cases:
> (1) checkpoint() accepts Spark constant StorageLevel directly;
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" package. 
> (4) drmBroadcast returns a Spark-specific Broadcast object



--
This message was sent by Atlassian JIRA
(v6.2#6252)