[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005289#comment-14005289 ]

Dmitriy Lyubimov commented on MAHOUT-1490:
------------------------------------------

And I guess I am still dubious whether this will be a problem in reality. 
Generally Spark's way of dealing with this is "make them smaller", i.e. 
realistically it is only a problem for tasks that load data off HDFS, at which 
point one just reduces the split size until it fits. Once a task goes over the task 
"boundary" (or whatever their term is for data spilling over the JVM bounds), it 
will already be compressed per the above, and the next task (such as a post-shuffle 
task) will automatically load it in the compressed form. Ideally.
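For illustration, a minimal sketch of that "make them smaller" approach in a plain
Spark (Scala) job; the HDFS path, app name, and partition count are hypothetical,
while the compression keys are standard Spark settings:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark

    object SplitSizeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("split-size-sketch")       // hypothetical app name
          // Compress serialized RDD partitions, so data that spills over the
          // JVM bounds is stored and reloaded in compressed form.
          .set("spark.rdd.compress", "true")
          // Shuffle output compression (on by default; shown for clarity).
          .set("spark.shuffle.compress", "true")
        val sc = new SparkContext(conf)

        // "Make them smaller": request more (hence smaller) input splits so
        // each task's partition fits in the JVM heap. Path and split count
        // are illustrative.
        val lines = sc.textFile("hdfs:///data/input", 1024)

        // The post-shuffle stage (after reduceByKey) reads the shuffled
        // blocks in compressed form automatically; no extra handling needed.
        val counts = lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)
        println(counts.count())

        sc.stop()
      }
    }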

> Data frame R-like bindings
> --------------------------
>
>                 Key: MAHOUT-1490
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1490
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Saikat Kanjilal
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark



--
This message was sent by Atlassian JIRA
(v6.2#6252)
