[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332960#comment-15332960 ]

Joseph Fourny edited comment on SPARK-15690 at 6/16/16 3:00 AM:
----------------------------------------------------------------

I am trying to develop single-node clusters on large servers (30+ CPU cores) 
with 2-3 TB of RAM. Our use cases involve small- to medium-sized datasets, but 
with a huge number of concurrent jobs (shared, multi-tenant environments). 
Efficiency and sub-second response times are the primary requirements. The 
shuffle between stages is the current bottleneck. Writing anything to disk is 
just a waste of time if all computations are done in the same JVM (or a small 
set of JVMs on the same machine). We tried using RAMFS to avoid disk I/O, but 
a lot of CPU time is still spent on compression and serialization. I would 
rather not hack my way around this one. Is it wishful thinking to have this 
officially supported?



> Fast single-node (single-process) in-memory shuffle
> ---------------------------------------------------
>
>                 Key: SPARK-15690
>                 URL: https://issues.apache.org/jira/browse/SPARK-15690
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, SQL
>            Reporter: Reynold Xin
>
> Spark's current shuffle implementation sorts all intermediate data by 
> partition id and then writes the data to disk. This is not a big bottleneck 
> because network throughput on commodity clusters tends to be low. However, 
> an increasing number of Spark users are using the system to process data on a 
> single node. When operating on a single node against intermediate data that 
> fits in memory, the existing shuffle code path can become a big bottleneck.
> The goal of this ticket is to change Spark so it can use an in-memory radix 
> sort to shuffle data on a single node, and still gracefully fall back to disk 
> if the data does not fit in memory. Given that the number of partitions is 
> usually small (say, fewer than 256), a single pass would suffice to do the 
> radix sort with pretty decent CPU efficiency.
> Note that there have been many in-memory shuffle attempts in the past. This 
> ticket has a smaller scope (single-process) and aims to actually 
> productionize this code.
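For illustration, the single-pass radix sort the description alludes to reduces to a counting sort keyed on the partition id when there are fewer than 256 partitions. Here is a minimal sketch of that idea (the class and method names are hypothetical, not Spark's actual shuffle code):

```java
import java.util.Arrays;

public class RadixShuffleSketch {
    // Groups records by their target partition id in a single scatter pass.
    // With < 256 partitions the partition id fits in one byte, so one
    // counting-sort pass is a complete radix sort, as the ticket suggests.
    static int[] sortByPartition(int[] partitionIds) {
        int numPartitions = 256;                 // assumed upper bound from the ticket
        int[] counts = new int[numPartitions + 1];
        for (int p : partitionIds) {
            counts[p + 1]++;                     // pass 1: histogram of partition ids
        }
        for (int i = 0; i < numPartitions; i++) {
            counts[i + 1] += counts[i];          // prefix sums -> start offset per partition
        }
        int[] cursor = Arrays.copyOf(counts, numPartitions);
        int[] out = new int[partitionIds.length];
        for (int p : partitionIds) {
            out[cursor[p]++] = p;                // pass 2: scatter into contiguous runs
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {3, 1, 2, 1, 0, 3, 2};
        // prints [0, 1, 1, 2, 2, 3, 3]
        System.out.println(Arrays.toString(sortByPartition(ids)));
    }
}
```

The histogram plus prefix-sum step gives each partition a contiguous start offset, so one comparison-free scatter pass leaves all records grouped by partition, entirely in memory.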



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
