[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610715#comment-16610715 ] Wenchen Fan commented on SPARK-15690: - I'm removing the target version, since no one is working on it. > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin >Priority: Major > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332960#comment-15332960 ] Joseph Fourny commented on SPARK-15690: --- +1 on this. I am trying to develop single-node clusters on large servers (30+ CPU cores) with 2-3 TB or RAM. Our use cases involve small to medium size datasets, but with a huge amount of concurrent jobs (shared, multi-tenant environments). Efficiency and sub-second response times are the primary requirements. This shuffle between stages is the current bottleneck. Writing anything to disk is just a waste of time if all computations are done in the same JVM (or a small set of JVMs on the same machine). We tried using RAMFS to avoid disk I/O, but still a lot of CPU time is spent in compression and serialization. I would rather not hack my way out of this one. Is it wishful thinking to have this officially supported? > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328815#comment-15328815 ] Reynold Xin commented on SPARK-15690: - Definitely no serialization/deserialization. > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328717#comment-15328717 ] Shivaram Venkataraman commented on SPARK-15690: --- Yeah I dont think you'll see much improvement from avoiding the DAGScheduler. One more thing to try here is to avoid serialization / deserialization unless you are going to spill to disk. That'll save a lot of time inside a single node. > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328390#comment-15328390 ] Reynold Xin commented on SPARK-15690: - Yes there is definitely no reason to go through network for a single process. Technically we can even bypass the entire DAGScheduler, although that might be too much work. > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328378#comment-15328378 ] Saisai Shao commented on SPARK-15690: - I see. Since everything is in a single process, looks like netty layer could be by-passed and directly fetched the memory blocks in the reader side. It should definitely be faster than the current implementation. > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328264#comment-15328264 ] Reynold Xin commented on SPARK-15690: - Yup. Eventually we can also generalize this to multiple process (e.g. cluster). > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328255#comment-15328255 ] Saisai Shao commented on SPARK-15690: - Hi [~rxin], what's the meaning of "single-process", is that referring to something similar to local mode? > Fast single-node (single-process) in-memory shuffle > --- > > Key: SPARK-15690 > URL: https://issues.apache.org/jira/browse/SPARK-15690 > Project: Spark > Issue Type: New Feature > Components: Shuffle, SQL >Reporter: Reynold Xin > > Spark's current shuffle implementation sorts all intermediate data by their > partition id, and then write the data to disk. This is not a big bottleneck > because the network throughput on commodity clusters tend to be low. However, > an increasing number of Spark users are using the system to process data on a > single-node. When in a single node operating against intermediate data that > fits in memory, the existing shuffle code path can become a big bottleneck. > The goal of this ticket is to change Spark so it can use in-memory radix sort > to do data shuffling on a single node, and still gracefully fallback to disk > if the data size does not fit in memory. Given the number of partitions is > usually small (say less than 256), it'd require only a single pass do to the > radix sort with pretty decent CPU efficiency. > Note that there have been many in-memory shuffle attempts in the past. This > ticket has a smaller scope (single-process), and aims to actually > productionize this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org