[ https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

uncleGen updated SPARK-3376:
----------------------------
    Description: 
I think a memory-based shuffle could reduce some of the overhead of disk I/O.
I would like to know whether there is any plan to work on this, or whether
anyone has suggestions. Based on the work in SPARK-2044, it is feasible to
have several implementations of shuffle.





  was:
I think a memory-based shuffle could reduce some of the overhead of disk I/O.
I would like to know whether there is any plan to work on this, or whether
anyone has suggestions. Based on the work in SPARK-2044, it is feasible to
have several implementations of shuffle.



The following are my test results for "InMemory Shuffle":

| data size  | partitions | resources                  |
| 5131859218 | 2000       | 50 executors/ 4 cores/ 4GB |

| settings               | operation1                     | operation2               |
| shuffle spill & lz4    | repartition+flatMap+groupByKey | repartition + groupByKey |
| memory                 | 38s                            | 16s                      |
| sort                   | 45s                            | 28s                      |
| hash                   | 46s                            | 28s                      |
| no shuffle spill & lz4 |                                |                          |
| memory                 | 16s                            | 16s                      |
|                        |                                |                          |
| shuffle spill & lzf    |                                |                          |
| memory                 | 28s                            | 27s                      |
| sort                   | 29s                            | 29s                      |
| hash                   | 41s                            | 30s                      |
| no shuffle spill & lzf |                                |                          |
| memory                 | 15s                            | 16s                      |
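
For anyone who wants to reproduce these settings, here is a rough sketch of how each row could be turned into spark-submit flags. It assumes that "shuffle spill" corresponds to the spark.shuffle.spill property, that lz4/lzf are set through spark.io.compression.codec (short codec names may not be accepted by every Spark version), and that the shuffle implementation is selected with spark.shuffle.manager. The ticket does not name the class behind the "memory" rows, so MEMORY_SHUFFLE_CLASS below is a hypothetical placeholder, as are the helper names.

```python
# Sketch only: maps the benchmark settings onto Spark 1.x properties.
# MEMORY_SHUFFLE_CLASS is hypothetical; the ticket does not name the class
# that implements the in-memory shuffle.
MEMORY_SHUFFLE_CLASS = "org.example.shuffle.MemoryShuffleManager"


def shuffle_conf(manager, spill, codec):
    """Return Spark properties for one benchmark setting."""
    if manager not in ("hash", "sort", "memory"):
        raise ValueError("unknown shuffle manager: %s" % manager)
    if codec not in ("lz4", "lzf"):
        raise ValueError("unknown compression codec: %s" % codec)
    return {
        # "hash" and "sort" are built-in short names; anything else is
        # assumed to be passed as a fully qualified class name.
        "spark.shuffle.manager": (
            MEMORY_SHUFFLE_CLASS if manager == "memory" else manager
        ),
        "spark.shuffle.spill": "true" if spill else "false",
        "spark.io.compression.codec": codec,
    }


def to_submit_args(conf):
    """Flatten properties into spark-submit --conf arguments, sorted by key."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "%s=%s" % (key, conf[key])]
    return args


if __name__ == "__main__":
    # Example: the "sort" row under "shuffle spill & lz4".
    print(" ".join(to_submit_args(shuffle_conf("sort", True, "lz4"))))
```

The flags are emitted in sorted key order so that two runs with the same settings produce identical command lines, which makes timing results easier to compare.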


> Memory-based shuffle strategy to reduce overhead of disk I/O
> ------------------------------------------------------------
>
>                 Key: SPARK-3376
>                 URL: https://issues.apache.org/jira/browse/SPARK-3376
>             Project: Spark
>          Issue Type: Planned Work
>            Reporter: uncleGen
>            Priority: Trivial
>
> I think a memory-based shuffle could reduce some of the overhead of disk I/O.
> I would like to know whether there is any plan to work on this, or whether
> anyone has suggestions. Based on the work in SPARK-2044, it is feasible to
> have several implementations of shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
