GitHub user Chopinxb opened a pull request: https://github.com/apache/spark/pull/22005
[SPARK-16817][CORE][WIP] Use Alluxio to improve stability of shuffle by replication of shuffle data

## What changes were proposed in this pull request?

This PR proposes using Alluxio to store a replica of shuffle data, in order to improve the stability of complicated OLAP tasks.

**Motivation**

Currently, when a shuffle fetch fails (for example, because the NodeManager running the shuffle service crashed), Spark reruns the previous stage to regenerate the shuffle data. This works well, but in some cases the cost of recomputation is unacceptable. With this PR, when a shuffle fetch fails, the reduce task retries fetching the shuffle data from Alluxio to avoid recomputation.

**Usage**

1. Enable this feature in spark-defaults.conf: `spark.alluxio.shuffle.enabled true`

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Chopinxb/spark spark-shuffle-alluxio

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22005.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22005

----
commit a4371cfaf6672fc67cc2961e23f241dda49314f8
Author: XiaoBang <xiaobang213452@...>
Date: 2018-06-20T00:49:24Z

    use alluxio to improve stability of shuffle

commit 65659882839dc626e86f1d3dd73544eb2c28178b
Author: xiaobang213452 <xiaobang213452@...>
Date: 2018-08-06T05:12:31Z

    update style

commit 20cabe1419f6eb382089a6faecede6cb420619d9
Author: xiaobang213452 <xiaobang213452@...>
Date: 2018-08-06T08:17:42Z

    update style

----
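
For readers skimming the description, the fallback outlined under Motivation can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this PR: `fetch_with_replica_fallback`, `FetchFailed`, the `(shuffle_id, map_id, reduce_id)` block key, and the in-memory dict standing in for Alluxio are all hypothetical names chosen for the example. The `replica_enabled` flag plays the role of `spark.alluxio.shuffle.enabled`.

```python
from typing import Callable, Dict, Tuple

# Hypothetical block key: (shuffle_id, map_id, reduce_id).
BlockId = Tuple[int, int, int]


class FetchFailed(Exception):
    """Hypothetical error: the primary shuffle source could not serve a block."""


def fetch_with_replica_fallback(
    block_id: BlockId,
    fetch_primary: Callable[[BlockId], bytes],
    replica_store: Dict[BlockId, bytes],
    replica_enabled: bool,
) -> bytes:
    """Try the primary shuffle source first, then fall back to a replica.

    replica_store is a plain dict standing in for a replicated store
    such as Alluxio; a real system would read from a remote file system.
    """
    try:
        return fetch_primary(block_id)
    except FetchFailed:
        # Only consult the replica when the feature is switched on and
        # a copy of the block actually exists there.
        if replica_enabled and block_id in replica_store:
            return replica_store[block_id]
        # Otherwise surface the failure, so the caller can fall back to
        # recomputing the previous stage as before.
        raise
```

If the replica is disabled or does not hold the block, the original failure propagates unchanged, which corresponds to the existing behavior of rerunning the previous stage.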