GitHub user justinuang opened a pull request: https://github.com/apache/spark/pull/23051

    [AE2.3-02][SPARK-23128] Add QueryStage and the framework for adaptive execution (auto setting the number of reducer)

    ## What changes were proposed in this pull request?

    Add QueryStage and the framework for adaptive execution. The main benefit from this PR is that the reducer count is set automatically based on a target file size.

    We got this PR (branch ae-02) from https://github.com/Intel-bigdata/spark-adaptive/pull/43, which is based on branch ae-01, but I decided to not merge those in because they require invasive changes to spark-core, and a protocol change to the external shuffle service. This PR should be relatively safe to merge because most of the code changes are to adaptive query execution, which isn't turned on by default.

    ## How was this patch tested?

    Unit tests

    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/palantir/spark juang/cherry-pick-ae-02

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23051

----
commit 7827060679786afc02e7e6e3d778c2fcb2c13db9
Author: Dan Sanduleac <dsanduleac@...>
Date:   2018-03-24T00:40:20Z

    v1-maven-build-with-version should cache by revision not buildNum since it needs to be common between different jobs

commit 06290c19132f0be5a3e5f6a32b3c4458beadc394
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-24T20:19:19Z

    Ignore flaky scala tests as well as hive tests (#335)

commit 2bc8fafe45711a64a56f4d031e75dc609c5314e6
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-25T18:16:16Z

    Treat classnames with only skipped tests as having taken 0 time (#336)

commit 2c8c96be2b9719cc998d113d5c7cabf6c51a2403
Author: Robert Kruszewski <robert@...>
Date:   2018-03-26T09:38:09Z

    Force okhttp logging interceptor(#337)

commit 4c99e6354198ec11b46ffc38014fdab6b55dcffd
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-26T11:13:07Z

    Handle nulls in k8s responses correctly (#334)

commit cf31e8342e5c0b771c2b5dcb3b9a86540adf1f92
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-26T12:18:15Z

    Store/restore ~/.m2 after versioned build (since pom.xml changes) (#339)

commit de656d21c658bad0b7f873e9da541b7cb303c5fa
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-27T00:12:43Z

    build-sbt directly, and don't restore build-maven where not necessary (#340)

commit d531f534734226dc65be04ec9e9714792afa983c
Author: Dan Sanduleac <dsanduleac@...>
Date:   2018-03-27T11:59:34Z

    empty commit

commit 1aeaf27ae65ff3f625235b48a3a0e75d0a3fbb11
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-28T12:46:12Z

    Faster deploy by parallelizing maven and skipping unnecessary second 'mvn package' (#342)

commit 44a14cdafe247f7094d7571e00cfd8e85bf0e397
Author: Jeremy Liu <jeremy.jl.liu@...>
Date:   2018-03-28T19:54:33Z

    Move RBackend to member variable

commit 5d88c9527b602728ccaf0a48d0106b2729d46a2a
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-29T14:02:06Z

    [SPARK-23795][LAUNCHER] Make AbstractLauncher#self() protected (#341)

commit 41415d4865b625da8516739e0e63acdb1137a3b0
Author: mcheah <mcheah@...>
Date:   2018-03-07T01:59:03Z

    Rebase to upstream's version of Kubernetes support.

commit 4ac24329b53e51cdc3990f634ed7a2249c8423e3
Author: mcheah <mcheah@...>
Date:   2018-03-12T20:46:21Z

    Replace manifest

commit 6d23bae6fcccb483128c9d70438653b0c239c8c6
Author: Ilan Filonenko <if56@...>
Date:   2018-03-19T18:29:56Z

    [SPARK-22839][K8S] Remove the use of init-container for downloading remote dependencies

    Removal of the init-container for downloading remote dependencies. Built off of the work done by vanzin in an attempt to refactor driver/executor configuration elaborated in [this](https://issues.apache.org/jira/browse/SPARK-22839) ticket.

    This patch was tested with unit and integration tests.

    Author: Ilan Filonenko <i...@cornell.edu>

    Closes #20669 from ifilonenko/remove-init-container.

commit 1d60e389e6b84b158a91e1a9cdeeb124949c4d07
Author: mcheah <mcheah@...>
Date:   2018-03-29T18:51:02Z

    Match entrypoint as well

commit 5774deb0022235455e84387a304fa6823f939f74
Author: amenck <amenck@...>
Date:   2018-03-29T19:56:56Z

    Merge pull request #343 from jeremyjliu/jl/expose-r-backend

    Move RBackend to member variable

commit 4e7f4f09512a5a30f72ed679fad594f87b12db75
Author: Dan Sănduleac <dansanduleac@...>
Date:   2018-03-29T20:29:52Z

    Properly remove hive from modules (#338)

commit 95cf5f7523f60cfdd7fdc9d76dfd2668c287785c
Author: mccheah <mcheah@...>
Date:   2018-03-29T22:57:40Z

    Merge pull request #324 from palantir/use-upstream-kubernetes

    Rebase to upstream's version of Kubernetes support.

commit a7383de811ea01f60aeb642b6c192aecef14ff6a
Author: Robert Kruszewski <robertk@...>
Date:   2018-03-30T02:30:51Z

    mapexpressions preserving origin

commit 479cf4633bc415b33fd80fc969be885b0decc5cb
Author: Robert Kruszewski <robertk@...>
Date:   2018-03-30T02:34:47Z

    correct place

commit bb10a57784784fa0f661540aa5cf3acb4dad7651
Author: mccheah <mcheah@...>
Date:   2018-03-30T19:13:37Z

    Merge pull request #344 from palantir/rk/mapproductiteratorwithorigin

    transformexpression with origin

commit 1010acc31543cf2c8db9396c3877d941b3dedaf1
Author: Patrick Woody <patrick.woody1@...>
Date:   2018-03-30T21:34:22Z

    Revert "Move RBackend to member variable" (#345)

commit 2223db05d722ec5720bf7ce3563ef6da9e180b9e
Author: mccheah <mcheah@...>
Date:   2018-03-31T13:49:40Z

    Support mounting files with --files using secrets. (#327)

commit 29579ecf4743465a0cc370a1e8ecc109da73f34e
Author: Robert Kruszewski <robertk@...>
Date:   2018-03-31T14:15:14Z

    Merge branch 'master' into rk/upstream

commit 78b34b40a7e034dd641418b804e6e2606b216ba4
Author: Robert Kruszewski <robert@...>
Date:   2018-04-01T12:41:47Z

    Fix publish after k8s rebase (#347)

commit 2788441fb6f945d1d945caa4675c97b8b2f5a472
Author: Patrick Woody <patrick.woody1@...>
Date:   2018-04-02T17:54:15Z

    Revert "transformexpression with origin" (#350)

commit 4cc4dee11883bf1954181ec808f0f57a9ee55c55
Author: Patrick Woody <patrick.woody1@...>
Date:   2018-04-02T17:54:25Z

    Add reminder for upstream ticket/PR to github template (#351)

commit 078066bdc9a77dd0c241fae544806d043cb0b167
Author: Robert Kruszewski <robertk@...>
Date:   2018-03-31T14:25:56Z

    resolve conflicts

commit fe35b58a9e8b1bdde111b542371123907686ba97
Author: mcheah <mcheah@...>
Date:   2018-04-02T21:37:23Z

    Empty commit to clear Circle cache.

commit 1264fb5908d3eab2cccfaf9b22b6975c7afd20d4
Author: mcheah <mcheah@...>
Date:   2018-04-03T00:29:36Z

    Empty commit to tag 2.4.0-palantir.12 and trigger publish.

----