GitHub user jiangxb1987 opened a pull request:
https://github.com/apache/spark/pull/21918
[SPARK-24821][Core] Fail fast when a submitted job computes on only a subset
of all the partitions for a barrier stage
## What changes were proposed in this pull request?
Add a check in `DAGScheduler.submitJob()` to make sure we are not launching a
barrier stage on only a subset of all the partitions (one example is the
`first()` operation); otherwise, fail fast.
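The fail-fast idea can be sketched as a standalone check: given the set of partitions a job was submitted for and the total partition count of the stage, reject barrier stages that would run on a strict subset. This is a minimal illustration, not Spark's actual implementation; the object and method names (`BarrierCheck`, `checkBarrierStagePartitions`) are hypothetical.

```scala
// Hedged sketch of the subset check; names here are illustrative, not Spark internals.
object BarrierCheck {

  /**
   * Fail fast if a barrier stage would run on only a subset of its partitions,
   * e.g. when an action like first() requests just partition 0.
   */
  def checkBarrierStagePartitions(isBarrier: Boolean,
                                  requestedPartitions: Seq[Int],
                                  totalPartitions: Int): Unit = {
    if (isBarrier && requestedPartitions.distinct.size != totalPartitions) {
      throw new IllegalArgumentException(
        "Barrier execution mode does not allow running a job on a subset of " +
        s"partitions (requested ${requestedPartitions.distinct.size} of $totalPartitions).")
    }
  }
}
```

For example, submitting a barrier job over all three partitions of a three-partition stage passes the check, while requesting only partition 0 (as `first()` would) raises the exception before any tasks launch.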
## How was this patch tested?
Added a new test case in `BarrierStageOnSubmittedSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiangxb1987/spark SPARK-24821
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21918.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21918
----
commit b93d21267d6204f25c8fabeec681d1b6e9ebffb6
Author: Xingbo Jiang <xingbo.jiang@...>
Date: 2018-07-30T15:30:33Z
Fail fast when a submitted job computes on only a subset of all the partitions
for a barrier stage
----