[GitHub] spark pull request #21870: Branch 2.3

2018-08-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21870





[GitHub] spark pull request #21870: Branch 2.3

2018-07-25 Thread lovezeropython
GitHub user lovezeropython opened a pull request:

https://github.com/apache/spark/pull/21870

Branch 2.3

The PySpark worker crashes with an `EOFError` / `ConnectionResetError: [Errno 54] Connection reset by peer`:

```
  File ".../pyspark.zip/pyspark/worker.py", line 255, in main
    if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File "/Users/songhao/apps/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 683, in read_int
    length = stream.read(4)
ConnectionResetError: [Errno 54] Connection reset by peer
```
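
For context, `read_int` in `pyspark/serializers.py` reads the 4-byte big-endian length prefix that frames each message between the JVM and the Python worker. A minimal sketch of that logic (not necessarily the exact shipped source) shows where the two errors come from:

```python
import struct

def read_int(stream):
    # Read the 4-byte big-endian length prefix framing each JVM <-> worker
    # message. If the JVM side closes the socket cleanly, read(4) returns
    # b"" and EOFError is raised; an abrupt close can instead surface as
    # ConnectionResetError ([Errno 54] on macOS), as in the traceback above.
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]
```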


![markdown20180725153615](https://user-images.githubusercontent.com/35518020/43186593-ba74b8ea-9021-11e8-8c02-c013d987f204.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21870


commit 3737c3d32bb92e73cadaf3b1b9759d9be00b288d
Author: gatorsmile 
Date:   2018-02-13T06:05:13Z

[SPARK-20090][FOLLOW-UP] Revert the deprecation of `names` in PySpark

## What changes were proposed in this pull request?
Deprecating the field `name` in PySpark was not intended. This PR reverts the change.

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #20595 from gatorsmile/removeDeprecate.

(cherry picked from commit 407f67249639709c40c46917700ed6dd736daa7d)
Signed-off-by: hyukjinkwon 

commit 1c81c0c626f115fbfe121ad6f6367b695e9f3b5f
Author: guoxiaolong 
Date:   2018-02-13T12:23:10Z

[SPARK-23384][WEB-UI] When no incomplete (or completed) applications are found, the last updated time is not formatted and the client's local time zone is not shown in the History Server web UI.

## What changes were proposed in this pull request?

When no incomplete (or completed) applications are found, the last updated time is not formatted and the client's local time zone is not shown in the History Server web UI. This is a bug.

Before the fix:

![1](https://user-images.githubusercontent.com/26266482/36070635-264d7cf0-0f3a-11e8-8426-14135ffedb16.png)

After the fix:

![2](https://user-images.githubusercontent.com/26266482/36070651-8ec3800e-0f3a-11e8-991c-6122cc9539fe.png)

## How was this patch tested?

Manual tests; see the before/after screenshots above.

Author: guoxiaolong 

Closes #20573 from guoxiaolongzte/SPARK-23384.

(cherry picked from commit 300c40f50ab4258d697f06a814d1491dc875c847)
Signed-off-by: Sean Owen 

commit dbb1b399b6cf8372a3659c472f380142146b1248
Author: huangtengfei 
Date:   2018-02-13T15:59:21Z

[SPARK-23053][CORE] Task binary serialization and task partition calculation in DAGScheduler.submitMissingTasks should see the same RDD checkpoint status

## What changes were proposed in this pull request?

When concurrent jobs run over the same RDD that is marked for checkpointing, one job may finish and start RDD.doCheckpoint while another job is being submitted, at which point submitStage and submitMissingTasks are called. [submitMissingTasks](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L961) serializes taskBinaryBytes and calculates the task partitions, both of which depend on the RDD's checkpoint status. If the former is computed before doCheckpoint finishes and the latter after it finishes, then when the task runs and rdd.compute is called, RDDs that cast their partition type, such as [UnionRDD](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala), throw a ClassCastException because the partition parameter is actually a CheckpointRDDPartition.
This race occurs because rdd.doCheckpoint runs in the same thread that called sc.runJob, while task serialization happens in the DAGScheduler's event loop.
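
For illustration, a minimal PySpark sketch of the concurrent-jobs-on-a-checkpointed-RDD pattern described above (the checkpoint directory and thread timing are assumptions; whether the race actually fires depends on scheduling):

```python
import threading
from pyspark import SparkContext

sc = SparkContext("local[4]", "checkpoint-race-sketch")
sc.setCheckpointDir("/tmp/checkpoint-race")  # assumed scratch directory

rdd = sc.parallelize(range(1000)).map(lambda x: x * 2)
rdd.checkpoint()  # mark the RDD; doCheckpoint() runs after a job finishes

# Job 1 finishes first and triggers rdd.doCheckpoint() in its calling thread.
t1 = threading.Thread(target=rdd.count)
# Job 2 uses a UnionRDD over the same RDD; its task binary can be serialized
# before the checkpoint completes while its partitions are computed after it,
# the mismatch that leads to the ClassCastException described above.
t2 = threading.Thread(target=lambda: rdd.union(rdd).count())

t1.start(); t2.start()
t1.join(); t2.join()
```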

## How was this patch tested?

Existing unit tests, plus a new test case in DAGSchedulerSuite that exercises the exception.

Author: huangtengfei 

Closes #20244 from ivoson/branch-taskpart-mistype.

(cherry picked from commit 091a000d27f324de8c5c527880854ecfcf5de9a4)
Signed-off-by: Imran Rashid 

commit ab01ba718c7752b564e801a1ea546aedc2055dc0
Author: Bogdan Raducanu 
Date:   2018-02-13T17:49:52Z

[SPARK-23316][SQL]