[GitHub] spark pull request #21417: Branch 2.0

2018-05-24 Thread gentlewangyu
GitHub user gentlewangyu closed the pull request at:

https://github.com/apache/spark/pull/21417


---




[GitHub] spark pull request #21417: Branch 2.0

2018-05-23 Thread gentlewangyu
GitHub user gentlewangyu opened a pull request:

https://github.com/apache/spark/pull/21417

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21417


commit 050b8177e27df06d33a6f6f2b3b6a952b0d03ba6
Author: cody koeninger 
Date:   2016-10-12T22:22:06Z

[SPARK-17782][STREAMING][KAFKA] Alternative approach to eliminating the race 
condition of polling twice

## What changes were proposed in this pull request?

Alternative approach to https://github.com/apache/spark/pull/15387

Author: cody koeninger 

Closes #15401 from koeninger/SPARK-17782-alt.

(cherry picked from commit f9a56a153e0579283160519065c7f3620d12da3e)
Signed-off-by: Shixiong Zhu 

commit 5903dabc57c07310573babe94e4f205bdea6455f
Author: Brian Cho 
Date:   2016-10-13T03:43:18Z

[SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics

## What changes were proposed in this pull request?

Fix a bug where spill metrics were being reported as shuffle metrics. 
Eventually these spill metrics should be reported (SPARK-3577), but separately 
from shuffle metrics. The fix itself simply reverts the relevant line to its 
1.6 behavior.
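
As a rough sketch of the distinction, assuming deliberately simplified 
stand-in classes (none of these names match Spark's actual internals):

```scala
// Hypothetical, simplified metrics classes for illustration only.
class ShuffleWriteMetrics {
  private var bytesWrittenAcc: Long = 0L
  def incBytesWritten(n: Long): Unit = bytesWrittenAcc += n
  def bytesWritten: Long = bytesWrittenAcc
}

class TaskMetrics {
  val shuffleWriteMetrics = new ShuffleWriteMetrics
}

object SpillMetricsSketch {
  def spill(taskMetrics: TaskMetrics, spillBytes: Long): Long = {
    // Buggy behavior: spill bytes were counted against the task's shuffle
    // write metrics:
    //   taskMetrics.shuffleWriteMetrics.incBytesWritten(spillBytes)

    // Fixed behavior (matching 1.6): record spill writes against a
    // throwaway metrics object so the shuffle numbers stay accurate.
    val spillMetrics = new ShuffleWriteMetrics
    spillMetrics.incBytesWritten(spillBytes)
    spillMetrics.bytesWritten
  }
}
```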

## How was this patch tested?

Cherry-picked from master (#15347)

Author: Brian Cho 

Closes #15455 from dafrista/shuffle-metrics-2.0.

commit ab00e410c6b1d7dafdfabcea1f249c78459b94f0
Author: Burak Yavuz 
Date:   2016-10-13T04:40:45Z

[SPARK-17876] Write StructuredStreaming WAL to a stream instead of 
materializing all at once

## What changes were proposed in this pull request?

The CompactibleFileStreamLog materializes the whole metadata log in memory 
as a single String. This can cause issues when many files are being 
committed, especially during a compaction batch.
You may come across stack traces that look like:
```
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at 
org.apache.spark.sql.execution.streaming.FileStreamSinkLog.serialize(FileStreamSinkLog.scala:127)
```
The safer way is to write to an output stream so that we don't have to 
materialize a huge string.
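
A minimal sketch of the before/after serialization strategy; the `LogEntry` 
case class and the one-entry-per-line format here are hypothetical 
stand-ins for the actual sink log entries:

```scala
import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a per-file metadata entry in the sink log.
case class LogEntry(path: String, size: Long)

object StreamingLogWriter {
  // Before: build the entire log as one String; with enough entries this
  // single allocation can exceed JVM array limits.
  def serializeAllAtOnce(entries: Seq[LogEntry]): String =
    entries.map(e => s"${e.path},${e.size}").mkString("\n")

  // After: write each entry straight to the stream, so memory use is
  // bounded by a single entry rather than the whole log.
  def serializeToStream(entries: Iterator[LogEntry], out: OutputStream): Unit =
    entries.foreach { e =>
      out.write(s"${e.path},${e.size}\n".getBytes(StandardCharsets.UTF_8))
    }

  def main(args: Array[String]): Unit = {
    val out = new BufferedOutputStream(new FileOutputStream("metadata.log"))
    try serializeToStream(Iterator(LogEntry("part-00000", 1024L)), out)
    finally out.close()
  }
}
```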

## How was this patch tested?

Existing unit tests

Author: Burak Yavuz 

Closes #15437 from brkyvz/ser-to-stream.

(cherry picked from commit edeb51a39d76d64196d7635f52be1b42c7ec4341)
Signed-off-by: Shixiong Zhu 

commit d38f38a093b4dff32c686675d93ab03e7a8f4908
Author: buzhihuojie 
Date:   2016-10-13T05:51:54Z

minor doc fix for Row.scala

## What changes were proposed in this pull request?

minor doc fix for "getAnyValAs" in class Row

## How was this patch tested?

None.


Author: buzhihuojie 

Closes #15452 from david-weiluo-ren/minorDocFixForRow.

(cherry picked from commit 7222a25a11790fa9d9d1428c84b6f827a785c9e8)
Signed-off-by: Reynold Xin 

commit d7fa3e32421c73adfa522adfeeb970edd4c22eb3
Author: Shixiong Zhu 
Date:   2016-10-13T20:31:50Z

[SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource 
instead of counting on KafkaConsumer

## What changes were proposed in this pull request?

Because `KafkaConsumer.poll(0)` may update the partition offsets as a side 
effect, this PR calls `seekToBeginning` to set the earliest offsets manually 
when initializing the KafkaSource.
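
A self-contained sketch of the approach described, using the plain 
kafka-clients consumer API outside Spark; the `fetchEarliest` helper is 
hypothetical:

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object EarliestOffsetsSketch {
  // Fetch the earliest offset of each partition without relying on poll(0).
  def fetchEarliest(bootstrapServers: String,
                    partitions: Seq[TopicPartition]): Map[TopicPartition, Long] = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    try {
      // Assign the partitions explicitly (no subscription), seek to the
      // beginning, and read back the resulting positions. Unlike poll(0),
      // this does not fetch records or move offsets as a side effect.
      consumer.assign(partitions.asJava)
      consumer.seekToBeginning(partitions.asJava)
      partitions.map(tp => tp -> consumer.position(tp)).toMap
    } finally consumer.close()
  }
}
```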

## How was this patch tested?

Existing tests.

Author: Shixiong Zhu 

Closes #15397 from zsxwing/SPARK-17834.

(cherry picked from commit 08eac356095c7faa2b19d52f2fb0cbc47eb7d1d1)
Signed-off-by: Shixiong Zhu 

commit c53b8374911e801ed98c1436c384f0aef076eaab
Author: Davies Liu