[GitHub] spark pull request #22860: Branch 2.4

2018-10-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22860





[GitHub] spark pull request #22860: Branch 2.4

2018-10-27 Thread sarojchand
GitHub user sarojchand opened a pull request:

https://github.com/apache/spark/pull/22860

Branch 2.4

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22860


commit b632e775cc057492ebba6b65647d90908aa00421
Author: Marco Gaido 
Date:   2018-09-06T07:27:59Z

[SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String

## What changes were proposed in this pull request?

SPARK-10399 introduced a performance regression in the hash computation for 
UTF8String.

The regression can be measured with the code attached to the JIRA. That 
code takes about 120 us per method on my laptop (MacBook Pro, 2.5 GHz Intel 
Core i7, 16 GB 1600 MHz DDR3 RAM), while the same code on branch 2.3 takes 
about 45 us on the same machine. After this PR, the code takes about 45 us on 
the master branch too.

## How was this patch tested?

Ran the perf test attached to the JIRA.

Closes #22338 from mgaido91/SPARK-25317.

Authored-by: Marco Gaido 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 64c314e22fecca1ca3fe32378fc9374d8485deec)
Signed-off-by: Wenchen Fan 

commit 085f731adb9b8c82a2bf4bbcae6d889a967fbd53
Author: Shahid 
Date:   2018-09-06T16:52:58Z

[SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws 
serialization Exception

## What changes were proposed in this pull request?
`mapValues` in Scala currently returns a map view that is not serializable. 
To avoid the serialization exception when running Parallel Personalized 
PageRank, we need to use `map` instead of `mapValues`.

Closes #22271 from shahidki31/master_latest.

Authored-by: Shahid 
Signed-off-by: Joseph K. Bradley 
(cherry picked from commit 3b6591b0b064b13a411e5b8f8ee4883a69c39e2d)
Signed-off-by: Joseph K. Bradley 

commit f2d5022233b637eb50567f7945042b3a8c9c6b25
Author: hyukjinkwon 
Date:   2018-09-06T15:18:49Z

[SPARK-25328][PYTHON] Add an example for having two columns as the grouping 
key in group aggregate pandas UDF

## What changes were proposed in this pull request?

This PR adds another example that uses multiple grouping keys in a group 
aggregate pandas UDF, since users may still find this feature confusing.

## How was this patch tested?

Manually tested and documentation built.

Closes #22329 from HyukjinKwon/SPARK-25328.

Authored-by: hyukjinkwon 
Signed-off-by: Bryan Cutler 
(cherry picked from commit 7ef6d1daf858cc9a2c390074f92aaf56c219518a)
Signed-off-by: Bryan Cutler 

commit 3682d29f45870031d9dc4e812accbfbb583cc52a
Author: liyuanjian 
Date:   2018-09-06T17:17:29Z

[SPARK-25072][PYSPARK] Forbid extra value for custom Row

## What changes were proposed in this pull request?

Add a value-length check in `_create_row` to forbid extra values for a 
custom Row in PySpark.
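
A minimal sketch of the kind of length check described above, written in plain Python so it runs without Spark. The names `make_row_factory` and `create` are hypothetical and do not appear in PySpark, and the error text is only illustrative; the actual change lives in `_create_row`.

```python
def make_row_factory(*fields):
    """Return a factory that builds a row (here a plain dict) for `fields`.

    Hypothetical stand-in for a custom Row class; not PySpark's API.
    """
    def create(*values):
        # The check under discussion: reject more values than declared fields.
        # Fewer values are not rejected in this sketch.
        if len(values) > len(fields):
            raise ValueError(
                "Cannot create row with fields %r: expected %d values but got %r"
                % (fields, len(fields), values)
            )
        return dict(zip(fields, values))
    return create


person = make_row_factory("name", "age")
print(person("Alice", 11))

try:
    person("Alice", 11, "extra")  # one value too many
except ValueError as err:
    print("rejected:", err)
```

Raising at construction time surfaces the mistake where the row is built, rather than later when the data is used.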

## How was this patch tested?

New unit test in pyspark-sql.

Closes #22140 from xuanyuanking/SPARK-25072.

Lead-authored-by: liyuanjian 
Co-authored-by: Yuanjian Li 
Signed-off-by: Bryan Cutler 
(cherry picked from commit c84bc40d7f33c71eca1c08f122cd60517f34c1f8)
Signed-off-by: Bryan Cutler 

commit a7cfe5158f5c25ae5f774e1fb45d63a67a4bb89c
Author: xuejianbest <384329882@...>
Date:   2018-09-06T14:17:37Z

[SPARK-25108][SQL] Fix the show method to display the wide character 
alignment problem

This is not a perfect solution. It is designed to solve the alignment 
problem while adding as little complexity as possible.

It works for English, Chinese, Japanese, Korean, and similar text.
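
The underlying idea, counting a wide character as two display columns when padding a cell, can be sketched in Python. This is an illustration under assumptions, not the code from this patch: it uses the standard `unicodedata.east_asian_width` lookup, and the helper names `display_width` and `pad_cell` are hypothetical. Like the patch, it is approximate (combining marks and ambiguous-width characters are counted as one column).

```python
import unicodedata


def display_width(text):
    """Count display columns, treating wide (W) and full-width (F) chars as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_cell(text, width):
    """Left-align `text` in a cell that is `width` display columns wide."""
    return text + " " * max(width - display_width(text), 0)


cells = ["ab", "中国", "ab1"]
col_width = max(display_width(c) for c in cells)
for c in cells:
    # Every printed line now occupies the same number of display columns.
    print("|" + pad_cell(c, col_width) + "|")
```

Padding by `len(text)` alone would under-pad rows containing wide characters, which is the misalignment shown in the "before" output.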

```scala
before:
+---+---+-+
|id |中国 |s2   |
+---+---+-+
|1  |ab |[a]  |
|2  |null   |[中国, abc]|
|3  |ab1|[hello world]|
|4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]|
|5  |中国(你好)a|[“中(国), 312] |
|6