[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99235/testReport)**
 for PR 23108 at commit 
[`e731746`](https://github.com/apache/spark/commit/e731746da4643d0283b3cd788d286aea62c96215).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5324/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5323/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99234/testReport)**
 for PR 23131 at commit 
[`515c04c`](https://github.com/apache/spark/commit/515c04c8833bd5b5683c7040e7d46d0b026255e6).


---




[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...

2018-11-24 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/23083#discussion_r236060023
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
 spills.clear()
 forceSpillFiles.foreach(s => s.file.delete())
 forceSpillFiles.clear()
-if (map != null || buffer != null) {
+if (map != null || buffer != null || readingIterator != null) {
   map = null // So that the memory can be garbage-collected
   buffer = null // So that the memory can be garbage-collected
+  readingIterator = null // So that the memory can be garbage-collected
--- End diff --

Nice. Case well explained.

But I think you need to add corresponding test cases for 
`CompletionIterator` and `ExternalSorter`.


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23132
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99232/
Test PASSed.


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23132
  
**[Test build #99232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99232/testReport)**
 for PR 23132 at commit 
[`83920b2`](https://github.com/apache/spark/commit/83920b25f586dc242841ff0a73105ae9e43295ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99231/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99231/testReport)**
 for PR 23131 at commit 
[`170262b`](https://github.com/apache/spark/commit/170262b9c16c2b8901b1ec65e7c98a25a7eef077).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-24 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/23083
  
Hi @cloud-fan 

> Looking at the code, we are trying to fix 2 memory leaks: the task completion listener in ShuffleBlockFetcherIterator, and the CompletionIterator. If that's the case, can you say that in the PR description?

I've updated the description and the title of this PR accordingly.


---




[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23131#discussion_r236057261
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -341,8 +341,6 @@ displayTitle: Spark SQL Upgrading Guide
APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
single-node data frame notion in these languages.
 
- - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
--- End diff --

That's my fault for making this suggestion. Yeah, it's maybe best to leave this 
statement, and add a note here or in the 3.0 migration guide that it has 
subsequently been un-deprecated.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23133
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23133
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5322/
Test PASSed.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23133
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5322/



---




[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23131#discussion_r236057105
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -341,8 +341,6 @@ displayTitle: Spark SQL Upgrading Guide
APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
single-node data frame notion in these languages.
 
- - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
--- End diff --

Er, we cannot change history. Up to and including Spark 2.4.0, we show the 
deprecation warning:
```scala
scala> spark.version
res2: String = 2.4.0

scala> df.unionAll(df2)
<console>:28: warning: method unionAll in class Dataset is deprecated: use union()
       df.unionAll(df2)
          ^
```
Shall we keep the history in this specific migration doc, `Upgrading From 
Spark SQL 1.6 to 2.0`, and add a comment that it was added back in 3.0.0 instead?
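
For readers unfamiliar with where such a warning comes from, here is a minimal sketch of a deprecated alias that delegates to its replacement. `MiniDataset` is a hypothetical stand-in written for this illustration and is not Spark's `Dataset`; the `since` value in the annotation is likewise an assumption of the sketch.

```scala
object DeprecationSketch {

  /** Hypothetical, minimal stand-in for a dataset; it exists only to show the annotation. */
  final case class MiniDataset[T](rows: Vector[T]) {

    /** The preferred method: concatenates the rows and keeps duplicates. */
    def union(other: MiniDataset[T]): MiniDataset[T] = MiniDataset(rows ++ other.rows)

    /** Deprecated alias: callers get a compiler warning, but the behaviour matches `union`. */
    @deprecated("use union()", "2.0.0") // the "since" version is an assumption of this sketch
    def unionAll(other: MiniDataset[T]): MiniDataset[T] = union(other)
  }
}
```

Removing the annotation while keeping the alias is the kind of change that "un-deprecating" would amount to in a sketch like this; no behaviour changes, only the compiler warning goes away.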



---




[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-24 Thread szhem
Github user szhem commented on a diff in the pull request:

https://github.com/apache/spark/pull/23083#discussion_r236057101
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
 spills.clear()
 forceSpillFiles.foreach(s => s.file.delete())
 forceSpillFiles.clear()
-if (map != null || buffer != null) {
+if (map != null || buffer != null || readingIterator != null) {
   map = null // So that the memory can be garbage-collected
   buffer = null // So that the memory can be garbage-collected
+  readingIterator = null // So that the memory can be garbage-collected
--- End diff --

@advancedxy I tried removing all the modifications except this one and got 
OutOfMemoryErrors again. Here are the details:

1. Four `ExternalSorter` instances now remain. Two of them are not closed ...

![1_readingiterator_isnull_nonclosed_externalsorter](https://user-images.githubusercontent.com/1523889/48973288-2218d180-f04d-11e8-9329-27b3edf33c48.png)
and two of them are closed ...

![2_readingiterator_isnull_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973295-483e7180-f04d-11e8-83cf-23361515363f.png)
as expected.
2. Two `SpillableIterator`s (which consume a significant part of memory) that 
belong to already closed `ExternalSorter`s also remain.

![4_readingiterator_isnull_spillableiterator_of_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973318-cf8be500-f04d-11e8-912f-74be7420ca95.png)
3. These `SpillableIterator`s are referenced by `CompletionIterator`s ...

![6_completioniterator_of_blockstoreshufflereader](https://user-images.githubusercontent.com/1523889/48973357-a6b81f80-f04e-11e8-810f-dc8941430f34.png)
... which in turn seem to be referenced by the `cur` field ...

![7_coalescedrdd_compute_flatmap](https://user-images.githubusercontent.com/1523889/48973491-7e7df000-f051-11e8-8864-7e9e7f3f994b.png)
... of the standard `Iterator`'s `flatMap` that is used in the `compute` 
method of `CoalescedRDD`.

![image](https://user-images.githubusercontent.com/1523889/48973401-7fae1d80-f04f-11e8-8cf2-043c808173d9.png)

The standard `Iterator`'s `flatMap` does not clear its `cur` field before 
obtaining the next value for it, and obtaining that next value can itself 
consume quite a lot of memory.

![image](https://user-images.githubusercontent.com/1523889/48973418-dfa4c400-f04f-11e8-8f0e-b464567d43de.png)
In Spark's case, that means the previous iterator, which still holds its 
memory, stays reachable while the next value is being fetched.

![8_coalescedrdd_compute_flatmap_cur_isnotassigned](https://user-images.githubusercontent.com/1523889/48974089-0000-f05f-11e8-8319-f7d1f778f381.png)

So I've restored the changes to `CompletionIterator` that reassign the 
reference to its sub-iterator to the `empty` iterator ...

![image](https://user-images.githubusercontent.com/1523889/48973472-27781b00-f051-11e8-86e1-cd6b87cd114b.png)

... and that helped. 

P.S. I believe that clearing the `cur` field of the standard `flatMap` 
iterator before calling `nextCur` could help too:
```scala
  def flatMap[B](f: A => GenTraversableOnce[B]): Iterator[B] = new AbstractIterator[B] {
    private var cur: Iterator[B] = empty
    private def nextCur() { cur = f(self.next()).toIterator }
    def hasNext: Boolean = {
      // Equivalent to cur.hasNext || self.hasNext && { nextCur(); hasNext }
      // but slightly shorter bytecode (better JVM inlining!)
      while (!cur.hasNext) {
        cur = empty // proposed change: drop the exhausted iterator before fetching the next one
        if (!self.hasNext) return false
        nextCur()
      }
      true
    }
    def next(): B = (if (hasNext) cur else empty).next()
  }
```
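
The two ideas above (a completion wrapper that lets go of its sub-iterator, and a `flatMap` that clears `cur` before fetching) can be tried outside Spark with a small self-contained sketch. The names `ReleasingIterator` and `releasingFlatMap` are hypothetical and are not Spark's or the standard library's code; this only illustrates the reference-dropping pattern under those assumptions.

```scala
object IteratorReleaseSketch {

  /** Hypothetical stand-in for a completion wrapper: once `sub` is exhausted it runs
    * `onComplete` exactly once and replaces `sub` with an empty iterator, so the
    * exhausted iterator is no longer reachable through this wrapper.
    */
  final class ReleasingIterator[A](private var sub: Iterator[A], onComplete: () => Unit)
      extends Iterator[A] {
    private var completed = false

    override def hasNext: Boolean = {
      val more = sub.hasNext
      if (!more && !completed) {
        completed = true
        sub = Iterator.empty // drop the reference to the exhausted sub-iterator
        onComplete()
      }
      more
    }

    override def next(): A = sub.next()
  }

  /** A `flatMap` variant that forgets the exhausted inner iterator before asking `f`
    * for the next one, so the old iterator is not held while `f` runs.
    */
  def releasingFlatMap[A, B](outer: Iterator[A])(f: A => Iterator[B]): Iterator[B] =
    new Iterator[B] {
      private var cur: Iterator[B] = Iterator.empty

      override def hasNext: Boolean = {
        while (!cur.hasNext) {
          cur = Iterator.empty // release first, then fetch
          if (!outer.hasNext) return false
          cur = f(outer.next())
        }
        true
      }

      override def next(): B = (if (hasNext) cur else Iterator.empty).next()
    }
}
```

Whether either change actually lets the garbage collector reclaim memory depends on what else references the inner iterators, so a heap dump like the ones above is still the way to confirm it.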




---




[GitHub] spark pull request #23133: [MINOR][K8S] Invalid property "spark.driver.pod.n...

2018-11-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23133


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23133
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5322/



---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23133
  
Thank you, @Leemoonsoo . Merged to `master/branch-2.4`.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23133
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23133
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99233/
Test PASSed.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23133
  
**[Test build #99233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99233/testReport)**
 for PR 23133 at commit 
[`d50b312`](https://github.com/apache/spark/commit/d50b31236bcf04d636362f6b19e318eca66bc01f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23133
  
**[Test build #99233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99233/testReport)**
 for PR 23133 at commit 
[`d50b312`](https://github.com/apache/spark/commit/d50b31236bcf04d636362f6b19e318eca66bc01f).


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99228/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23133
  
ok to test


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99228/testReport)**
 for PR 23131 at commit 
[`133246d`](https://github.com/apache/spark/commit/133246d973eb516ebc12ba5bb49cd30ba4f108f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23133
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23133: [MINOR][K8S] Invalid property "spark.driver.pod.n...

2018-11-24 Thread Leemoonsoo
GitHub user Leemoonsoo opened a pull request:

https://github.com/apache/spark/pull/23133

[MINOR][K8S] Invalid property "spark.driver.pod.name" is referenced in docs.

## What changes were proposed in this pull request?

"Running on Kubernetes" references `spark.driver.pod.name` in a few places; 
it should be `spark.kubernetes.driver.pod.name`.

## How was this patch tested?
See changes


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Leemoonsoo/spark fix-driver-pod-name-prop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23133


commit d50b31236bcf04d636362f6b19e318eca66bc01f
Author: Lee moon soo 
Date:   2018-11-24T23:28:54Z

spark.driver.pod.name -> spark.kubernetes.driver.pod.name in doc




---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23132
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5321/
Test PASSed.


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23132
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23132
  
**[Test build #99232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99232/testReport)**
 for PR 23132 at commit 
[`83920b2`](https://github.com/apache/spark/commit/83920b25f586dc242841ff0a73105ae9e43295ed).


---




[GitHub] spark pull request #23132: [SPARK-26163][SQL] Parsing decimals from JSON usi...

2018-11-24 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/23132

[SPARK-26163][SQL] Parsing decimals from JSON using locale

## What changes were proposed in this pull request?

In this PR, I propose using the locale option to parse (and infer) 
decimals from JSON input. After the changes, `JacksonParser` converts an input 
string to `BigDecimal`, and then to Spark's Decimal, by using 
`java.text.DecimalFormat`. The new behaviour can be switched off via the SQL 
config `spark.sql.legacy.decimalParsing.enabled`.
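
To illustrate the general technique (not the code in this PR), here is a minimal sketch of locale-aware decimal parsing with `java.text.DecimalFormat`. The object name `LocaleDecimalSketch`, the method `parseDecimal`, the pattern string, and the choice to reject partially parsed input are all assumptions made for this example.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols, ParsePosition}
import java.util.Locale

object LocaleDecimalSketch {

  /** Parses `s` as a decimal using the grouping and decimal separators of `locale`.
    * Input that is only partially consumed (for example trailing letters) is rejected.
    */
  def parseDecimal(s: String, locale: Locale): java.math.BigDecimal = {
    val format = new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale))
    format.setParseBigDecimal(true) // request BigDecimal results instead of Long/Double

    val pos = new ParsePosition(0)
    format.parse(s, pos) match {
      case d: java.math.BigDecimal if pos.getIndex == s.length && pos.getErrorIndex == -1 =>
        d
      case _ =>
        throw new IllegalArgumentException(s"Cannot parse '$s' as a decimal for locale $locale")
    }
  }
}
```

With this sketch, `parseDecimal("1.000,001", Locale.GERMANY)` and `parseDecimal("1,000.001", Locale.US)` both yield a value equal to 1000.001, while `parseDecimal("12abc", Locale.US)` throws because the input is not fully consumed.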

## How was this patch tested?

Added 2 tests to `JsonExpressionsSuite` for the `en-US`, `ko-KR`, `ru-RU`, 
`de-DE` locales: 
- Inferring decimal type using locale from JSON field values
- Converting JSON field values to specified decimal type using the locales. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 json-decimal-parsing-locale

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23132


commit 506417ed4b643298560a66c043f7b31beb489da3
Author: Maxim Gekk 
Date:   2018-11-10T19:49:12Z

Test for parsing decimals using locale

commit ac25fb6ed1d3d6689ad8841476c025848c87f2a3
Author: Maxim Gekk 
Date:   2018-11-10T19:51:48Z

Parsing decimals using locale

commit b784003078270a46aaf8aceb2d86dd9f13f3500c
Author: Maxim Gekk 
Date:   2018-11-10T19:54:00Z

Updating the migration guide

commit d0522093dfe1823f91100cbec3ef5d6c8a372f27
Author: Maxim Gekk 
Date:   2018-11-24T19:32:45Z

Merge remote-tracking branch 'origin/master' into json-decimal-parsing-locale

# Conflicts:
#   sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala

commit 722e135cf3b94e22079fd9a0fe1d309a04e76a64
Author: Maxim Gekk 
Date:   2018-11-24T19:58:39Z

Add SQL config spark.sql.legacy.decimalParsing.enabled

commit dc6c0ac0a97b31441d99ff1dd71608ae5e2eca73
Author: Maxim Gekk 
Date:   2018-11-24T20:00:31Z

Updating the migration guide

commit f15b1817fb51d453487665122473855712214692
Author: Maxim Gekk 
Date:   2018-11-24T20:38:24Z

Added a test for parsing

commit ab781d54e4f7604d64c72c3c383c549abab0a9a9
Author: Maxim Gekk 
Date:   2018-11-24T20:52:03Z

Fix test

commit 163a8b9d7d017409ae4dfa40e492680bf0e4f935
Author: Maxim Gekk 
Date:   2018-11-24T20:52:26Z

Create getDecimalParser

commit 8fb65c0db85f4bd2f76d473c5e31e772ff0d4c1d
Author: Maxim Gekk 
Date:   2018-11-24T22:11:42Z

Add a test for inferring decimals

commit 7e3a2906a96894cadc58771131d07d06ba265382
Author: Maxim Gekk 
Date:   2018-11-24T22:12:35Z

Change JsonSuite to adopt it for JsonInferSchema class

commit 83920b25f586dc242841ff0a73105ae9e43295ed
Author: Maxim Gekk 
Date:   2018-11-24T22:13:01Z

Inferring decimals from JSON




---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5320/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99231/testReport)**
 for PR 23131 at commit 
[`170262b`](https://github.com/apache/spark/commit/170262b9c16c2b8901b1ec65e7c98a25a7eef077).


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #4440 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4440/testReport)**
 for PR 23128 at commit 
[`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5319/
Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99230/testReport)**
 for PR 23131 at commit 
[`f0dfe7b`](https://github.com/apache/spark/commit/f0dfe7ba56daee34a37ba727ac76e29325b7e995).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99230/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99230/testReport)**
 for PR 23131 at commit 
[`f0dfe7b`](https://github.com/apache/spark/commit/f0dfe7ba56daee34a37ba727ac76e29325b7e995).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99229/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99229/testReport)**
 for PR 23131 at commit 
[`12dfd77`](https://github.com/apache/spark/commit/12dfd77c665f38b450e4dc3e48a32bf651a3179e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99229/testReport)**
 for PR 23131 at commit 
[`12dfd77`](https://github.com/apache/spark/commit/12dfd77c665f38b450e4dc3e48a32bf651a3179e).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23131#discussion_r236054278
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1852,6 +1852,19 @@ class Dataset[T] private[sql](
 CombineUnions(Union(logicalPlan, other.logicalPlan))
   }
 
+  /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
--- End diff --

Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23130
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99227/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23130
  
**[Test build #99227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99227/testReport)**
 for PR 23130 at commit 
[`b200a50`](https://github.com/apache/spark/commit/b200a50de58641d297381bec45687317bd21dfb7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23113: [SPARK-26019][PYTHON] Fix race condition in accumulators...

2018-11-24 Thread Tagar
Github user Tagar commented on the issue:

https://github.com/apache/spark/pull/23113
  
@HyukjinKwon This appears to be related to the fact that the error is 
reproducible when PySpark is created using an existing py4j gateway (for 
example, it reproduces from within Zeppelin):


https://github.com/apache/spark/blob/f83fedc9f20869ab4c62bb07bac50113d921207f/python/pyspark/context.py#L99
 

In the case of Zeppelin, it creates its own gateway:


https://github.com/apache/zeppelin/blob/adf83a3f5d009efd0e0416a7bf1d8b6fd86ea58a/spark/interpreter/src/main/resources/python/zeppelin_ipyspark.py#L33

If the `PY4J_GATEWAY_SECRET` environment variable isn't set, Zeppelin creates 
the gateway without `auth_token` set:
`gateway = JavaGateway(GatewayClient(address="${JVM_GATEWAY_ADDRESS}", port=${JVM_GATEWAY_PORT}), auto_convert=True)`

py4j allows working without auth tokens:

https://github.com/bartdag/py4j/blob/079fcf11ec18058af0b356f78974125eb7384711/py4j-python/src/py4j/java_gateway.py#L774

As I mentioned in SPARK-26019, only the first request gets this error:

```
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
    poll(authenticate_and_accum_updates)
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
    if func():
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
    received_token = self.rfile.read(len(auth_token))
TypeError: object of type 'NoneType' has no len()
```

Rerunning the same request later works fine. 

Is that because `auth_token` gets set by then, or just because the following 
requests carry no accumulator updates? I don't know Spark's internals well 
enough to answer that, but hopefully this provides some more context to help 
assess this problem and why this PR fixes it.
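
A minimal sketch of the failure mode in the traceback above, not PySpark's actual handler: when the expected token is `None`, evaluating `len(auth_token)` raises the `TypeError` before anything is read. The helpers `read_token` and `read_token_guarded`, and the `None` check itself, are hypothetical and only illustrate one possible guard; they are not the change made in this PR.

```python
import io


def read_token(rfile, auth_token):
    """Read len(auth_token) bytes from rfile (hypothetical helper).

    Mirrors the shape of the failing line in the traceback only.
    """
    return rfile.read(len(auth_token))


def read_token_guarded(rfile, auth_token):
    """Same read, but skip it when no token is configured.

    The None check is an assumption about one possible guard.
    """
    if auth_token is None:
        return None
    return rfile.read(len(auth_token))


stream = io.BytesIO(b"secret-and-more")

# With no token configured, len(None) fails before any bytes are consumed.
try:
    read_token(stream, None)
    error = None
except TypeError as exc:
    error = str(exc)

# The guarded variant returns without touching the stream.
guarded = read_token_guarded(stream, None)

# With a real token, exactly len(token) bytes are read.
token = read_token_guarded(stream, b"secret")
```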

Thank you.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5318/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23131
  
**[Test build #99228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99228/testReport)**
 for PR 23131 at commit 
[`133246d`](https://github.com/apache/spark/commit/133246d973eb516ebc12ba5bb49cd30ba4f108f9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23131#discussion_r236052557
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1852,6 +1852,19 @@ class Dataset[T] private[sql](
 CombineUnions(Union(logicalPlan, other.logicalPlan))
   }
 
+  /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
--- End diff --

say that this is an alias of union.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23131
  
cc @rxin @srowen @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/23131

[SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

## What changes were proposed in this pull request?
This PR adds back `unionAll`, which is widely used. The name is also 
consistent with ANSI SQL. We also have the corresponding `intersectAll` and 
`exceptAll`, which were introduced in Spark 2.4.
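
As a rough illustration of the semantics only (plain Python, not Spark's `Dataset` API): `UNION ALL` keeps duplicate rows, and an alias simply forwards to the existing method. The `Rows` class and its method names are hypothetical.

```python
class Rows:
    """Toy stand-in for a dataset: an ordered list of rows (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # UNION ALL semantics: concatenate and keep duplicates.
        return Rows(self.rows + other.rows)

    def union_all(self, other):
        # Alias of union, kept for familiarity with the SQL name.
        return self.union(other)

    def distinct(self):
        # Drop duplicates while preserving first-seen order.
        return Rows(dict.fromkeys(self.rows))


a = Rows([1, 2, 2])
b = Rows([2, 3])
combined = a.union_all(b).rows
deduplicated = a.union_all(b).distinct().rows
```

In this toy model, a set-style union would be `union_all` followed by `distinct`.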

## How was this patch tested?
Added a test case in DataFrameSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark addBackUnionAll

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23131


commit 133246d973eb516ebc12ba5bb49cd30ba4f108f9
Author: gatorsmile 
Date:   2018-11-24T20:04:52Z

Add back unionAll




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23130
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23130
  
@cloud-fan @HyukjinKwon Please take a look at the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23130
  
**[Test build #99227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99227/testReport)**
 for PR 23130 at commit 
[`b200a50`](https://github.com/apache/spark/commit/b200a50de58641d297381bec45687317bd21dfb7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

2018-11-24 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22938#discussion_r236049216
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1892,7 +1898,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 .text(path)
 
   val jsonDF = spark.read.option("multiLine", true).option("mode", "PERMISSIVE").json(path)
-  assert(jsonDF.count() === corruptRecordCount)
+  assert(jsonDF.count() === corruptRecordCount + 1) // null row for empty file
--- End diff --

@cloud-fan @HyukjinKwon Here is a PR 
https://github.com/apache/spark/pull/23130 which does this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5317/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23130
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-24 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/23130

[SPARK-26161][SQL] Ignore empty files in load

## What changes were proposed in this pull request?

In this PR, I propose filtering out all empty files inside 
`DataSourceScanExec` and excluding them from file splits. This should reduce 
the overhead of opening and reading files that contain no data; as a 
consequence, datasources will not produce empty partitions for such files.
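
The idea can be sketched in plain Python (this is not the `DataSourceScanExec` code; `FileStatus`, `plan_splits`, and `max_split_bytes` are hypothetical names used only for illustration): zero-length files are skipped before any split is planned, so they never yield a partition.

```python
from typing import List, NamedTuple, Tuple


class FileStatus(NamedTuple):
    """Hypothetical file descriptor: a path and its length in bytes."""
    path: str
    length: int


def plan_splits(files: List[FileStatus], max_split_bytes: int) -> List[Tuple[str, int, int]]:
    """Return (path, offset, size) splits, ignoring empty files."""
    splits = []
    for f in files:
        if f.length == 0:
            # Empty file: nothing to read, so plan no split for it.
            continue
        offset = 0
        while offset < f.length:
            size = min(max_split_bytes, f.length - offset)
            splits.append((f.path, offset, size))
            offset += size
    return splits


files = [FileStatus("a.txt", 0), FileStatus("b.txt", 10), FileStatus("c.txt", 0)]
splits = plan_splits(files, max_split_bytes=4)
```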

## How was this patch tested?

Added a test which creates an empty file and a non-empty file. If empty files 
are ignored in load, the Text datasource in `wholetext` mode must create only 
one partition, for the non-empty file.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 ignore-empty-files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23130.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23130


commit 36c64f0a8c47763dd99f0ed07cdf5a1813b0f2b7
Author: Maxim Gekk 
Date:   2018-11-24T17:29:25Z

Test checks empty files are not loaded

commit e428b83c82623f33e5b5e5073251d73d18a3c903
Author: Maxim Gekk 
Date:   2018-11-24T17:29:41Z

Fix json tests

commit 4212add0bcbcaaec5ee8e5483ea16d3e7929dcb6
Author: Maxim Gekk 
Date:   2018-11-24T17:34:24Z

Fix test

commit 2458882037349987396e8456799c451e80566442
Author: Maxim Gekk 
Date:   2018-11-24T17:36:06Z

Filtering out empty files

commit b200a50de58641d297381bec45687317bd21dfb7
Author: Maxim Gekk 
Date:   2018-11-24T17:53:45Z

Fix imports




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #4440 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4440/testReport)**
 for PR 23128 at commit 
[`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is ...

2018-11-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22779


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22683
  
Yeah, there are going to be several more tests that fail because they are 
expecting a string like 'KB'. Hopefully easy to fix.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23125: [SPARK-26156][WebUI] Revise summary section of st...

2018-11-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23125


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23125: [SPARK-26156][WebUI] Revise summary section of stage pag...

2018-11-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23125
  
Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...

2018-11-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22779
  
Merged to master, but I'm also going to try to backport to 2.4 and 2.3 as 
a bug fix.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23129: [MINOR] Update all DOI links to preferred resolver

2018-11-24 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23129
  
Jenkins, test this please.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22779
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99224/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22779
  
**[Test build #99224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99224/testReport)**
 for PR 22779 at commit 
[`55db355`](https://github.com/apache/spark/commit/55db355fc4cbb6d036ec45895c5869690165e706).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99226/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99226 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport)**
 for PR 22683 at commit 
[`3bf6ca5`](https://github.com/apache/spark/commit/3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99225/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #99225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99225/testReport)**
 for PR 23128 at commit 
[`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...

2018-11-24 Thread httfighter
GitHub user httfighter reopened a pull request:

https://github.com/apache/spark/pull/22683

[SPARK-25696] The storage memory displayed on spark Application UI is…

… incorrect.

## What changes were proposed in this pull request?
In the reported heartbeat information, memory sizes are in bytes; they are 
converted by the `formatBytes()` function in `utils.js` before being displayed 
in the UI. The unit conversion in `formatBytes` uses a base of 1000, which 
should be 1024. 
This PR changes the base of the unit conversion in `formatBytes` to 1024.
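
To make the difference concrete, here is a small Python sketch (not the JavaScript in `utils.js`; `format_bytes`, its `base` parameter, and the unit labels are assumptions for illustration) that formats the same byte count with both bases.

```python
def format_bytes(num_bytes, base=1024):
    """Format a byte count by repeatedly dividing by `base` (hypothetical helper)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    index = 0
    # Step up one unit each time the value reaches the base.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.1f} {units[index]}"


size = 512 * 1024 * 1024  # 512 binary megabytes, expressed in bytes

decimal = format_bytes(size, base=1000)
binary = format_bytes(size, base=1024)
```

With a base of 1000 the value is overstated relative to what a 1024-based configuration would lead a user to expect, which is the mismatch this PR describes.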

## How was this patch tested?
Manual tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/httfighter/spark SPARK-25696

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22683


commit 9e45697296039e55e85dd204788e287c9c60fceb
Author: 韩田田00222924 
Date:   2018-10-10T06:47:36Z

[SPARK-25696] The storage memory displayed on spark Application UI is 
incorrect.

commit 3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7
Author: 韩田田00222924 
Date:   2018-11-24T08:53:12Z

Supplement the modification of the memory unit displayed on the UI




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread httfighter
Github user httfighter commented on the issue:

https://github.com/apache/spark/pull/22683
  
@srowen @ajbozarth I have added the changes. Could you help me review the 
code? Thank you very much.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...

2018-11-24 Thread httfighter
Github user httfighter closed the pull request at:

https://github.com/apache/spark/pull/22683


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport)**
 for PR 22683 at commit 
[`3bf6ca5`](https://github.com/apache/spark/commit/3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5316/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

2018-11-24 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23105#discussion_r236036258
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+/**
+ * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes
+ * all the methods are called on a single-threaded, i.e. concrete implementations would not need
+ * to synchronize.
+ *
+ * All methods have additional Spark visibility modifier to allow public, concrete implementations
+ * that still have these methods marked as private[spark].
+ */
+private[spark] trait ShuffleReadMetricsReporter {
--- End diff --

https://github.com/apache/spark/pull/23128 :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #99225 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99225/testReport)**
 for PR 23128 at commit 
[`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23129: Hyperlink DOIs to preferred resolver

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23129
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...

2018-11-24 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/23128
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23129: Hyperlink DOIs to preferred resolver

2018-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23129
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


