date:20180524

[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21246
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91095/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21391
  
**[Test build #91093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91093/testReport)**
 for PR 21391 at commit 
[`d9a440a`](https://github.com/apache/spark/commit/d9a440a9814913827fcfcff644c741a43332b02d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21391
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91093/
Test FAILed.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21391
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #21422: [Spark-24376][doc]Summary:compiling spark with sc...

2018-05-24 Thread gentlewangyu

Github user gentlewangyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21422#discussion_r190541918
  
--- Diff: docs/building-spark.md ---
@@ -92,10 +92,10 @@ like ZooKeeper and Hadoop itself.
 ./build/mvn -Pmesos -DskipTests clean package
 
 ## Building for Scala 2.10
-To produce a Spark package compiled with Scala 2.10, use the `-Dscala-2.10` property:
+To produce a Spark package compiled with Scala 2.10, use the `-Pscala-2.10` property:
 
 ./dev/change-scala-version.sh 2.10
-./build/mvn -Pyarn -Dscala-2.10 -DskipTests clean package
+./build/mvn -Pyarn -scala-2.10 -DskipTestsP clean package
--- End diff --

Sorry, it should be `-Pscala-2.10`.


---


[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-24 Thread eyalfa

Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/21369#discussion_r190542635
  
--- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala ---
@@ -414,6 +415,99 @@ class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext {
     sc.stop()
   }
 
+  test("spill during iteration") {
+    val size = 1000
+    val conf = createSparkConf(loadDefaults = true)
+    sc = new SparkContext("local-cluster[1,1,1024]", "test", conf)
+    val map = createExternalMap[Int]
+
+    map.insertAll((0 until size).iterator.map(i => (i / 10, i)))
+    assert(map.numSpills == 0, "map was not supposed to spill")
+
+    val it = map.iterator
+    assert( it.isInstanceOf[CompletionIterator[_, _]])
+    val underlyingIt = map.readingIterator
+    assert( underlyingIt != null )
+    val underlyingMapIterator = underlyingIt.upstream
+    assert(underlyingMapIterator != null)
+    val underlyingMapIteratorClass = underlyingMapIterator.getClass
+    assert(underlyingMapIteratorClass.getEnclosingClass == classOf[AppendOnlyMap[_, _]])
+
+    val underlyingMap = map.currentMap
+    assert(underlyingMap != null)
+
+    val first50Keys = for ( _ <- 0 until 50) yield {
+      val (k, vs) = it.next
+      val sortedVs = vs.sorted
+      assert(sortedVs.seq == (0 until 10).map(10 * k + _))
+      k
+    }
+    assert( map.numSpills == 0 )
+    map.spill(Long.MaxValue, null)
+    // these asserts try to show that we're no longer holding references to the underlying map.
+    // it'd be nice to use something like
+    // https://github.com/scala/scala/blob/2.13.x/test/junit/scala/tools/testing/AssertUtil.scala
+    // (lines 69-89)
+    assert(map.currentMap == null)
+    assert(underlyingIt.upstream ne underlyingMapIterator)
+    assert(underlyingIt.upstream.getClass != underlyingMapIteratorClass)
+    assert(underlyingIt.upstream.getClass.getEnclosingClass != classOf[AppendOnlyMap[_, _]])
--- End diff --

Hmm, we can in line 508, but not in this test.
In this test we look at the iterator immediately after a spill; at this 
point `upstream` is supposed to have been replaced by a `DiskMapIterator`. I guess we 
can check for this directly (after relaxing its visibility to package-private).

In line 508, we can simply compare with `Iterator.empty`.
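
The direct check suggested above can be sketched outside Spark. The snippet below is only an illustration under stated assumptions: `InMemoryIter`, `DiskIter`, and `SpillableIter` are hypothetical toy stand-ins, not Spark classes, and "spilling" here just copies the remaining elements into a new iterator. It shows the shape of the assertion (match on the replacement iterator's type rather than on what it is not).

```scala
// Toy stand-ins; none of these are Spark classes.
final class InMemoryIter(data: Iterator[Int]) extends Iterator[Int] {
  def hasNext: Boolean = data.hasNext
  def next(): Int = data.next()
}

// Plays the role the comment assigns to DiskMapIterator (hypothetical).
final class DiskIter(data: Iterator[Int]) extends Iterator[Int] {
  def hasNext: Boolean = data.hasNext
  def next(): Int = data.next()
}

final class SpillableIter(initial: Iterator[Int]) extends Iterator[Int] {
  private[this] var current: Iterator[Int] = initial

  // Exposed so a test can inspect which iterator is currently backing this one.
  def upstream: Iterator[Int] = current

  def hasNext: Boolean = current.hasNext
  def next(): Int = current.next()

  // Drains the remaining in-memory elements and swaps in the "disk" iterator.
  def spill(): Unit = {
    current = new DiskIter(current.toVector.iterator)
  }
}

object UpstreamCheckSketch {
  // Consumes one element, spills, and returns whatever upstream is afterwards.
  def upstreamAfterSpill(): Iterator[Int] = {
    val it = new SpillableIter(new InMemoryIter(Iterator(1, 2, 3)))
    it.next()
    it.spill()
    it.upstream
  }

  def main(args: Array[String]): Unit = {
    val up = upstreamAfterSpill()
    // Positive check on the replacement type.
    assert(up.isInstanceOf[DiskIter])
    println(up.toList)
  }
}
```

A positive type check like this states the expected post-spill state directly, which is why it needs the replacement class to be visible to the test.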


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21420
  
**[Test build #91090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91090/testReport)**
 for PR 21420 at commit 
[`a41c99b`](https://github.com/apache/spark/commit/a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91090/
Test FAILed.


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21246
  
retest this please


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21246
  
**[Test build #91100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91100/testReport)**
 for PR 21246 at commit 
[`6fd8f2f`](https://github.com/apache/spark/commit/6fd8f2fbd37e5193f0ffb1a25a8f4a8c71ab55bd).


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21246
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21246
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3548/
Test PASSed.


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21369
  
**[Test build #91101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91101/testReport)**
 for PR 21369 at commit 
[`bc7dc11`](https://github.com/apache/spark/commit/bc7dc11383db8370f755a058f4b908588f93edc8).


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19602
  
@cloud-fan
Thanks a lot for looking into this.
I updated the change and generalized `ExtractAttribute`


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3549/
Test PASSed.


---


[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19560
  
**[Test build #91092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91092/testReport)**
 for PR 19560 at commit 
[`78b34bd`](https://github.com/apache/spark/commit/78b34bd7b79550b23730e1c9cdf06620e52b66f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19560
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91092/
Test PASSed.


---


[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19560
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21391
  
**[Test build #91102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91102/testReport)**
 for PR 21391 at commit 
[`8967660`](https://github.com/apache/spark/commit/896766016e9576f1eb70cea62d38bf2ed897b1d0).


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #91103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)**
 for PR 19602 at commit 
[`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240).


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3550/
Test PASSed.


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21423: [SPARK-24378][SQL] Fix date_trunc function incorr...

2018-05-24 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21423

[SPARK-24378][SQL] Fix date_trunc function incorrect examples

## What changes were proposed in this pull request?

Fix `date_trunc` function incorrect examples.

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24378

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21423.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21423


commit b8b0c9dd21bbb4a5d29174d778165a2bd72403e5
Author: Yuming Wang 
Date:   2018-05-24T11:46:28Z

Fix incorrect examples




---


[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21423
  
**[Test build #91104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91104/testReport)**
 for PR 21423 at commit 
[`b8b0c9d`](https://github.com/apache/spark/commit/b8b0c9dd21bbb4a5d29174d778165a2bd72403e5).


---


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-24 Thread jinxing64

GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/21424

[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and 
re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/20014, Spark no longer fails the 
entire executor but only fails the task that hit `SparkOutOfMemoryError`. 
After https://github.com/apache/spark/pull/21342, `BroadcastExchangeExec` 
catches `OutOfMemoryError`. Consider the following scenario:

1. `SparkOutOfMemoryError` (a subclass of `OutOfMemoryError`) is thrown in 
`scala.concurrent.Future`;
2. `SparkOutOfMemoryError` is caught, and an `OutOfMemoryError` is wrapped 
in `SparkFatalException` and re-thrown;
3. `ThreadUtils.awaitResult` catches `SparkFatalException` and throws the 
wrapped `OutOfMemoryError`;
4. The `OutOfMemoryError` goes to 
`SparkUncaughtExceptionHandler.uncaughtException` and the executor fails.

So it makes more sense to catch `SparkOutOfMemoryError` and re-throw 
`SparkFatalException` with the `SparkOutOfMemoryError` wrapped inside.
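
The four steps above can be reproduced in a small standalone sketch. Everything in it is an assumption made for illustration: `TaskOnlyOom` stands in for `SparkOutOfMemoryError`, `FatalWrapper` stands in for `SparkFatalException`, and `surfaced` plays the part of `ThreadUtils.awaitResult` by unwrapping the carried throwable. It is not Spark's code; it only shows how re-wrapping as a plain `OutOfMemoryError` loses the subtype, while re-wrapping as the subclass keeps it.

```scala
// Hypothetical stand-in for SparkOutOfMemoryError: an OOM that should only fail a task.
class TaskOnlyOom(msg: String) extends OutOfMemoryError(msg)

// Hypothetical stand-in for SparkFatalException: an Exception that carries a fatal throwable.
final class FatalWrapper(val throwable: Throwable) extends Exception(throwable)

object OomWrappingSketch {
  // Models the broadcast future body: the work throws the task-level OOM (step 1),
  // which is caught and re-thrown inside the wrapper (step 2).
  private def runBroadcast(preserveType: Boolean): Unit =
    try {
      throw new TaskOnlyOom("not enough memory to build the relation")
    } catch {
      // Proposed behaviour: keep the task-level subtype inside the wrapper.
      case oe: TaskOnlyOom if preserveType =>
        throw new FatalWrapper(new TaskOnlyOom(oe.getMessage))
      // Behaviour described in the scenario: the subtype is lost here.
      case oe: OutOfMemoryError =>
        throw new FatalWrapper(new OutOfMemoryError(oe.getMessage))
    }

  // Models the awaiting side (step 3): unwrap and hand back what would be re-thrown.
  def surfaced(preserveType: Boolean): Throwable =
    try {
      runBroadcast(preserveType)
      new IllegalStateException("runBroadcast returned without throwing")
    } catch {
      case e: FatalWrapper => e.throwable
    }

  def main(args: Array[String]): Unit = {
    println(surfaced(preserveType = false).getClass.getName)
    println(surfaced(preserveType = true).getClass.getName)
  }
}
```

With `preserveType = false`, the throwable that reaches the caller is a plain `OutOfMemoryError`, which is the case step 4 describes; with `preserveType = true`, the caller can still tell it apart as the task-level error.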

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-24379

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21424


commit aa10470b7b09a100ee80afedb29b24548fbe5512
Author: jinxing 
Date:   2018-05-24T11:51:40Z

[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and 
re-throw SparkFatalException, which wraps SparkOutOfMemory inside.




---


[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21423
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21423
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3551/
Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-24 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/21424
  
cc @cloud-fan @JoshRosen 
Would you please take a look at this when you have time?


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21424
  
**[Test build #91105 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)**
 for PR 21424 at commit 
[`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3552/
Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21420
  
**[Test build #91106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91106/testReport)**
 for PR 21420 at commit 
[`c8521cc`](https://github.com/apache/spark/commit/c8521cc0de9de2e113a72e8379272b6fd009279a).


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3553/
Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21415
  
**[Test build #91096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91096/testReport)**
 for PR 21415 at commit 
[`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21295
  
**[Test build #91097 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91097/testReport)**
 for PR 21295 at commit 
[`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91097/
Test PASSed.


---


[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91096/
Test PASSed.


---


[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-24 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21390
  
YARN cleans container local dirs when the container (executor) exits, 
so this may not be a problem on YARN.

YARN has a configuration, "yarn.nodemanager.delete.debug-delay-sec", 
that delays the container dir cleanup for a specified time, which is quite useful 
for debugging. Maybe we can add a similar config here?


---


[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-24 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21390#discussion_r190571272
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -97,6 +99,10 @@ private[deploy] class Worker(
   private val APP_DATA_RETENTION_SECONDS =
 conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)
 
+  // Whether or not cleanup the non-shuffle files on executor death.
+  private val CLEANUP_NON_SHUFFLE_FILES_ENABLED =
+conf.getBoolean("spark.storage.cleanupFilesAfterExecutorDeath", true)
--- End diff --

Shall we rename this config to 
"spark.storage.cleanupFilesAfterExecutorExit"? It seems from the code that a normal 
executor exit (dynamic allocation) will also trigger the cleanup, so the current name 
may be a little misleading. Please correct me if I'm wrong.


---


[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...

2018-05-24 Thread cse68197

Github user cse68197 commented on the issue:

https://github.com/apache/spark/pull/18304
  
Could you please confirm whether this has been fixed?


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #91103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)**
 for PR 19602 at commit 
[`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91103/
Test FAILed.


---


[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...

2018-05-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21295


---


[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21295
  
thanks, merging to master/2.3!


---


[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...

2018-05-24 Thread cse68197

Github user cse68197 commented on the issue:

https://github.com/apache/spark/pull/18304
  
I am writing data to a file like this:
allDF.rdd.map(rec => rec.mkString("|")).repartition(1).saveAsTextFile("location for file")

When I open that file in Notepad, everything appears on a single line, 
but the same file opens fine in Notepad++, where I can see each record on its 
own line.

I also tried the options below (one at a time) before saving, but neither 
worked:
spark.conf.set("textinputformat.record.delimeter","\r\n")
spark.conf.set("textinputformat.record.delimeter","\n")

Could you please suggest an alternative way to fix this?
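
For context on the question above, here is a hedged sketch of the line-ending issue, written without Spark so that it runs standalone. The object name `CrlfWriteSketch` and its helper are hypothetical. It assumes the symptom comes from records being terminated with a bare `\n`, which some Windows editors do not treat as a line break; `textinputformat.record.delimiter` is a Hadoop input-side setting and would not be expected to change what is written. The sketch terminates every record with `\r\n` explicitly.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object CrlfWriteSketch {
  // Joins each record's fields with '|' and terminates every line with CRLF.
  def render(records: Seq[Seq[Any]]): String =
    records.map(_.mkString("|")).map(_ + "\r\n").mkString

  def main(args: Array[String]): Unit = {
    // Illustrative data only; a temp file is used so the sketch has no fixed path.
    val out: Path = Files.createTempFile("crlf-sketch", ".txt")
    val text = render(Seq(Seq(1, "a"), Seq(2, "b")))
    Files.write(out, text.getBytes(StandardCharsets.UTF_8))
    println(s"wrote ${Files.size(out)} bytes to $out")
  }
}
```

In the RDD pipeline from the message, the analogous (untested here) idea would be to append `"\r"` to each record string before `saveAsTextFile`, on the assumption that the text output format adds the trailing `\n` itself.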


---


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21424#discussion_r190577287
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -115,9 +116,9 @@ case class BroadcastExchangeExec(
   // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw
   // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
   // will catch this exception and re-throw the wrapped fatal throwable.
-  case oe: OutOfMemoryError =>
+  case oe: SparkOutOfMemoryError =>
 throw new SparkFatalException(
-  new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
+  new SparkOutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
--- End diff --

Since we fully control the creation of `SparkOutOfMemoryError`, can we move 
the error message to where we throw `SparkOutOfMemoryError` when building the 
hash relation?


---


[GitHub] spark pull request #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-24 Thread andrusha

Github user andrusha commented on a diff in the pull request:

https://github.com/apache/spark/pull/21260#discussion_r190577601
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Extract Spark volume configuration properties with a given name prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
--- End diff --

Tests are missing


---


[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21331
  
**[Test build #91099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91099/testReport)**
 for PR 21331 at commit 
[`a0af525`](https://github.com/apache/spark/commit/a0af52524e30a9ace9d9a6239de79a7251a2499c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21331
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91099/
Test PASSed.


---


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19602#discussion_r190579088
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala 
---
@@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
   for {
 ds <- 20170101 to 20170103
 h <- 0 to 23
-chunk <- Seq("aa", "ab", "ba", "bb")
+chunk <- Seq("11", "12", "21", "22")
--- End diff --

The first point looks fine. For the second one, can we generate new data 
for your new test case?


---


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19602#discussion_r190579351
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 
 val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
 
+object ExtractAttribute {
+  def unapply(expr: Expression): Option[Attribute] = {
+expr match {
+  case attr: Attribute => Some(attr)
+  case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] =>
+unapply(child)
+  case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType =>
--- End diff --

I don't think this is safe. It assumes Spark and Hive have the same behavior 
when converting invalid strings to numbers.


---


[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21369#discussion_r190583904
  
--- Diff: 
core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala
 ---
@@ -414,7 +415,106 @@ class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext {
 sc.stop()
   }
 
-  test("external aggregation updates peak execution memory") {
+  test("SPARK-22713 spill during iteration leaks internal map") {
+val size = 1000
+val conf = createSparkConf(loadDefaults = true)
+sc = new SparkContext("local-cluster[1,1,1024]", "test", conf)
+val map = createExternalMap[Int]
+
+map.insertAll((0 until size).iterator.map(i => (i / 10, i)))
+assert(map.numSpills == 0, "map was not supposed to spill")
+
+val it = map.iterator
+assert(it.isInstanceOf[CompletionIterator[_, _]])
+val underlyingIt = map.readingIterator
+assert( underlyingIt != null )
--- End diff --

Please write `assert(underlyingIt != null)`; we should not put spaces inside 
the parentheses. Can you fix all of them?


---


[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21369#discussion_r190584765
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala
 ---
@@ -585,17 +591,25 @@ class ExternalAppendOnlyMap[K, V, C](
   } else {
 logInfo(s"Task ${context.taskAttemptId} force spilling in-memory map to disk and " +
   s"it will release ${org.apache.spark.util.Utils.bytesToString(getUsed())} memory")
-nextUpstream = spillMemoryIteratorToDisk(upstream)
+val nextUpstream = spillMemoryIteratorToDisk(upstream)
+assert(!upstream.hasNext)
 hasSpilled = true
+upstream = nextUpstream
 true
   }
 }
 
+private def destroy() : Unit = {
+  freeCurrentMap()
+  upstream = Iterator.empty
+}
+
+private[ExternalAppendOnlyMap]
--- End diff --

It's weird to see a class-private method. I'd suggest just removing 
`private[ExternalAppendOnlyMap]`. `spill` is only called in 
`ExternalAppendOnlyMap` and it's public.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91107 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91107/testReport)**
 for PR 21383 at commit 
[`f0f80ed`](https://github.com/apache/spark/commit/f0f80ed1b8333bbab841a59f151deff18bc73447).


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91107 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91107/testReport)**
 for PR 21383 at commit 
[`f0f80ed`](https://github.com/apache/spark/commit/f0f80ed1b8333bbab841a59f151deff18bc73447).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91107/
Test FAILed.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91108 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91108/testReport)**
 for PR 21383 at commit 
[`d59f0d5`](https://github.com/apache/spark/commit/d59f0d5a2735713bb7e218cfcda2b494edfcf522).


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21369
  
**[Test build #91109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91109/testReport)**
 for PR 21369 at commit 
[`807032d`](https://github.com/apache/spark/commit/807032dcded2d7ec9b879176b7c5116df0f424ad).


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21369
  
The patch LGTM, but I'm not sure the test is useful. It's too coupled with 
the implementation, and if we have a reference leak again, I don't think the 
test can help to detect it.

Can we copy-paste 
https://github.com/scala/scala/blob/2.13.x/test/junit/scala/tools/testing/AssertUtil.scala#L69-L90
 to the test?
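
To illustrate what a leak test that is less tied to the implementation could look like, here is a minimal Python sketch of a weak-reference reachability check. It is an assumption about the general idea, not a port of the linked Scala helper; `Payload`, `Holder` and `is_reachable` are hypothetical names.

```python
import gc
import weakref


class Payload:
    """Stand-in for the in-memory map that must not stay reachable."""


class Holder:
    """Hypothetical owner that is expected to drop its payload on release()."""

    def __init__(self):
        self.payload = Payload()

    def release(self):
        self.payload = None


def is_reachable(ref):
    """Return True if the weakly referenced object survives a GC pass."""
    gc.collect()
    return ref() is not None


holder = Holder()
probe = weakref.ref(holder.payload)   # does not keep the payload alive
still_held = is_reachable(probe)      # the holder still owns the payload
holder.release()
leaked = is_reachable(probe)          # True would mean something kept a reference
```

The check only asks whether the object can still be reached, so it keeps working if the fields that hold the reference are renamed or restructured.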


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3554/
Test PASSed.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91108 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91108/testReport)**
 for PR 21383 at commit 
[`d59f0d5`](https://github.com/apache/spark/commit/d59f0d5a2735713bb7e218cfcda2b494edfcf522).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91108/
Test FAILed.


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190598227
  
--- Diff: python/pyspark/util.py ---
@@ -89,6 +89,23 @@ def majorMinorVersion(sparkVersion):
  " version numbers.")
 
 
+def fail_on_stopiteration(f):
+"""
+Wraps the input function to fail on 'StopIteration' by raising a 
'RuntimeError'
+prevents silent loss of data when 'f' is used in a for loop
+"""
--- End diff --

```
"""
Wraps the input function to fail on 'StopIteration' by raising a 
'RuntimeError'
prevents silent loss of data when 'f' is used in a for loop
"""
```

per PEP 8


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190598641
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def stopit(*x):
+raise StopIteration()
+
+seq_rdd = self.sc.parallelize(range(10))
+keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
+exc = Py4JJavaError, RuntimeError
--- End diff --

Hm... can we check for just one explicit exception, if that's not hard? 
Accepting either `Py4JJavaError` or `RuntimeError` looks like two somewhat 
arbitrary exceptions.


---


[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-24 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21394
  
@HyukjinKwon @gengliangwang @maropu Please take a look at the PR.


---


[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21415
  
jenkins, retest this, please


---


[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21415
  
**[Test build #91110 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91110/testReport)**
 for PR 21415 at commit 
[`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8).


---


[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21394
  
Sounds reasonable for now. LGTM.


---


[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21394
  
Merged to master.


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread e-dorigatti

Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190567010
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def a_rdd(keyed=False):
+return self.sc.parallelize(
+((x % 2, x) if keyed else x)
+for x in range(10)
+)
+
+def stopit(*x):
+raise StopIteration()
+
+def do_test(action, *args, **kwargs):
+with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

Can you clarify?


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread e-dorigatti

Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190603773
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def stopit(*x):
+raise StopIteration()
+
+seq_rdd = self.sc.parallelize(range(10))
+keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
+exc = Py4JJavaError, RuntimeError
--- End diff --

Both of them can happen, depending on where the `StopIteration` is raised. 
Consider for example `RDD.reduce`: if the exception is raised when reducing 
inside a partition, the user will get a `Py4JJavaError`, but if it is 
raised when reducing the results locally 
[here](https://github.com/e-dorigatti/spark/blob/fix_spark_23754/python/pyspark/rdd.py#L858),
 it will be a `RuntimeError` (the one we raise in `fail_on_stopiteration`).
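
For readers following along, a minimal plain-Python sketch of the problem and of a wrapper in the spirit of `fail_on_stopiteration`. This is not the PR's actual code: the error message and the `stop_at_two` function are made up, and Spark is not involved.

```python
def fail_on_stopiteration(f):
    """Sketch of a wrapper that turns StopIteration from f into RuntimeError."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user code") from exc
    return wrapper


def stop_at_two(x):
    # User function that (incorrectly) lets StopIteration escape.
    if x == 2:
        raise StopIteration()
    return x * 10


# Unwrapped: the consumer mistakes the exception for end-of-input,
# so the remaining elements are silently dropped.
truncated = list(map(stop_at_two, [1, 2, 3]))

# Wrapped: the same mistake surfaces as an error instead.
try:
    list(map(fail_on_stopiteration(stop_at_two), [1, 2, 3]))
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"
```

When the wrapped function runs on the driver, the caller sees the `RuntimeError` directly. When it runs inside a partition, the failure reaches the driver as a `Py4JJavaError`, as described above.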


---


[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21380


---


[GitHub] spark pull request #21394: [SPARK-24329][SQL] Test for skipping multi-space ...

2018-05-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21394


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190605953
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def stopit(*x):
+raise StopIteration()
+
+seq_rdd = self.sc.parallelize(range(10))
+keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
+exc = Py4JJavaError, RuntimeError
--- End diff --

Got it. Makes sense. Let's add a single comment while we are here, if you 
don't mind. It seems a few changes are needed anyway.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21383
  
LGTM too if the tests pass.


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190607843
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def stopit(*x):
+raise StopIteration()
+
+seq_rdd = self.sc.parallelize(range(10))
+keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
+exc = Py4JJavaError, RuntimeError
--- End diff --

Wait... just for clarification, can either of the two exceptions be raised 
arbitrarily on each execution?


---


[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...

2018-05-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21423
  
> How was this patch tested?

I believe you manually tested though :-).


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21246
  
**[Test build #91100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91100/testReport)**
 for PR 21246 at commit 
[`6fd8f2f`](https://github.com/apache/spark/commit/6fd8f2fbd37e5193f0ffb1a25a8f4a8c71ab55bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-24 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21410
  
> Is there a way to identify where in the schema the issue is occurring?

We can catch the exceptions on each level of schema tree traversal, and 
show sub-trees in each catch. For example: `array>>>` , the first exception will point out `struct`, 
the second one `array>` and up to the "root" schema. 
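
A minimal Python sketch of the idea, assuming a toy schema representation (nested tuples and dicts) and a made-up set of supported leaf types. `render`, `validate` and the sample schema are hypothetical and are not taken from the PR.

```python
SUPPORTED = {"int", "string", "double"}  # hypothetical leaf types


def render(schema):
    """Render a nested schema description as a compact type string."""
    if isinstance(schema, str):
        return schema
    kind, inner = schema
    if kind == "array":
        return f"array<{render(inner)}>"
    fields = ",".join(f"{name}:{render(t)}" for name, t in inner.items())
    return f"struct<{fields}>"


def validate(schema):
    """Check every leaf type; each enclosing level adds its sub-tree to the error."""
    if isinstance(schema, str):
        if schema not in SUPPORTED:
            raise ValueError(f"unsupported type: {schema}")
        return
    kind, inner = schema
    children = [inner] if kind == "array" else list(inner.values())
    try:
        for child in children:
            validate(child)
    except ValueError as err:
        # Re-raise with the sub-tree of the current level appended.
        raise ValueError(f"{err}\n  in {render(schema)}") from None


schema = ("array", ("struct", {"a": "int", "b": ("array", "blob")}))
try:
    validate(schema)
    report = ""
except ValueError as err:
    report = str(err)
```

The resulting message lists the innermost sub-tree first and the root schema last, one line per level.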

> e.g., a.b.c where this is happening, is required to easily isolate the 
issue in the input data and resolve it.

I guess in the case of arrays and maps, you want to see indexes and keys. 
Could you provide a concrete example with values and a schema (array, struct, 
map), and say what kind of information the error should contain?

In any case, I would propose making such improvements in a separate PR.


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21246
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21246
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91100/
Test PASSed.


---


[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-24 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21410
  
@gatorsmile Could you look at the PR, please? The changes should help us 
troubleshoot customers' issues.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21391
  
**[Test build #91102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91102/testReport)**
 for PR 21391 at commit 
[`8967660`](https://github.com/apache/spark/commit/896766016e9576f1eb70cea62d38bf2ed897b1d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21391
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21391
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91102/
Test PASSed.


---


[GitHub] spark pull request #21425: Add unit tests for NOT IN subquery around null va...

2018-05-24 Thread mgyucht

GitHub user mgyucht opened a pull request:

https://github.com/apache/spark/pull/21425

Add unit tests for NOT IN subquery around null values

## What changes were proposed in this pull request?
This PR adds several unit tests along the `cols NOT IN (subquery)` pathway. 
There is a scattering of tests here and there which cover this codepath, but 
there doesn't seem to be a unified unit test of the correctness of null-aware 
anti joins anywhere. I have also added a brief explanation of how this 
expression behaves in SubquerySuite. Lastly, I made some clarifying changes in 
the NOT IN pathway in RewritePredicateSubquery.
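
For context, a minimal Python sketch of the standard SQL three-valued semantics of `NOT IN` for a single column, with `None` standing in for NULL. It illustrates the behavior the tests are about and is not code from this PR; `not_in` and `null_aware_anti_join` are hypothetical names, and the multi-column case is not covered.

```python
def not_in(value, subquery_values):
    """Evaluate `value NOT IN (subquery)` under SQL three-valued logic.

    Returns True, False, or None (None stands for SQL NULL).
    """
    if not subquery_values:
        return True        # empty subquery: always TRUE, even for a NULL value
    if value is None:
        return None        # NULL compared with anything is NULL
    if value in [v for v in subquery_values if v is not None]:
        return False       # a definite match
    if any(v is None for v in subquery_values):
        return None        # no match, but a NULL might have matched
    return True


def null_aware_anti_join(left, subquery_values):
    """Keep only the rows for which NOT IN is definitely TRUE."""
    return [x for x in left if not_in(x, subquery_values) is True]
```

A single NULL in the subquery result is therefore enough to filter out every row on the left side, which is the case that is easy to get wrong.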

## How was this patch tested?
Added unit tests! There should be no behavioral change in this PR.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgyucht/spark-1 spark-24381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21425


commit d6040ea0028754c7fe39ddcebb6bd027749acc4e
Author: Miles Yucht 
Date:   2018-05-24T15:16:37Z

Add tests, and small clean-up of the NOT IN pathway




---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21369
  
**[Test build #91101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91101/testReport)**
 for PR 21369 at commit 
[`bc7dc11`](https://github.com/apache/spark/commit/bc7dc11383db8370f755a058f4b908588f93edc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91101/
Test PASSed.


---


[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21369
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21425: Add unit tests for NOT IN subquery around null values

2018-05-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21425
  
Can one of the admins verify this patch?


---

