[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91095/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21391 **[Test build #91093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91093/testReport)** for PR 21391 at commit [`d9a440a`](https://github.com/apache/spark/commit/d9a440a9814913827fcfcff644c741a43332b02d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21391 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91093/ Test FAILed.
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21391 Merged build finished. Test FAILed.
[GitHub] spark pull request #21422: [Spark-24376][doc]Summary:compiling spark with sc...
Github user gentlewangyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21422#discussion_r190541918

--- Diff: docs/building-spark.md ---
@@ -92,10 +92,10 @@ like ZooKeeper and Hadoop itself.
     ./build/mvn -Pmesos -DskipTests clean package
 ## Building for Scala 2.10
-To produce a Spark package compiled with Scala 2.10, use the `-Dscala-2.10` property:
+To produce a Spark package compiled with Scala 2.10, use the `-Pscala-2.10` property:
     ./dev/change-scala-version.sh 2.10
-    ./build/mvn -Pyarn -Dscala-2.10 -DskipTests clean package
+    ./build/mvn -Pyarn -scala-2.10 -DskipTestsP clean package
--- End diff --

Sorry, it's `-Pscala-2.10`.
[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/21369#discussion_r190542635

--- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala ---
@@ -414,6 +415,99 @@ class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext {
     sc.stop()
   }
+  test("spill during iteration") {
+    val size = 1000
+    val conf = createSparkConf(loadDefaults = true)
+    sc = new SparkContext("local-cluster[1,1,1024]", "test", conf)
+    val map = createExternalMap[Int]
+
+    map.insertAll((0 until size).iterator.map(i => (i / 10, i)))
+    assert(map.numSpills == 0, "map was not supposed to spill")
+
+    val it = map.iterator
+    assert( it.isInstanceOf[CompletionIterator[_, _]])
+    val underlyingIt = map.readingIterator
+    assert( underlyingIt != null )
+    val underlyingMapIterator = underlyingIt.upstream
+    assert(underlyingMapIterator != null)
+    val underlyingMapIteratorClass = underlyingMapIterator.getClass
+    assert(underlyingMapIteratorClass.getEnclosingClass == classOf[AppendOnlyMap[_, _]])
+
+    val underlyingMap = map.currentMap
+    assert(underlyingMap != null)
+
+    val first50Keys = for ( _ <- 0 until 50) yield {
+      val (k, vs) = it.next
+      val sortedVs = vs.sorted
+      assert(sortedVs.seq == (0 until 10).map(10 * k + _))
+      k
+    }
+    assert( map.numSpills == 0 )
+    map.spill(Long.MaxValue, null)
+    // these asserts try to show that we're no longer holding references to the underlying map.
+    // it'd be nice to use something like
+    // https://github.com/scala/scala/blob/2.13.x/test/junit/scala/tools/testing/AssertUtil.scala
+    // (lines 69-89)
+    assert(map.currentMap == null)
+    assert(underlyingIt.upstream ne underlyingMapIterator)
+    assert(underlyingIt.upstream.getClass != underlyingMapIteratorClass)
+    assert(underlyingIt.upstream.getClass.getEnclosingClass != classOf[AppendOnlyMap[_, _]])
--- End diff --

Hmm, we can at line 508, but not in this test.
In this test we look at the iterator immediately after a spill. At this point `upstream` is supposed to have been replaced by a `DiskMapIterator`; I guess we can check for this directly (after relaxing its visibility to package-private). At line 508, we can simply compare with `Iterator.empty`.
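The kind of check discussed above can be sketched outside Spark. This is only an illustration under stated assumptions: `SwappableIterator` and `DiskBackedIterator` are hypothetical stand-ins (the latter playing the role of `DiskMapIterator`), not Spark's actual classes, and the "spill" here just copies the remaining items rather than writing to disk.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's real classes.
final class DiskBackedIterator[A](items: Seq[A]) extends Iterator[A] {
  private val underlying = items.iterator
  override def hasNext: Boolean = underlying.hasNext
  override def next(): A = underlying.next()
}

final class SwappableIterator[A](var upstream: Iterator[A]) extends Iterator[A] {
  override def hasNext: Boolean = upstream.hasNext
  override def next(): A = upstream.next()

  // Simulates a spill: the remaining items move to a "disk-backed" iterator
  // and the reference to the in-memory iterator is dropped.
  def spill(): Unit = {
    upstream = new DiskBackedIterator(upstream.toList)
  }
}

object SpillCheck {
  def main(args: Array[String]): Unit = {
    val inMemory: Iterator[Int] = (0 until 100).iterator
    val it = new SwappableIterator[Int](inMemory)

    // Consume a few elements before the spill, as the test in the diff does.
    val firstTen = List.fill(10)(it.next())
    assert(firstTen == (0 until 10).toList)

    it.spill()

    // Direct check on the replacement type instead of comparing enclosing classes.
    assert(it.upstream ne inMemory)
    assert(it.upstream.isInstanceOf[DiskBackedIterator[_]])

    // Iteration continues where it left off.
    assert(it.toList == (10 until 100).toList)
  }
}
```

Checking the concrete replacement type is more direct than asserting what the upstream is not, at the cost of exposing that type to the test.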
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21420 **[Test build #91090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91090/testReport)** for PR 21420 at commit [`a41c99b`](https://github.com/apache/spark/commit/a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21420 Merged build finished. Test FAILed.
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21420 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91090/ Test FAILed.
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21246 retest this please
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21246 **[Test build #91100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91100/testReport)** for PR 21246 at commit [`6fd8f2f`](https://github.com/apache/spark/commit/6fd8f2fbd37e5193f0ffb1a25a8f4a8c71ab55bd).
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Merged build finished. Test PASSed.
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3548/ Test PASSed.
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21369 **[Test build #91101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91101/testReport)** for PR 21369 at commit [`bc7dc11`](https://github.com/apache/spark/commit/bc7dc11383db8370f755a058f4b908588f93edc8).
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 @cloud-fan Thanks a lot for looking into this. I updated the change and generalized `ExtractAttribute`
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3549/ Test PASSed.
[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #91092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91092/testReport)** for PR 19560 at commit [`78b34bd`](https://github.com/apache/spark/commit/78b34bd7b79550b23730e1c9cdf06620e52b66f2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Merged build finished. Test PASSed.
[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91092/ Test PASSed.
[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Merged build finished. Test PASSed.
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21391 **[Test build #91102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91102/testReport)** for PR 21391 at commit [`8967660`](https://github.com/apache/spark/commit/896766016e9576f1eb70cea62d38bf2ed897b1d0).
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19602 **[Test build #91103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)** for PR 19602 at commit [`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240).
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3550/ Test PASSed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Merged build finished. Test PASSed.
[GitHub] spark pull request #21423: [SPARK-24378][SQL] Fix date_trunc function incorr...
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21423

[SPARK-24378][SQL] Fix date_trunc function incorrect examples

## What changes were proposed in this pull request?

Fix `date_trunc` function incorrect examples.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24378

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21423.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #21423

commit b8b0c9dd21bbb4a5d29174d778165a2bd72403e5
Author: Yuming Wang
Date: 2018-05-24T11:46:28Z

Fix incorrect examples
[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21423 **[Test build #91104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91104/testReport)** for PR 21423 at commit [`b8b0c9d`](https://github.com/apache/spark/commit/b8b0c9dd21bbb4a5d29174d778165a2bd72403e5).
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/21424

[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/20014, Spark no longer fails the entire executor; it fails only the task that hits `SparkOutOfMemoryError`. After https://github.com/apache/spark/pull/21342, `BroadcastExchangeExec` catches `OutOfMemoryError`. Consider the following scenario:

1. `SparkOutOfMemoryError` (a subclass of `OutOfMemoryError`) is thrown in `scala.concurrent.Future`;
2. `SparkOutOfMemoryError` is caught, and an `OutOfMemoryError` is wrapped in `SparkFatalException` and re-thrown;
3. `ThreadUtils.awaitResult` catches `SparkFatalException` and throws an `OutOfMemoryError`;
4. The `OutOfMemoryError` goes to `SparkUncaughtExceptionHandler.uncaughtException` and the executor fails.

So it makes more sense to catch `SparkOutOfMemoryError` and re-throw `SparkFatalException`, which wraps the `SparkOutOfMemoryError` inside.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-24379

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21424.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #21424

commit aa10470b7b09a100ee80afedb29b24548fbe5512
Author: jinxing
Date: 2018-05-24T11:51:40Z

[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.
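The scenario in the pull request description can be illustrated with a small, Spark-free sketch. Everything here is an assumption made for illustration: `TaskOutOfMemoryError` stands in for `SparkOutOfMemoryError`, `FatalCarrier` for `SparkFatalException`, and `awaitAndUnwrap` for the unwrapping that `ThreadUtils.awaitResult` is described as doing. No `Future` is involved; the sketch only shows how the wrapped type decides which handler the error reaches.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's real classes.
class TaskOutOfMemoryError(msg: String) extends OutOfMemoryError(msg)

// A plain Exception that carries a fatal throwable across a boundary.
final class FatalCarrier(val throwable: Throwable) extends Exception(throwable)

object WrapDemo {
  // Plays the role of the await helper: unwrap the carrier and re-throw its payload.
  def awaitAndUnwrap[T](body: => T): T =
    try body catch { case c: FatalCarrier => throw c.throwable }

  // Wrapping a fresh OutOfMemoryError discards the task-level subclass.
  def wrapAsPlainOom(): Nothing =
    try throw new TaskOutOfMemoryError("task ran out of memory")
    catch {
      case _: OutOfMemoryError =>
        throw new FatalCarrier(new OutOfMemoryError("Not enough memory"))
    }

  // Wrapping the subclass itself keeps the task-level type visible after unwrapping.
  def wrapAsTaskOom(): Nothing =
    try throw new TaskOutOfMemoryError("task ran out of memory")
    catch { case e: TaskOutOfMemoryError => throw new FatalCarrier(e) }

  // Mimics the two outcomes named in the description.
  def classify(body: => Any): String =
    try { awaitAndUnwrap(body); "ok" }
    catch {
      case _: TaskOutOfMemoryError => "fail task only"
      case _: OutOfMemoryError => "fail executor"
    }

  def main(args: Array[String]): Unit = {
    println(classify(wrapAsPlainOom())) // prints "fail executor"
    println(classify(wrapAsTaskOom()))  // prints "fail task only"
  }
}
```

The order of the `case` clauses in `classify` matters: the subclass must be matched before `OutOfMemoryError`, otherwise both paths would be treated as executor-level failures.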
[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21423 Merged build finished. Test PASSed.
[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21423 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3551/ Test PASSed.
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21424 cc @cloud-fan @JoshRosen Could you please take a look at this when you have time?
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21424 **[Test build #91105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)** for PR 21424 at commit [`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3552/ Test PASSed.
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21424 Merged build finished. Test PASSed.
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21420 **[Test build #91106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91106/testReport)** for PR 21420 at commit [`c8521cc`](https://github.com/apache/spark/commit/c8521cc0de9de2e113a72e8379272b6fd009279a).
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21420 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3553/ Test PASSed.
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21420 Merged build finished. Test PASSed.
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21415 **[Test build #91096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91096/testReport)** for PR 21415 at commit [`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21295 **[Test build #91097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91097/testReport)** for PR 21295 at commit [`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Merged build finished. Test PASSed.
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Merged build finished. Test PASSed.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91097/ Test PASSed.
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91096/ Test PASSed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21390 YARN cleans up container-local dirs when a container (executor) exits, so this may not be a problem on YARN. YARN has a configuration, "yarn.nodemanager.delete.debug-delay-sec", that delays container dir cleanup for a specified time, which is quite useful for debugging. Maybe we can add a similar config here?
[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21390#discussion_r190571272

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -97,6 +99,10 @@ private[deploy] class Worker(
   private val APP_DATA_RETENTION_SECONDS =
     conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)
+  // Whether or not cleanup the non-shuffle files on executor death.
+  private val CLEANUP_NON_SHUFFLE_FILES_ENABLED =
+    conf.getBoolean("spark.storage.cleanupFilesAfterExecutorDeath", true)
--- End diff --

Shall we rename this config to "spark.storage.cleanupFilesAfterExecutorExit"? From the code, it seems that a normal executor exit (dynamic allocation) will also trigger the cleanup, so the current name may be a little misleading. Please correct me if I'm wrong.
[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...
Github user cse68197 commented on the issue: https://github.com/apache/spark/pull/18304 Could you please confirm whether this has been fixed?
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19602 **[Test build #91103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)** for PR 19602 at commit [`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Merged build finished. Test FAILed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91103/ Test FAILed.
[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21295
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21295 thanks, merging to master/2.3!
[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...
Github user cse68197 commented on the issue:

https://github.com/apache/spark/pull/18304

I am writing data to a file as shown below:

allDF.rdd.map(rec => rec.mkString("|")).repartition(1).saveAsTextFile("location for file")

When I open that file in Notepad, everything appears on a single line, but the same file opens fine in Notepad++, where I can see all the data on separate lines. I also tried the options below (one at a time) before saving, but they did not work either.

spark.conf.set("textinputformat.record.delimeter","\r\n")
spark.conf.set("textinputformat.record.delimeter","\n")

Could you please help me understand whether there is an alternative way to fix this?
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21424#discussion_r190577287

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -115,9 +116,9 @@ case class BroadcastExchangeExec(
           // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw
           // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
           // will catch this exception and re-throw the wrapped fatal throwable.
-          case oe: OutOfMemoryError =>
+          case oe: SparkOutOfMemoryError =>
             throw new SparkFatalException(
-              new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
+              new SparkOutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
--- End diff --

Since we fully control the creation of `SparkOutOfMemoryError`, can we move the error message to where we throw `SparkOutOfMemoryError` when building the hash relation?
[GitHub] spark pull request #21260: [SPARK-23529][K8s] Support mounting volumes
Github user andrusha commented on a diff in the pull request:

https://github.com/apache/spark/pull/21260#discussion_r190577601

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Extract Spark volume configuration properties with a given name prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
--- End diff --

Tests are missing
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21331 **[Test build #91099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91099/testReport)** for PR 21331 at commit [`a0af525`](https://github.com/apache/spark/commit/a0af52524e30a9ace9d9a6239de79a7251a2499c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91099/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r190579088 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -53,7 +52,7 @@ class HiveClientSuite(version: String) for { ds <- 20170101 to 20170103 h <- 0 to 23 -chunk <- Seq("aa", "ab", "ba", "bb") +chunk <- Seq("11", "12", "21", "22") --- End diff -- The first point looks fine. For the second one, can we generate new data for your new test case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r190579351 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled +object ExtractAttribute { + def unapply(expr: Expression): Option[Attribute] = { +expr match { + case attr: Attribute => Some(attr) + case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] => +unapply(child) + case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType => --- End diff -- I don't think this is safe. It assumes Spark and Hive have the same behavior when converting invalid strings to numbers. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
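To illustrate the concern in the comment above: if two engines parse an invalid numeric string differently, the same predicate can select different partitions once the cast is stripped and the filter is evaluated by the other engine. The two parsers below are hypothetical stand-ins chosen only to disagree on bad input; they are not the actual Spark or Hive conversion rules.

```python
# Hypothetical parsers: NOT the real Spark or Hive casting rules.
def strict_cast(s):
    """Return an int, or None (SQL NULL) when the whole string is not a number."""
    try:
        return int(s)
    except ValueError:
        return None


def lenient_cast(s):
    """Parse only the leading digits and ignore trailing junk; None if there are none."""
    digits = ""
    for ch in s:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else None


partitions = ["1", "2", "2x", "abc"]

# The same predicate, cast(p as int) = 2, evaluated under each parser.
strict_match = [p for p in partitions if strict_cast(p) == 2]
lenient_match = [p for p in partitions if lenient_cast(p) == 2]

print(strict_match)
print(lenient_match)
```

Whenever the two lists differ, pruning partitions with one parser and filtering rows with the other would either read too much or silently drop data.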
[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r190583904 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -414,7 +415,106 @@ class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext { sc.stop() } - test("external aggregation updates peak execution memory") { + test("SPARK-22713 spill during iteration leaks internal map") { +val size = 1000 +val conf = createSparkConf(loadDefaults = true) +sc = new SparkContext("local-cluster[1,1,1024]", "test", conf) +val map = createExternalMap[Int] + +map.insertAll((0 until size).iterator.map(i => (i / 10, i))) +assert(map.numSpills == 0, "map was not supposed to spill") + +val it = map.iterator +assert(it.isInstanceOf[CompletionIterator[_, _]]) +val underlyingIt = map.readingIterator +assert( underlyingIt != null ) --- End diff -- `assert(underlyingIt != null)`: we should not put spaces inside the parentheses. Can you fix all of them? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r190584765 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -585,17 +591,25 @@ class ExternalAppendOnlyMap[K, V, C]( } else { logInfo(s"Task ${context.taskAttemptId} force spilling in-memory map to disk and " + s"it will release ${org.apache.spark.util.Utils.bytesToString(getUsed())} memory") -nextUpstream = spillMemoryIteratorToDisk(upstream) +val nextUpstream = spillMemoryIteratorToDisk(upstream) +assert(!upstream.hasNext) hasSpilled = true +upstream = nextUpstream true } } +private def destroy() : Unit = { + freeCurrentMap() + upstream = Iterator.empty +} + +private[ExternalAppendOnlyMap] --- End diff -- It's weird to see a class-private method. I'd suggest just removing `private[ExternalAppendOnlyMap]`. `spill` is only called in `ExternalAppendOnlyMap` and it's public. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91107/testReport)** for PR 21383 at commit [`f0f80ed`](https://github.com/apache/spark/commit/f0f80ed1b8333bbab841a59f151deff18bc73447). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91107/testReport)** for PR 21383 at commit [`f0f80ed`](https://github.com/apache/spark/commit/f0f80ed1b8333bbab841a59f151deff18bc73447). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91107/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91108/testReport)** for PR 21383 at commit [`d59f0d5`](https://github.com/apache/spark/commit/d59f0d5a2735713bb7e218cfcda2b494edfcf522). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21369 **[Test build #91109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91109/testReport)** for PR 21369 at commit [`807032d`](https://github.com/apache/spark/commit/807032dcded2d7ec9b879176b7c5116df0f424ad). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21369 The patch LGTM, but I'm not sure the test is useful. It's too coupled with the implementation, and if we have a reference leak again, I don't think the test can help detect it. Can we copy-paste https://github.com/scala/scala/blob/2.13.x/test/junit/scala/tools/testing/AssertUtil.scala#L69-L90 to the test? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
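One implementation-independent way to test for a reference leak is to keep only a weak reference to the object and check that it is collected once its owner is supposed to have released it. The sketch below shows that general idea in Python. It is not a port of the linked Scala helper, and `Payload`, `Holder`, and `assert_not_leaked` are hypothetical names used only for illustration.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large in-memory map."""


class Holder:
    """Hypothetical owner that should drop its payload on release()."""

    def __init__(self):
        self.payload = Payload()

    def release(self):
        self.payload = None


def assert_not_leaked(ref):
    """Fail if the weakly referenced object is still alive after a GC pass."""
    gc.collect()
    assert ref() is None, "object is still strongly reachable"


holder = Holder()
ref = weakref.ref(holder.payload)  # weak: does not keep the payload alive
holder.release()
assert_not_leaked(ref)
```

A check of this shape only asks whether the object is still reachable, so it keeps working if the internals of the owner are refactored.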
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3554/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91108/testReport)** for PR 21383 at commit [`d59f0d5`](https://github.com/apache/spark/commit/d59f0d5a2735713bb7e218cfcda2b494edfcf522). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91108/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190598227 --- Diff: python/pyspark/util.py --- @@ -89,6 +89,23 @@ def majorMinorVersion(sparkVersion): " version numbers.") +def fail_on_stopiteration(f): +""" +Wraps the input function to fail on 'StopIteration' by raising a 'RuntimeError' +prevents silent loss of data when 'f' is used in a for loop +""" --- End diff -- ``` """ Wraps the input function to fail on 'StopIteration' by raising a 'RuntimeError' prevents silent loss of data when 'f' is used in a for loop """ ``` per PEP 8 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
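A minimal sketch of the wrapper the docstring above describes, together with the silent data loss it is meant to prevent. The error message, the use of `functools.wraps`, and the helper `flaky` are assumptions made for illustration; the PR's actual implementation may differ.

```python
import functools


def fail_on_stopiteration(f):
    """Wrap f so that a StopIteration escaping it becomes a RuntimeError."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            # Assumed message text, for illustration only.
            raise RuntimeError("StopIteration raised in user code") from exc
    return wrapper


def flaky(x):
    """Hypothetical user function that raises StopIteration part-way through."""
    if x == 2:
        raise StopIteration()
    return x * 10


# Without the wrapper, the StopIteration ends the iteration early and the
# remaining elements are dropped without any error.
truncated = list(map(flaky, range(5)))
print(truncated)

# With the wrapper, the same input fails loudly instead.
try:
    list(map(fail_on_stopiteration(flaky), range(5)))
except RuntimeError as err:
    print("failed as intended:", err)
```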
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190598641 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect() self.assertEqual(data, result) +def test_stopiteration_in_client_code(self): + +def stopit(*x): +raise StopIteration() + +seq_rdd = self.sc.parallelize(range(10)) +keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10)) +exc = Py4JJavaError, RuntimeError --- End diff -- Hm .. can we just check for one explicit exception, if it's not hard? Accepting either Py4JJavaError or RuntimeError looks a bit arbitrary ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 @HyukjinKwon @gengliangwang @maropu Please, look at the PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21415 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21415 **[Test build #91110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91110/testReport)** for PR 21415 at commit [`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21394 Sounds reasonable for now. LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21394 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190567010 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,31 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect() self.assertEqual(data, result) +def test_stopiteration_in_client_code(self): + +def a_rdd(keyed=False): +return self.sc.parallelize( +((x % 2, x) if keyed else x) +for x in range(10) +) + +def stopit(*x): +raise StopIteration() + +def do_test(action, *args, **kwargs): +with self.assertRaises((Py4JJavaError, RuntimeError)) as cm: --- End diff -- Can you clarify? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190603773 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect() self.assertEqual(data, result) +def test_stopiteration_in_client_code(self): + +def stopit(*x): +raise StopIteration() + +seq_rdd = self.sc.parallelize(range(10)) +keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10)) +exc = Py4JJavaError, RuntimeError --- End diff -- Both of them can happen, depending on where the `StopIteration` is raised. Consider for example `RDD.reduce`: if the exception is raised when reducing inside a partition, the user will get a `Py4JJavaError`, but if the error is raised when reducing locally the results [here](https://github.com/e-dorigatti/spark/blob/fix_spark_23754/python/pyspark/rdd.py#L858), it will be a `RuntimeError` (the one we raise in `fail_on_stopiteration`) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
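The test pattern discussed above relies on `assertRaises` accepting a tuple of exception types. A self-contained sketch follows; `FakePy4JJavaError` is a hypothetical stand-in for `py4j.protocol.Py4JJavaError` so the example runs without Spark, and `run_action` is an invented helper that only mimics the two failure paths described in the comment.

```python
import unittest


class FakePy4JJavaError(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JJavaError (executor-side failure)."""


def run_action(where):
    """Pretend action that fails differently depending on where user code ran."""
    if where == "executor":
        raise FakePy4JJavaError("task failed")
    raise RuntimeError("StopIteration in driver-side user code")


class StopIterationTest(unittest.TestCase):
    def test_either_exception_is_accepted(self):
        # A tuple means "any one of these types satisfies the assertion".
        expected = (FakePy4JJavaError, RuntimeError)
        for where in ("executor", "driver"):
            with self.assertRaises(expected):
                run_action(where)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(StopIterationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In this sketch each code path always raises the same type; the tuple is there because one test covers both paths, which matches the explanation given in the comment above.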
[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21380 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21394: [SPARK-24329][SQL] Test for skipping multi-space ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21394 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190605953 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect() self.assertEqual(data, result) +def test_stopiteration_in_client_code(self): + +def stopit(*x): +raise StopIteration() + +seq_rdd = self.sc.parallelize(range(10)) +keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10)) +exc = Py4JJavaError, RuntimeError --- End diff -- Got it. Makes sense. Let's add a single comment while we are here if you don't mind. Seems a few changes are needed anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21383 LGTM too if the tests pass. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190607843 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect() self.assertEqual(data, result) +def test_stopiteration_in_client_code(self): + +def stopit(*x): +raise StopIteration() + +seq_rdd = self.sc.parallelize(range(10)) +keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10)) +exc = Py4JJavaError, RuntimeError --- End diff -- Wait .. just for clarification, can either of the two exceptions be raised arbitrarily on each execution? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21423: [SPARK-24378][SQL] Fix date_trunc function incorrect exa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21423 > How was this patch tested? I believe you manually tested though :-). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21246 **[Test build #91100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91100/testReport)** for PR 21246 at commit [`6fd8f2f`](https://github.com/apache/spark/commit/6fd8f2fbd37e5193f0ffb1a25a8f4a8c71ab55bd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21410 > Is there a way to identify where in the schema the issue is occurring? We can catch the exceptions on each level of schema tree traversal, and show sub-trees in each catch. For example: `array>>>` , the first exception will point out `struct`, the second one `array>` and so on up to the "root" schema. > e.g., a.b.c where this is happening, is required to easily isolate the issue in the input data and resolve it. I guess in the case of arrays and maps, you want to see indexes and keys. Could you provide a concrete example with values and a schema (array, struct, map), and say what kind of info the error should contain? Just in case, I would propose making such improvements in a separate PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
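The idea of catching the exception at each level of the schema traversal and reporting the sub-tree at each catch can be sketched as follows. The schema encoding here (a one-element list for an array, a dict for a struct, a Python type for a leaf) and the function `convert` are hypothetical and unrelated to Spark's `DataType` classes.

```python
def convert(value, schema):
    """Check value against a tiny schema; on failure, append the sub-schema at each level."""
    try:
        if isinstance(schema, list):        # array: [element_schema]
            return [convert(v, schema[0]) for v in value]
        if isinstance(schema, dict):        # struct: {field_name: field_schema}
            return {k: convert(value[k], s) for k, s in schema.items()}
        if not isinstance(value, schema):   # leaf: a plain Python type
            raise TypeError(f"{value!r} is not {schema.__name__}")
        return value
    except TypeError as exc:
        # Re-raise with this level's sub-schema added, innermost first.
        raise TypeError(f"{exc}\n  while converting to {schema!r}") from None


schema = [{"a": int}]  # roughly "array of struct with one int field"
try:
    convert([{"a": 1}, {"a": "x"}], schema)
except TypeError as err:
    message = str(err)
    print(message)
```

The resulting message lists the leaf type first, then the struct, then the array, so the reader can see the path from the failing value up to the root schema.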
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91100/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21410 @gatorsmile Could you look at the PR, please? The changes should help us troubleshoot customers' issues. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21391 **[Test build #91102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91102/testReport)** for PR 21391 at commit [`8967660`](https://github.com/apache/spark/commit/896766016e9576f1eb70cea62d38bf2ed897b1d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21391 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21391: [SPARK-24343][SQL] Avoid shuffle for the bucketed table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21391 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91102/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21425: Add unit tests for NOT IN subquery around null va...
GitHub user mgyucht opened a pull request: https://github.com/apache/spark/pull/21425 Add unit tests for NOT IN subquery around null values ## What changes were proposed in this pull request? This PR adds several unit tests along the `cols NOT IN (subquery)` pathway. There is a scattering of tests here and there which cover this codepath, but there doesn't seem to be a unified unit test of the correctness of null-aware anti joins anywhere. I have also added a brief explanation of how this expression behaves in SubquerySuite. Lastly, I made some clarifying changes in the NOT IN pathway in RewritePredicateSubquery. ## How was this patch tested? Added unit tests! There should be no behavioral change in this PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgyucht/spark-1 spark-24381 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21425.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21425 commit d6040ea0028754c7fe39ddcebb6bd027749acc4e Author: Miles Yucht Date: 2018-05-24T15:16:37Z Add tests, and small clean-up of the NOT IN pathway --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
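For readers unfamiliar with why `NOT IN` needs a null-aware anti join, the sketch below models standard SQL three-valued logic for the single-column case, with `None` standing in for NULL. The function names are hypothetical, this is not Spark's implementation, and the multi-column case the PR also covers is more involved.

```python
def not_in(value, subquery_values):
    """SQL `value NOT IN (subquery)`: returns True, False, or None for NULL."""
    if not subquery_values:
        return True                 # empty subquery: TRUE, even for a NULL value
    if value is None:
        return None                 # NULL compared with anything is NULL
    if value in [v for v in subquery_values if v is not None]:
        return False                # a definite match
    if any(v is None for v in subquery_values):
        return None                 # no match, but a NULL might have matched
    return True


def anti_join(rows, subquery_values):
    """Keep rows whose predicate is TRUE; NULL filters the row out, like FALSE."""
    return [r for r in rows if not_in(r, subquery_values) is True]


print(anti_join([1, 2, None], [2, 3]))     # no NULL in the subquery
print(anti_join([1, 2, None], [2, None]))  # a NULL in the subquery
print(anti_join([1, None], []))            # empty subquery
```

A single NULL on the subquery side is enough to make every non-matching row evaluate to NULL, which is the case an ordinary anti join would get wrong.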
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21369 **[Test build #91101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91101/testReport)** for PR 21369 at commit [`bc7dc11`](https://github.com/apache/spark/commit/bc7dc11383db8370f755a058f4b908588f93edc8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91101/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21425: Add unit tests for NOT IN subquery around null values
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21425 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org