[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99235/testReport)** for PR 23108 at commit [`e731746`](https://github.com/apache/spark/commit/e731746da4643d0283b3cd788d286aea62c96215). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5324/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5323/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99234/testReport)** for PR 23131 at commit [`515c04c`](https://github.com/apache/spark/commit/515c04c8833bd5b5683c7040e7d46d0b026255e6).
[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r236060023

    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --

Nice. Case well explained. But I think you need to add corresponding test cases for `CompletionIterator` and `ExternalSorter`.
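One possible shape for such a test case, as a minimal sketch only. It assumes the behaviour under test is that a completion-style wrapper drops its reference to the wrapped iterator once that iterator is exhausted. `ReleasingCompletionIterator` is a hypothetical, simplified stand-in and not Spark's actual `CompletionIterator`; the `current` accessor and the `demo` helper exist only so the check can observe the reference.

```scala
object CompletionIteratorSketch {

  // Hypothetical stand-in for a completion iterator. The constructor
  // parameter `sub` is only used to initialise `iter`, so the class keeps
  // no second reference to the wrapped iterator.
  final class ReleasingCompletionIterator[A](sub: Iterator[A], onComplete: () => Unit)
      extends Iterator[A] {
    private var iter: Iterator[A] = sub
    private var completed = false

    // Test-only accessor: the iterator currently referenced by the wrapper.
    def current: Iterator[A] = iter

    def next(): A = iter.next()

    def hasNext: Boolean = {
      val more = iter.hasNext
      if (!more && !completed) {
        completed = true
        iter = Iterator.empty // release the exhausted sub-iterator
        onComplete()
      }
      more
    }
  }

  // Drains a two-element iterator and reports
  // (held before draining, drained elements, held after draining, callback count).
  def demo(): (Boolean, List[Int], Boolean, Int) = {
    val sub = Iterator(1, 2)
    var calls = 0
    val it = new ReleasingCompletionIterator[Int](sub, () => calls += 1)
    val heldBefore = it.current eq sub
    val drained = it.toList
    it.hasNext // a repeated check after exhaustion must not fire the callback again
    (heldBefore, drained, it.current eq sub, calls)
  }
}
```

A real test would more likely inspect the actual class (or use a weak reference plus a GC hint); the accessor-based check here is chosen only because it is deterministic.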
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23132 Merged build finished. Test PASSed.
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99232/ Test PASSed.
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23132 **[Test build #99232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99232/testReport)** for PR 23132 at commit [`83920b2`](https://github.com/apache/spark/commit/83920b25f586dc242841ff0a73105ae9e43295ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99231/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99231/testReport)** for PR 23131 at commit [`170262b`](https://github.com/apache/spark/commit/170262b9c16c2b8901b1ec65e7c98a25a7eef077). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...
Github user szhem commented on the issue: https://github.com/apache/spark/pull/23083

Hi @cloud-fan

> Looking at the code, we are trying to fix 2 memory leaks: the task completion listener in ShuffleBlockFetcherIterator, and the CompletionIterator. If that's case, can you say that in the PR description?

I've updated the description and the title of this PR accordingly.
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236057261

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -341,8 +341,6 @@ displayTitle: Spark SQL Upgrading Guide
         APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the single-node data frame notion in these languages.
     
    -  - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
    --- End diff --

That's my fault for making this suggestion. Yeah, maybe best to leave this statement, and add a note here or in the 3.0 migration guide that it has subsequently been un-deprecated.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23133 Merged build finished. Test PASSed.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5322/ Test PASSed.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23133 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5322/
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236057105

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -341,8 +341,6 @@ displayTitle: Spark SQL Upgrading Guide
         APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the single-node data frame notion in these languages.
     
    -  - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
    --- End diff --

Ur, we cannot change the history. Up to and including Spark 2.4.0, we show the deprecation warning.
```scala
scala> spark.version
res2: String = 2.4.0

scala> df.unionAll(df2)
<console>:28: warning: method unionAll in class Dataset is deprecated: use union()
       df.unionAll(df2)
          ^
```

Shall we keep the history in this specific migration doc, `Upgrading From Spark SQL 1.6 to 2.0`, and add a comment that it's added back in 3.0.0 instead?
[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r236057101

    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --

@advancedxy I've tried removing all the modifications except for this one and got OutOfMemoryErrors once again. Here are the details:

1. Now 4 `ExternalSorter`s remain. 2 of them are not closed ...
![1_readingiterator_isnull_nonclosed_externalsorter](https://user-images.githubusercontent.com/1523889/48973288-2218d180-f04d-11e8-9329-27b3edf33c48.png)
and 2 of them are closed ...
![2_readingiterator_isnull_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973295-483e7180-f04d-11e8-83cf-23361515363f.png)
as expected.
2. 2 `SpillableIterator`s (which consume a significant part of memory) of already closed `ExternalSorter`s remain
![4_readingiterator_isnull_spillableiterator_of_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973318-cf8be500-f04d-11e8-912f-74be7420ca95.png)
3. These `SpillableIterator`s are referenced by `CompletionIterator`s ...
![6_completioniterator_of_blockstoreshufflereader](https://user-images.githubusercontent.com/1523889/48973357-a6b81f80-f04e-11e8-810f-dc8941430f34.png)
... which in turn seem to be referenced by the `cur` field ...
![7_coalescedrdd_compute_flatmap](https://user-images.githubusercontent.com/1523889/48973491-7e7df000-f051-11e8-8864-7e9e7f3f994b.png)
... of the standard `Iterator`'s `flatMap` that is used in the `compute` method of `CoalescedRDD`
![image](https://user-images.githubusercontent.com/1523889/48973401-7fae1d80-f04f-11e8-8cf2-043c808173d9.png)

The standard `Iterator`'s `flatMap` does not clear its `cur` field before obtaining the next value for it, and obtaining that next value will in turn consume quite a lot of memory too
![image](https://user-images.githubusercontent.com/1523889/48973418-dfa4c400-f04f-11e8-8f0e-b464567d43de.png)
... and in the case of Spark that means that the previous iterator, still holding its memory, stays alive while the next value is being fetched
![8_coalescedrdd_compute_flatmap_cur_isnotassigned](https://user-images.githubusercontent.com/1523889/48974089-0000-f05f-11e8-8319-f7d1f778f381.png)

So I've restored the changes made to the `CompletionIterator` that reassign the reference of its sub-iterator to the `empty` iterator ...
![image](https://user-images.githubusercontent.com/1523889/48973472-27781b00-f051-11e8-86e1-cd6b87cd114b.png)
... and that has helped.

P.S. I believe that clearing the standard `flatMap` iterator's `cur` field before calling `nextCur` could help too
```scala
def flatMap[B](f: A => GenTraversableOnce[B]): Iterator[B] = new AbstractIterator[B] {
  private var cur: Iterator[B] = empty
  private def nextCur() { cur = f(self.next()).toIterator }
  def hasNext: Boolean = {
    // Equivalent to cur.hasNext || self.hasNext && { nextCur(); hasNext }
    // but slightly shorter bytecode (better JVM inlining!)
    while (!cur.hasNext) {
      cur = empty
      if (!self.hasNext) return false
      nextCur()
    }
    true
  }
  def next(): B = (if (hasNext) cur else empty).next()
}
```
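The retention described in this comment can be observed without a heap dump. The following is a minimal sketch, not the Scala standard library's code: `FlatMapped` is a hypothetical re-implementation of the `flatMap` loop with a `clearFirst` switch, and `held` is an accessor added only so the demo can see which sub-iterator is still referenced at the moment `f` is asked for the next one.

```scala
object FlatMapCurSketch {

  // Hypothetical flatMap-style iterator. With clearFirst = false the exhausted
  // sub-iterator stays referenced by `cur` while `f` builds the next one;
  // with clearFirst = true the reference is dropped first.
  final class FlatMapped[A, B](self: Iterator[A], f: A => Iterator[B], clearFirst: Boolean)
      extends Iterator[B] {
    private var cur: Iterator[B] = Iterator.empty

    // Demo-only accessor: the sub-iterator currently referenced.
    def held: Iterator[B] = cur

    def hasNext: Boolean = {
      while (!cur.hasNext) {
        if (clearFirst) cur = Iterator.empty
        if (!self.hasNext) return false
        cur = f(self.next())
      }
      true
    }

    def next(): B = (if (hasNext) cur else Iterator.empty).next()
  }

  // For every call of `f` after the first, records whether the previous
  // (already exhausted) sub-iterator was still referenced by the wrapper.
  def previousStillHeld(clearFirst: Boolean): List[Boolean] = {
    var observed = List.empty[Boolean]
    var it: FlatMapped[Int, Int] = null
    var last: Iterator[Int] = null
    val f: Int => Iterator[Int] = { n =>
      if (last != null) observed ::= (it.held eq last)
      last = Iterator(n, n)
      last
    }
    it = new FlatMapped(Iterator(1, 2, 3), f, clearFirst)
    it.foreach(_ => ()) // drain the iterator
    observed.reverse
  }
}
```

In this toy setup the sub-iterators are tiny, so nothing is actually exhausted; the point is only which object remains reachable while the next one is being produced.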
[GitHub] spark pull request #23133: [MINOR][K8S] Invalid property "spark.driver.pod.n...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23133
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23133 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5322/
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23133 Thank you, @Leemoonsoo . Merged to `master/branch-2.4`.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23133 Merged build finished. Test PASSed.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99233/ Test PASSed.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23133 **[Test build #99233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99233/testReport)** for PR 23133 at commit [`d50b312`](https://github.com/apache/spark/commit/d50b31236bcf04d636362f6b19e318eca66bc01f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23133 **[Test build #99233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99233/testReport)** for PR 23133 at commit [`d50b312`](https://github.com/apache/spark/commit/d50b31236bcf04d636362f6b19e318eca66bc01f).
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99228/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23133 ok to test
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99228/testReport)** for PR 23131 at commit [`133246d`](https://github.com/apache/spark/commit/133246d973eb516ebc12ba5bb49cd30ba4f108f9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23133: [MINOR][K8S] Invalid property "spark.driver.pod.name" is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23133 Can one of the admins verify this patch?
[GitHub] spark pull request #23133: [MINOR][K8S] Invalid property "spark.driver.pod.n...
GitHub user Leemoonsoo opened a pull request: https://github.com/apache/spark/pull/23133

[MINOR][K8S] Invalid property "spark.driver.pod.name" is referenced in docs.

## What changes were proposed in this pull request?

"Running on Kubernetes" references `spark.driver.pod.name` in a few places; it should be `spark.kubernetes.driver.pod.name`.

## How was this patch tested?

See changes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Leemoonsoo/spark fix-driver-pod-name-prop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23133.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23133

commit d50b31236bcf04d636362f6b19e318eca66bc01f
Author: Lee moon soo
Date: 2018-11-24T23:28:54Z

    spark.driver.pod.name -> spark.kubernetes.driver.pod.name in doc
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23132 Merged build finished. Test PASSed.
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5321/ Test PASSed.
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23132 Can one of the admins verify this patch?
[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23132 **[Test build #99232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99232/testReport)** for PR 23132 at commit [`83920b2`](https://github.com/apache/spark/commit/83920b25f586dc242841ff0a73105ae9e43295ed).
[GitHub] spark pull request #23132: [SPARK-26163][SQL] Parsing decimals from JSON usi...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/23132

[SPARK-26163][SQL] Parsing decimals from JSON using locale

## What changes were proposed in this pull request?

In the PR, I propose using the locale option to parse (and infer) decimals from JSON input. After the changes, `JacksonParser` converts the input string to `BigDecimal` and then to Spark's Decimal using `java.text.DecimalFormat`. The new behaviour can be switched off via the SQL config `spark.sql.legacy.decimalParsing.enabled`.

## How was this patch tested?

Added 2 tests to `JsonExpressionsSuite` for the `en-US`, `ko-KR`, `ru-RU`, `de-DE` locales:
- Inferring decimal type using locale from JSON field values
- Converting JSON field values to specified decimal type using the locales.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 json-decimal-parsing-locale

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23132.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23132

commit 506417ed4b643298560a66c043f7b31beb489da3
Author: Maxim Gekk
Date: 2018-11-10T19:49:12Z

    Test for parsing decimals using locale

commit ac25fb6ed1d3d6689ad8841476c025848c87f2a3
Author: Maxim Gekk
Date: 2018-11-10T19:51:48Z

    Parsing decimals using locale

commit b784003078270a46aaf8aceb2d86dd9f13f3500c
Author: Maxim Gekk
Date: 2018-11-10T19:54:00Z

    Updating the migration guide

commit d0522093dfe1823f91100cbec3ef5d6c8a372f27
Author: Maxim Gekk
Date: 2018-11-24T19:32:45Z

    Merge remote-tracking branch 'origin/master' into json-decimal-parsing-locale

    # Conflicts:
    # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala

commit 722e135cf3b94e22079fd9a0fe1d309a04e76a64
Author: Maxim Gekk
Date: 2018-11-24T19:58:39Z

    Add SQL config spark.sql.legacy.decimalParsing.enabled

commit dc6c0ac0a97b31441d99ff1dd71608ae5e2eca73
Author: Maxim Gekk
Date: 2018-11-24T20:00:31Z

    Updating the migration guide

commit f15b1817fb51d453487665122473855712214692
Author: Maxim Gekk
Date: 2018-11-24T20:38:24Z

    Added a test for parsing

commit ab781d54e4f7604d64c72c3c383c549abab0a9a9
Author: Maxim Gekk
Date: 2018-11-24T20:52:03Z

    Fix test

commit 163a8b9d7d017409ae4dfa40e492680bf0e4f935
Author: Maxim Gekk
Date: 2018-11-24T20:52:26Z

    Create getDecimalParser

commit 8fb65c0db85f4bd2f76d473c5e31e772ff0d4c1d
Author: Maxim Gekk
Date: 2018-11-24T22:11:42Z

    Add a test for inferring decimals

commit 7e3a2906a96894cadc58771131d07d06ba265382
Author: Maxim Gekk
Date: 2018-11-24T22:12:35Z

    Change JsonSuite to adopt it for JsonInferSchema class

commit 83920b25f586dc242841ff0a73105ae9e43295ed
Author: Maxim Gekk
Date: 2018-11-24T22:13:01Z

    Inferring decimals from JSON
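The locale-aware parsing step described in this PR can be sketched with the JDK's `java.text.DecimalFormat`. This is a minimal illustration under stated assumptions and not the patch's code: `LocaleDecimalSketch` and `parseDecimal` are hypothetical names, the pattern `#,##0.###` is chosen for the example, and rejecting partially parsed input is a design choice of the sketch.

```scala
import java.math.BigDecimal
import java.text.{DecimalFormat, DecimalFormatSymbols, ParsePosition}
import java.util.Locale

object LocaleDecimalSketch {

  // Hypothetical helper: parses `s` with the grouping and decimal separators
  // of `locale`. Returns None when the text is not fully consumed.
  def parseDecimal(s: String, locale: Locale): Option[BigDecimal] = {
    val format = new DecimalFormat("#,##0.###", new DecimalFormatSymbols(locale))
    format.setParseBigDecimal(true) // make parse() produce java.math.BigDecimal
    val pos = new ParsePosition(0)
    format.parse(s, pos) match {
      case bd: BigDecimal if pos.getIndex == s.length => Some(bd)
      case _ => None // parse failure (null), trailing garbage, or a non-decimal value
    }
  }
}
```

For example, `LocaleDecimalSketch.parseDecimal("1.000,5", Locale.GERMANY)` and `LocaleDecimalSketch.parseDecimal("1,000.5", Locale.US)` both yield a value numerically equal to `1000.5`, while input such as `"abc"` yields `None`. Checking the `ParsePosition` matters because `DecimalFormat.parse` otherwise accepts a valid numeric prefix and silently ignores the rest.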
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5320/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99231/testReport)** for PR 23131 at commit [`170262b`](https://github.com/apache/spark/commit/170262b9c16c2b8901b1ec65e7c98a25a7eef077).
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23128 **[Test build #4440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4440/testReport)** for PR 23128 at commit [`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5319/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test FAILed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99230/testReport)** for PR 23131 at commit [`f0dfe7b`](https://github.com/apache/spark/commit/f0dfe7ba56daee34a37ba727ac76e29325b7e995).
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99230/ Test FAILed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99230/testReport)** for PR 23131 at commit [`f0dfe7b`](https://github.com/apache/spark/commit/f0dfe7ba56daee34a37ba727ac76e29325b7e995). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99229/ Test FAILed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99229/testReport)** for PR 23131 at commit [`12dfd77`](https://github.com/apache/spark/commit/12dfd77c665f38b450e4dc3e48a32bf651a3179e).
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99229/testReport)** for PR 23131 at commit [`12dfd77`](https://github.com/apache/spark/commit/12dfd77c665f38b450e4dc3e48a32bf651a3179e). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test FAILed.
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236054278 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1852,6 +1852,19 @@ class Dataset[T] private[sql]( CombineUnions(Union(logicalPlan, other.logicalPlan)) } + /** + * Returns a new Dataset containing union of rows in this Dataset and another Dataset. --- End diff -- Done.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23130 Merged build finished. Test PASSed.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23130 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99227/ Test PASSed.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23130 **[Test build #99227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99227/testReport)** for PR 23130 at commit [`b200a50`](https://github.com/apache/spark/commit/b200a50de58641d297381bec45687317bd21dfb7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23113: [SPARK-26019][PYTHON] Fix race condition in accumulators...
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/23113

@HyukjinKwon This is somehow related to the fact that it is reproducible when PySpark was created using an existing py4j gateway (for example, it reproduces from within Zeppelin): https://github.com/apache/spark/blob/f83fedc9f20869ab4c62bb07bac50113d921207f/python/pyspark/context.py#L99

In the case of Zeppelin, it creates its own gateway: https://github.com/apache/zeppelin/blob/adf83a3f5d009efd0e0416a7bf1d8b6fd86ea58a/spark/interpreter/src/main/resources/python/zeppelin_ipyspark.py#L33

And if the `PY4J_GATEWAY_SECRET` environment variable wasn't set, it creates the gateway without `auth_token` set:

`gateway = JavaGateway(GatewayClient(address="${JVM_GATEWAY_ADDRESS}", port=${JVM_GATEWAY_PORT}), auto_convert=True)`

py4j allows working without auth tokens: https://github.com/bartdag/py4j/blob/079fcf11ec18058af0b356f78974125eb7384711/py4j-python/src/py4j/java_gateway.py#L774

As I mentioned in SPARK-26019, it's only the first request that gets this error:

```
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
    poll(authenticate_and_accum_updates)
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
    if func():
  File "../lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
    received_token = self.rfile.read(len(auth_token))
TypeError: object of type 'NoneType' has no len()
```

Rerunning the same request later works fine. Is it because auth_token gets set? Or just because there are no accumulator update requests for the following requests? I don't know the internals of Spark well enough to answer that, but hopefully this provides some more context to help assess this problem and why this PR fixes it.

Thank you.
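The failure in the traceback above can be reproduced in isolation. The following is a minimal Python sketch, not the change made in PR 23113: `read_token_unchecked` and `read_token_checked` are hypothetical helper names, and the explicit `None` check is only an assumption about how such a case could be reported more clearly.

```python
import io


def read_token_unchecked(rfile, auth_token):
    # Same shape as the failing line in the traceback: len(None) raises TypeError.
    return rfile.read(len(auth_token))


def read_token_checked(rfile, auth_token):
    # Hypothetical guard (an assumption, not the PR's change): report a missing
    # token explicitly instead of failing inside len().
    if auth_token is None:
        raise ValueError("auth_token is not set")
    return rfile.read(len(auth_token))


# With a token configured, exactly len(token) bytes are read from the stream.
stream = io.BytesIO(b"secretpayload")
token_bytes = read_token_checked(stream, b"secret")
```

Calling `read_token_unchecked` with `auth_token=None` raises the same `TypeError` as in the report, which is consistent with the gateway having been created without a token.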
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Merged build finished. Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5318/ Test PASSed.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23131 **[Test build #99228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99228/testReport)** for PR 23131 at commit [`133246d`](https://github.com/apache/spark/commit/133246d973eb516ebc12ba5bb49cd30ba4f108f9).
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236052557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1852,6 +1852,19 @@ class Dataset[T] private[sql]( CombineUnions(Union(logicalPlan, other.logicalPlan)) } + /** + * Returns a new Dataset containing union of rows in this Dataset and another Dataset. --- End diff -- say that this is an alias of union.
[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/23131 cc @rxin @srowen @cloud-fan
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/23131 [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll ## What changes were proposed in this pull request? This PR is to add back `unionAll`, which is widely used. The name is also consistent with our ANSI SQL. We also have the corresponding `intersectAll` and `exceptAll`, which were introduced in Spark 2.4. ## How was this patch tested? Added a test case in DataFrameSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark addBackUnionAll Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23131.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23131 commit 133246d973eb516ebc12ba5bb49cd30ba4f108f9 Author: gatorsmile Date: 2018-11-24T20:04:52Z Add back unionAll
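As a rough illustration of what "adding back `unionAll`" as an alias means, here is a toy Python sketch. `Rows` is a hypothetical stand-in for a Dataset and is not Spark's implementation; the only behaviour it models is that the alias delegates to `union` and that duplicates are kept, as with SQL `UNION ALL`.

```python
class Rows:
    """Toy stand-in for a Dataset; hypothetical, not Spark's implementation."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # Concatenates rows and keeps duplicates, like SQL UNION ALL.
        return Rows(self.rows + other.rows)

    def union_all(self, other):
        # Alias: simply delegates to union, so both names behave identically.
        return self.union(other)


left = Rows([1, 2])
right = Rows([2, 3])
combined = left.union_all(right).rows
```

Deduplication, when wanted, would be a separate step on the result; the alias itself adds no behaviour of its own.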
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23130 Can one of the admins verify this patch?
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23130 @cloud-fan @HyukjinKwon Please, take a look at the PR.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23130 **[Test build #99227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99227/testReport)** for PR 23130 at commit [`b200a50`](https://github.com/apache/spark/commit/b200a50de58641d297381bec45687317bd21dfb7).
[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22938#discussion_r236049216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1892,7 +1898,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { .text(path) val jsonDF = spark.read.option("multiLine", true).option("mode", "PERMISSIVE").json(path) - assert(jsonDF.count() === corruptRecordCount) + assert(jsonDF.count() === corruptRecordCount + 1) // null row for empty file --- End diff -- @cloud-fan @HyukjinKwon Here is a PR https://github.com/apache/spark/pull/23130 which does this.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23130 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5317/ Test PASSed.
[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23130 Merged build finished. Test PASSed.
[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/23130 [SPARK-26161][SQL] Ignore empty files in load ## What changes were proposed in this pull request? In the PR, I propose filtering out all empty files inside of `DataSourceScanExec` and excluding them from file splits. It should reduce the overhead of opening and reading files without any data, and as a consequence datasources will not produce empty partitions for such files. ## How was this patch tested? Added a test which creates an empty and a non-empty file. If empty files are ignored in load, the Text datasource in the `wholetext` mode must create only one partition for the non-empty file. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 ignore-empty-files Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23130.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23130 commit 36c64f0a8c47763dd99f0ed07cdf5a1813b0f2b7 Author: Maxim Gekk Date: 2018-11-24T17:29:25Z Test checks empty files are not loaded commit e428b83c82623f33e5b5e5073251d73d18a3c903 Author: Maxim Gekk Date: 2018-11-24T17:29:41Z Fix json tests commit 4212add0bcbcaaec5ee8e5483ea16d3e7929dcb6 Author: Maxim Gekk Date: 2018-11-24T17:34:24Z Fix test commit 2458882037349987396e8456799c451e80566442 Author: Maxim Gekk Date: 2018-11-24T17:36:06Z Filtering out empty files commit b200a50de58641d297381bec45687317bd21dfb7 Author: Maxim Gekk Date: 2018-11-24T17:53:45Z Fix imports
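The idea of dropping empty files before planning splits can be sketched outside Spark. The Python snippet below is only an illustration under stated assumptions: `non_empty_files` is a hypothetical helper operating on local paths, whereas the PR filters file statuses inside `DataSourceScanExec`.

```python
import os
import tempfile


def non_empty_files(paths):
    """Return only the paths whose size on disk is greater than zero bytes."""
    return [p for p in paths if os.path.getsize(p) > 0]


# Mirror the test described above: one empty file and one non-empty file.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.txt")
    data_path = os.path.join(tmp, "data.txt")
    open(empty_path, "w").close()
    with open(data_path, "w") as f:
        f.write("hello")
    kept_names = [os.path.basename(p) for p in non_empty_files([empty_path, data_path])]
```

Only the non-empty file survives the filter, so a reader that creates one partition per file would create a single partition here.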
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23128 **[Test build #4440 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4440/testReport)** for PR 23128 at commit [`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).
[GitHub] spark pull request #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22779
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22683 Yeah, there are going to be several more tests that fail because they are expecting a string like 'KB'. Hopefully easy to fix. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport/
[GitHub] spark pull request #23125: [SPARK-26156][WebUI] Revise summary section of st...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23125
[GitHub] spark issue #23125: [SPARK-26156][WebUI] Revise summary section of stage pag...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23125 Merged to master
[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22779 Merged to master, but I'm also going to try to back port to 2.4 and 2.3 as a bug fix.
[GitHub] spark issue #23129: [MINOR] Update all DOI links to preferred resolver
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23129 Jenkins, test this please.
[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22779 Merged build finished. Test PASSed.
[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99224/ Test PASSed.
[GitHub] spark issue #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is false ,...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22779 **[Test build #99224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99224/testReport)** for PR 22779 at commit [`55db355`](https://github.com/apache/spark/commit/55db355fc4cbb6d036ec45895c5869690165e706). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99226/ Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport)** for PR 22683 at commit [`3bf6ca5`](https://github.com/apache/spark/commit/3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Merged build finished. Test FAILed.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99225/ Test FAILed.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23128 **[Test build #99225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99225/testReport)** for PR 23128 at commit [`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Can one of the admins verify this patch?
[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...
GitHub user httfighter reopened a pull request: https://github.com/apache/spark/pull/22683 [SPARK-25696] The storage memory displayed on spark Application UI is… …incorrect. ## What changes were proposed in this pull request? In the reported heartbeat information, the unit of the memory data is bytes, which is converted by the formatBytes() function in the utils.js file before being displayed in the interface. The base of the unit conversion in the formatBytes function is 1000, which should be 1024. Change the base of the unit conversion in the formatBytes function to 1024. ## How was this patch tested? manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/httfighter/spark SPARK-25696 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22683.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22683 commit 9e45697296039e55e85dd204788e287c9c60fceb Author: 韩田田00222924 Date: 2018-10-10T06:47:36Z [SPARK-25696] The storage memory displayed on spark Application UI is incorrect. commit 3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7 Author: 韩田田00222924 Date: 2018-11-24T08:53:12Z Supplement the modification of the memory unit displayed on the UI
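The effect of dividing by 1024 rather than 1000 is easy to see with a small example. The following Python sketch is not the `formatBytes` function from `utils.js`; `format_bytes`, its unit labels, and its one-decimal rounding are assumptions chosen for illustration only.

```python
def format_bytes(num_bytes, base=1024):
    """Format a byte count using the given conversion base (hypothetical helper)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    index = 0
    # Divide by the base until the value fits the current unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.1f} {units[index]}"


one_gib = 1024 ** 3
with_1024 = format_bytes(one_gib)             # base 1024
with_1000 = format_bytes(one_gib, base=1000)  # base 1000
```

For 1073741824 bytes, base 1024 yields "1.0 GB" while base 1000 yields "1.1 GB", which is the kind of discrepancy the PR description refers to.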
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/22683 @srowen @ajbozarth I have added the changes, could you help me review the code? Thank you very much.
[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...
Github user httfighter closed the pull request at: https://github.com/apache/spark/pull/22683
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99226/testReport)** for PR 22683 at commit [`3bf6ca5`](https://github.com/apache/spark/commit/3bf6ca58904f4f1d363e8505bd9d14e5aad0ebd7).
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5316/ Test PASSed.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Merged build finished. Test PASSed.
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r236036258 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +/** + * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes + * all the methods are called on a single-threaded, i.e. concrete implementations would not need + * to synchronize. + * + * All methods have additional Spark visibility modifier to allow public, concrete implementations + * that still have these methods marked as private[spark]. + */ +private[spark] trait ShuffleReadMetricsReporter { --- End diff -- https://github.com/apache/spark/pull/23128 :)
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23128 **[Test build #99225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99225/testReport)** for PR 23128 at commit [`cb46bfe`](https://github.com/apache/spark/commit/cb46bfeb930b71d560340393e95097ee66303862).
[GitHub] spark issue #23129: Hyperlink DOIs to preferred resolver
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23129 Can one of the admins verify this patch?
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23128 retest this please.
[GitHub] spark issue #23129: Hyperlink DOIs to preferred resolver
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23129 Can one of the admins verify this patch?