[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/23228 cc @JoshRosen @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23249 **[Test build #99814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99814/testReport)** for PR 23249 at commit [`130bc95`](https://github.com/apache/spark/commit/130bc957b6d8cf1b8b6c1cf77c57eb759fcc6aa6).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99815/testReport)** for PR 22683 at commit [`9dff9ee`](https://github.com/apache/spark/commit/9dff9eea09cbf3d5298bd6d261e1595cafaaae69).
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5846/
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23239 The change looks fine. Do we already have tests for cases 2 and 4? We know the test for case 3 is [here](https://github.com/apache/spark/pull/23043).
[GitHub] spark issue #22707: [SPARK-25717][SQL] Insert overwrite a recreated external...
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/22707 Are there any more suggestions? @wangyum @viirya
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99813/testReport)** for PR 22683 at commit [`bf150fb`](https://github.com/apache/spark/commit/bf150fb4bbc68627d19521a31a0d3a294d079862).
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23249 **[Test build #99812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99812/testReport)** for PR 23249 at commit [`04be19e`](https://github.com/apache/spark/commit/04be19e62caa8fd0365b4998e22cdcad846be6b8).
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99801/
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5845/
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test FAILed.
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23239 Merged build finished. Test PASSed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)** for PR 23245 at commit [`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ArrowCollectSerializer(Serializer):`
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99809/
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23239 **[Test build #99801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99804/
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)** for PR 23108 at commit [`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user wangjiaochun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239711919

--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
```java
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
     }
   }

+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
```
--- End diff --

That's it.
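The behavior this thread converges on — no spill file should ever be created for an empty input iterator — can be sketched outside of Spark as a tiny standalone writer. Everything below (`SpillingWriter` and its methods) is a hypothetical illustration of the early-return guard, not Spark's `UnsafeShuffleWriter` API:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in for the guard under discussion: return before any
// spill file is created when the incoming record iterator is empty.
public class SpillingWriter {
    private int spillFilesCreated = 0;

    public void write(Iterator<String> records) {
        if (!records.hasNext()) {
            return; // the early-return guard being tested in the PR
        }
        try {
            File spill = File.createTempFile("spill", ".data");
            spill.deleteOnExit();
            spillFilesCreated++;
            // ... records would be sorted and written to the spill file here ...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int spillFilesCreated() {
        return spillFilesCreated;
    }

    public static void main(String[] args) {
        SpillingWriter writer = new SpillingWriter();
        writer.write(Collections.<String>emptyList().iterator());
        System.out.println(writer.spillFilesCreated());
    }
}
```

With the guard in place, writing an empty iterator leaves `spillFilesCreated` at zero, which is exactly what the new `assertEquals(0, spillFilesCreated.size())` in the quoted test checks.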
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99802/
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Merged build finished. Test PASSed.
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23251 **[Test build #99802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)** for PR 23251 at commit [`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23238 Thank you for adding this to the migration doc. cc @gatorsmile .
[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23238#discussion_r239708569

--- Diff: docs/sql-migration-guide-upgrade.md ---
```diff
@@ -141,6 +141,8 @@ displayTitle: Spark SQL Upgrading Guide
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as `SELECT 1 FROM range(10) WHERE true` and returns 10 rows. This violates SQL standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) HAVING true` will return only one row. To restore the previous behavior, set `spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
+  - In version 2.3 and earlier, when reading from a Parquet data source table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.4, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched. This change also applies to Parquet Hive tables when `spark.sql.hive.convertMetastoreParquet` is set to true.
```
--- End diff --

Hi, @seancxmao . Maybe, the followings?
```diff
- `spark.sql.caseSensitive` is set to true or false
+ `spark.sql.caseSensitive` is set to `true` or `false`
```
```diff
- `spark.sql.caseSensitive` is set to false
+ `spark.sql.caseSensitive` is set to `false`
```
```diff
- `spark.sql.hive.convertMetastoreParquet` is set to true
+ `spark.sql.hive.convertMetastoreParquet` is set to `true`
```
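The resolution rule described in the quoted migration note — case-insensitive matching between metastore and Parquet column names when `spark.sql.caseSensitive` is `false`, with an exception on ambiguity — can be illustrated with a small standalone sketch. This is plain Java, not Spark's actual resolver; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration of the documented rule: resolve a Hive metastore
// column name against Parquet field names case-insensitively, throwing when
// more than one Parquet column matches.
public class ColumnResolver {
    public static String resolve(String metastoreName, List<String> parquetFields) {
        List<String> matches = new ArrayList<>();
        for (String field : parquetFields) {
            if (field.equalsIgnoreCase(metastoreName)) {
                matches.add(field);
            }
        }
        if (matches.size() > 1) {
            throw new RuntimeException(
                "Ambiguous column: " + metastoreName + " matches " + matches);
        }
        // No match behaves like the pre-2.4 result: the column reads as null.
        return matches.isEmpty() ? null : matches.get(0);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("ID", "Name");
        System.out.println(resolve("id", fields));
        System.out.println(resolve("name", fields));
    }
}
```

With case-sensitive resolution, only an exact-case match would be accepted; the migration note documents the switch to the case-insensitive behavior above.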
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99811/
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)** for PR 22683 at commit [`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 ping @dbtsai
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239707130

--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
```java
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
     }
   }

+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
```
--- End diff --

Ya. That's the same as what I observed. There is no need to keep both test cases because there is no code path like that.
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239706529

--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala ---
```scala
@@ -16,10 +16,13 @@
  */
 package org.apache.spark.deploy.k8s.features

-import scala.collection.JavaConverters._
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.nio.file.Files

 import io.fabric8.kubernetes.api.model._
 import org.scalatest.BeforeAndAfter
+import scala.collection.JavaConverters._
```
--- End diff --

? Hi, @mccheah . We import `java.*` and `scala.*` before any others.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/23218 do we need to relnote jvm compatibility?
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239705869

--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala ---
```scala
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties {
             intercept[IllegalArgumentException] {
               mgr.getSecretKey()
             }
+          case FILE =>
+            val secretFile = createTempSecretFile()
+            conf.set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+            mgr.initializeAuth()
+            assert(encodeFileAsBase64(secretFile) === mgr.getSecretKey())
         }
       }
     }
   )
 }

+  private def encodeFileAsBase64(secretFile: File) = {
+    Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
+  }
+
+  private def createTempSecretFile(contents: String = "test-secret"): File = {
+    val secretDir = Utils.createTempDir("temp-secrets")
+    val secretFile = new File(secretDir, "temp-secret.txt")
+    Files.write(secretFile.toPath, contents.getBytes(StandardCharsets.UTF_8))
+    secretFile
```
--- End diff --

Can this secret be recovered on disk, or do we trust that the tempDir ACL is sufficient?
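The quoted test helpers follow a standard pattern: write a secret to a temp file, then read the raw bytes back and Base64-encode them. A self-contained plain-Java equivalent, for readers skimming the archive, might look like this (the class name and file names are illustrative, not Spark's):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Mirrors the quoted Scala helpers: create a temp secret file, then
// Base64-encode its raw bytes, which is what the test compares against
// the SecurityManager's secret key.
public class SecretFileEncoder {
    public static String encodeFileAsBase64(Path secretFile) {
        try {
            return Base64.getEncoder().encodeToString(Files.readAllBytes(secretFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Path writeTempSecret(String contents) {
        try {
            Path secretFile = Files.createTempFile("temp-secret", ".txt");
            secretFile.toFile().deleteOnExit();
            Files.write(secretFile, contents.getBytes(StandardCharsets.UTF_8));
            return secretFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path secretFile = writeTempSecret("test-secret");
        System.out.println(encodeFileAsBase64(secretFile));
    }
}
```

Note that Base64 is an encoding, not encryption, which is why felixcheung's question about on-disk recoverability and temp-directory permissions is a fair one.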
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)** for PR 22683 at commit [`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user wangjiaochun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239704796

--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
```java
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
     }
   }

+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
```
--- End diff --

I mean that before adding the code "if (sortedRecords.hasNext()) { return }" it would fail. Now, adding `assertEquals(0, spillFilesCreated.size())` to `writeEmptyIterator` seems good.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99810/testReport)** for PR 22683 at commit [`6a3c58b`](https://github.com/apache/spark/commit/6a3c58b119ed298e1cab8d9a9b341a667a86c8f0).
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)** for PR 23245 at commit [`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5844/
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23245 Retest this please.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99803/
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test FAILed.
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239702595

--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
```java
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
     }
   }

+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
```
--- End diff --

Do you mean that `writeEmptyIterator` will fail if we add this line `assertEquals(0, spillFilesCreated.size());` there?
[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/23174#discussion_r239702559

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
```scala
@@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCoresRequest)
       .build()
-    val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-      new EnvVarBuilder()
-        .withName(ENV_CLASSPATH)
-        .withValue(cp)
-        .build()
-    }
-    val executorExtraJavaOptionsEnv = kubernetesConf
-      .get(EXECUTOR_JAVA_OPTIONS)
-      .map { opts =>
-        val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId,
-          kubernetesConf.executorId)
-        val delimitedOpts = Utils.splitCommandString(subsOpts)
-        delimitedOpts.zipWithIndex.map {
-          case (opt, index) =>
-            new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+    val executorEnv: Seq[EnvVar] = {
+      (Seq(
+        (ENV_DRIVER_URL, driverUrl),
+        (ENV_EXECUTOR_CORES, executorCores.toString),
+        (ENV_EXECUTOR_MEMORY, executorMemoryString),
+        (ENV_APPLICATION_ID, kubernetesConf.appId),
+        // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+        (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+        (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+      ) ++ kubernetesConf.environment).map { case (k, v) =>
+        new EnvVarBuilder()
+          .withName(k)
+          .withValue(v)
+          .build()
       }
-      }.getOrElse(Seq.empty[EnvVar])
-    val executorEnv = (Seq(
-      (ENV_DRIVER_URL, driverUrl),
-      (ENV_EXECUTOR_CORES, executorCores.toString),
-      (ENV_EXECUTOR_MEMORY, executorMemoryString),
-      (ENV_APPLICATION_ID, kubernetesConf.appId),
-      // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-      (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-      (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-      kubernetesConf.environment)
-      .map(env => new EnvVarBuilder()
-        .withName(env._1)
-        .withValue(env._2)
-        .build()
-      ) ++ Seq(
-      new EnvVarBuilder()
-        .withName(ENV_EXECUTOR_POD_IP)
-        .withValueFrom(new EnvVarSourceBuilder()
-          .withNewFieldRef("v1", "status.podIP")
+    } ++ {
+      Seq(new EnvVarBuilder()
+        .withName(ENV_EXECUTOR_POD_IP)
+        .withValueFrom(new EnvVarSourceBuilder()
+          .withNewFieldRef("v1", "status.podIP")
+          .build())
+        .build())
-          .build())
-        .build()
-    ) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+    } ++ {
+      Option(secMgr.getSecretKey()).map { authSecret =>
+        new EnvVarBuilder()
+          .withName(SecurityManager.ENV_AUTH_SECRET)
+          .withValue(authSecret)
```
--- End diff --

I filed https://issues.apache.org/jira/browse/SPARK-26301 to suggest the alternative scheme. Unlike SPARK-26139 this would change the functionality that was merged here.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99803/testReport)** for PR 23245 at commit [`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ArrowCollectSerializer(Serializer):`
[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/23104 @guoxiaolongzte good job
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23252 **[Test build #99808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99808/testReport)** for PR 23252 at commit [`957cb15`](https://github.com/apache/spark/commit/957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9).
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5843/
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23252 Merged build finished. Test PASSed.
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239701916

--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
````diff
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```

+ Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+                           list(1L, 2L, 1.0), list(3L, 4L, 1.0),
````
--- End diff --

BTW, when I added that into https://spark.apache.org/contributing.html, we also agreed to follow the committer's judgement based upon the guide, because the guide mentions:

> The coding conventions described above should be followed, unless there is good reason to do otherwise. Exceptions include legacy code and modifying third-party code.

since we do have a legacy reason, and there is a good reason: consistency and the committer's judgement.
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239701821 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -367,11 +372,26 @@ private[spark] class SecurityManager( case _ => require(sparkConf.contains(SPARK_AUTH_SECRET_CONF), - s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.") + s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config") return } -secretKey = Utils.createSecret(sparkConf) +if (sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined --- End diff -- Also considered adding a validation that file-based auth can only be used for Kubernetes. Omitted it for simplicity, but file-based auth seems dubious for non-K8s contexts.
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/23252 [SPARK-26239] File-based secret key loading for SASL. ## What changes were proposed in this pull request? This proposes an alternative way to load secret keys into a Spark application that is running on Kubernetes. Instead of automatically generating the secret, the secret key can reside in a file that is shared between both the driver and executor containers. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/palantir/spark auth-secret-with-file Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23252.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23252 commit 957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9 Author: mcheah Date: 2018-12-07T05:04:56Z [SPARK-26239] File-based secret key loading for SASL. This proposes an alternative way to load secret keys into a Spark application that is running on Kubernetes. Instead of automatically generating the secret, the secret key can reside in a file that is shared between both the driver and executor containers.
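For readers following along: once a patch like this lands, using the feature is a matter of configuration. Below is a rough `spark-defaults.conf` sketch of how file-based secrets would be wired up. The key names here are assumptions based on the PR description, not confirmed by it, so verify them against the merged security documentation:

```properties
# Assumed key names; check the merged Spark security docs before relying on them.
spark.authenticate                        true
# Secret file mounted into the driver container (e.g. from a Kubernetes secret volume)
spark.authenticate.secret.driver.file     /mnt/secrets/spark-auth/secret-key
# Secret file mounted into each executor container
spark.authenticate.secret.executor.file   /mnt/secrets/spark-auth/secret-key
```

With both paths backed by the same mounted Kubernetes secret, the driver and executors read an identical key instead of Spark generating one automatically.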
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/23252 @vanzin @vinooganesh @ifilonenko
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239701364 --- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R --- @@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", { expect_equal(length(local.posterior), sum(unlist(local.posterior))) }) +test_that("spark.assignClusters", { + df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), + list(1L, 2L, 1.0), list(3L, 4L, 1.0), + list(4L, 0L, 0.1)), schema = c("src", "dst", "weight")) + clusters <- spark.assignClusters(df, initMode = "degree", weightCol = "weight") + expected_result <- createDataFrame(list(list(4L, 1L), + list(0L, 0L), + list(1L, 0L), + list(3L, 1L), + list(2L, 0L)), + schema = c("id", "cluster")) --- End diff -- ditto for style
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99807/testReport)** for PR 22683 at commit [`87a9d5a`](https://github.com/apache/spark/commit/87a9d5ad1ebfbb9b247e95ead3e1a4c34ee08020).
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239701069 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -968,6 +970,17 @@ predicted <- predict(model, df) head(predicted) ``` + Power Iteration Clustering + +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex. + +```{r} +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), + list(1L, 2L, 1.0), list(3L, 4L, 1.0), --- End diff -- There are two separate styles already mixed in the R code, IIRC: ```r df <- createDataFrame( list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), list(1L, 2L, 1.0), list(3L, 4L, 1.0), list(4L, 0L, 0.1)), schema = c("src", "dst", "weight")) ``` or ```r df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), list(1L, 2L, 1.0), list(3L, 4L, 1.0), list(4L, 0L, 0.1)), schema = c("src", "dst", "weight")) ``` Let's avoid mixing the styles, and go for the latter one when possible, because it at least looks more compliant with the code style.
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user wangjiaochun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239700999 --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java --- @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception { } } + @Test + public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception { +final UnsafeShuffleWriter writer = createWriter(true); +writer.write(Iterators.emptyIterator()); +final Option mapStatus = writer.stop(true); +assertTrue(mapStatus.isDefined()); +assertTrue(mergedOutputFile.exists()); +assertEquals(0, spillFilesCreated.size()); --- End diff -- The existing writeEmptyIterator() test creates a spill file even though it writes an empty iterator, while the new writeEmptyIteratorNotCreateEmptySpillFile test verifies that no spill file is created when there are no records in memory.
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239700846 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -968,6 +970,17 @@ predicted <- predict(model, df) head(predicted) ``` + Power Iteration Clustering + +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex. + +```{r} +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), + list(1L, 2L, 1.0), list(3L, 4L, 1.0), --- End diff -- Yes, we do have an indentation rule. See the "Code Style Guide" at https://spark.apache.org/contributing.html -> https://google.github.io/styleguide/Rguide.xml. I know the code style is not perfectly documented, but at least there are some examples. I think the correct indentation is: ```r df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), list(1L, 2L, 1.0), list(3L, 4L, 1.0), list(4L, 0L, 0.1)), schema = c("src", "dst", "weight")) ```
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99806/testReport)** for PR 22683 at commit [`e9b4b2c`](https://github.com/apache/spark/commit/e9b4b2c5b3f0da4e77d60d8299481c70aec95b6a).
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239700056 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -968,6 +970,17 @@ predicted <- predict(model, df) head(predicted) ``` + Power Iteration Clustering + +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex. + +```{r} +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), + list(1L, 2L, 1.0), list(3L, 4L, 1.0), --- End diff -- Do we have an indentation rule for this? This PR is using two types of indentation for the same statements. - For docs (sparkr-vignettes.Rmd, mllib_clustering.R), this line is aligned with the first `list`. - For real code (test_mllib_clustering.R, powerIterationClustering.R), this line is aligned with the second `list`. Can we use the same indentation rule?
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23207 ``` the code looks much cleaner now! ``` Sorry for the originally rushed code; I should and will pay more attention to keeping the code clean and to discussing alternative implementations.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99805/testReport)** for PR 23207 at commit [`6378a3d`](https://github.com/apache/spark/commit/6378a3d4707b0d7559fca20220229cde71f9a64b).
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5842/ Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239698500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +80,7 @@ object SQLMetrics { private val SUM_METRIC = "sum" private val SIZE_METRIC = "size" private val TIMING_METRIC = "timing" + private val NS_TIMING_METRIC = "nanosecond" --- End diff -- How about naming it `NORMALIZE_TIMING_METRIC`? Maybe it can be reused later for other timing metrics that need a normalized unit. If you think it's a strange name, I'll change it back.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239698273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -333,8 +343,19 @@ object ShuffleExchangeExec { new ShuffleDependency[Int, InternalRow, InternalRow]( rddWithPartitionIds, new PartitionIdPassthrough(part.numPartitions), -serializer) +serializer, +shuffleWriterProcessor = createShuffleWriteProcessor(writeMetrics)) dependency } + + /** + * Create a customized [[ShuffleWriteProcessor]] for SQL which wrap the default metrics reporter + * with [[SQLShuffleWriteMetricsReporter]] as new reporter for [[ShuffleWriteProcessor]]. + */ + def createShuffleWriteProcessor(metrics: Map[String, SQLMetric]): ShuffleWriteProcessor = { +(reporter: ShuffleWriteMetricsReporter) => { --- End diff -- Yes, it can't work with Scala 2.11; I've rewritten it in a more readable way. Done in 6378a3d.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239698174 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriterProcessor.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext} +import org.apache.spark.internal.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.scheduler.MapStatus + +/** + * The interface for customizing shuffle write process. The driver create a ShuffleWriteProcessor + * and put it into [[ShuffleDependency]], and executors use it in each ShuffleMapTask. + */ +private[spark] trait ShuffleWriteProcessor extends Serializable with Logging { + + /** + * Create a [[ShuffleWriteMetricsReporter]] from the default reporter, always return a proxy + * reporter for both local accumulator and original reporter updating. As the reporter is a + * per-row operator, here need a careful consideration on performance. 
+ */ + def createMetricsReporter(reporter: ShuffleWriteMetricsReporter): ShuffleWriteMetricsReporter --- End diff -- Copy that; the trait can be added back when we need more customization for the SQL shuffle. Done in 6378a3d.
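As a rough illustration of the proxy-reporter idea discussed above (the names below are hypothetical, not Spark's actual classes): the SQL layer wraps the default task-level reporter so that every per-row update lands both in the SQL metric and in the original reporter.

```java
// Hypothetical sketch of a proxy metrics reporter; not Spark's actual API.
interface WriteMetricsReporter {
    void incRecordsWritten(long v);
}

final class SqlProxyReporter implements WriteMetricsReporter {
    private final WriteMetricsReporter delegate; // the default task-level reporter
    private long sqlRecordsWritten = 0;          // stands in for the SQL metric accumulator

    SqlProxyReporter(WriteMetricsReporter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void incRecordsWritten(long v) {
        sqlRecordsWritten += v;        // update the SQL-side metric
        delegate.incRecordsWritten(v); // and forward to the original reporter
    }

    long sqlRecordsWritten() {
        return sqlRecordsWritten;
    }
}
```

Because this runs per row, the wrapper does nothing beyond two field updates, which is the "careful consideration on performance" the doc comment refers to.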
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239697971 --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java --- @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception { } } + @Test + public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception { +final UnsafeShuffleWriter writer = createWriter(true); +writer.write(Iterators.emptyIterator()); +final Option mapStatus = writer.stop(true); +assertTrue(mapStatus.isDefined()); +assertTrue(mergedOutputFile.exists()); +assertEquals(0, spillFilesCreated.size()); --- End diff -- It seems that we can move this line to `writeEmptyIterator()` instead of making a new test case.
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239697860 --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java --- @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception { } } + @Test + public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception { --- End diff -- This looks like mostly a duplication of `writeEmptyIterator`. Logically, do we need to keep both `writeEmptyIterator` and `writeEmptyIteratorNotCreateEmptySpillFile` separately?
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99798/ Test PASSed.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Merged build finished. Test PASSed.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23221 **[Test build #99798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99798/testReport)** for PR 23221 at commit [`c9ab9bc`](https://github.com/apache/spark/commit/c9ab9bcc378168ff3430d8885899ccd74afe7b32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23250
[GitHub] spark issue #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23250 Thank you, @HyukjinKwon . Merged to master.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)** for PR 23108 at commit [`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5841/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/23108 retest this please
[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/23174#discussion_r239694862 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala --- @@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor val executorCpuQuantity = new QuantityBuilder(false) .withAmount(executorCoresRequest) .build() -val executorExtraClasspathEnv = executorExtraClasspath.map { cp => - new EnvVarBuilder() -.withName(ENV_CLASSPATH) -.withValue(cp) -.build() -} -val executorExtraJavaOptionsEnv = kubernetesConf - .get(EXECUTOR_JAVA_OPTIONS) - .map { opts => -val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId, - kubernetesConf.executorId) -val delimitedOpts = Utils.splitCommandString(subsOpts) -delimitedOpts.zipWithIndex.map { - case (opt, index) => -new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build() + +val executorEnv: Seq[EnvVar] = { +(Seq( + (ENV_DRIVER_URL, driverUrl), + (ENV_EXECUTOR_CORES, executorCores.toString), + (ENV_EXECUTOR_MEMORY, executorMemoryString), + (ENV_APPLICATION_ID, kubernetesConf.appId), + // This is to set the SPARK_CONF_DIR to be /opt/spark/conf + (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), + (ENV_EXECUTOR_ID, kubernetesConf.executorId) +) ++ kubernetesConf.environment).map { case (k, v) => + new EnvVarBuilder() +.withName(k) +.withValue(v) +.build() } - }.getOrElse(Seq.empty[EnvVar]) -val executorEnv = (Seq( - (ENV_DRIVER_URL, driverUrl), - (ENV_EXECUTOR_CORES, executorCores.toString), - (ENV_EXECUTOR_MEMORY, executorMemoryString), - (ENV_APPLICATION_ID, kubernetesConf.appId), - // This is to set the SPARK_CONF_DIR to be /opt/spark/conf - (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL), - (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++ - kubernetesConf.environment) - .map(env => new EnvVarBuilder() -.withName(env._1) -.withValue(env._2) -.build() - ) ++ Seq( 
- new EnvVarBuilder() -.withName(ENV_EXECUTOR_POD_IP) -.withValueFrom(new EnvVarSourceBuilder() - .withNewFieldRef("v1", "status.podIP") + } ++ { +Seq(new EnvVarBuilder() + .withName(ENV_EXECUTOR_POD_IP) + .withValueFrom(new EnvVarSourceBuilder() +.withNewFieldRef("v1", "status.podIP") +.build()) .build()) -.build() -) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq + } ++ { +Option(secMgr.getSecretKey()).map { authSecret => + new EnvVarBuilder() +.withName(SecurityManager.ENV_AUTH_SECRET) +.withValue(authSecret) --- End diff -- Ah I thought about this a bit more and realized that this is more insecure than I originally read it to be. If the secret is put directly in the environment variable field itself, then anyone who has permission to get the pod metadata from the Kubernetes API server can now read the secret generated by this application. In practice permissioning on pod specs is often far looser than permissioning on Kubernetes secret objects. In this solution the administrator has to restrict access to pod specs to only the user. I think at the very least we want this to be configured via creating a Kubernetes secret object, then [loading the environment variable](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) to point to the secret object. In the meantime I'm going to push the PR that allows secrets to be specified as file paths directly. I will also file a Spark ticket to avoid putting the environment variable directly in the pod spec object itself. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
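For reference, the "create a Kubernetes secret object, then load the environment variable" approach mentioned above would look roughly like this in a pod spec. The secret and key names here are made up for illustration; only a reference to the Secret object appears in the (more widely readable) pod metadata, while the value stays behind the stricter permissions on Secret objects:

```yaml
# Hypothetical pod-spec fragment; names are illustrative, not Spark's defaults.
env:
  - name: SPARK_AUTH_SECRET        # assumed env var name
    valueFrom:
      secretKeyRef:
        name: spark-auth-secret    # assumed Secret object name
        key: auth-secret-key       # assumed key within that Secret
```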
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239694008 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -243,10 +248,19 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int) * Represents a partitioning where rows are split across partitions based on some total ordering of * the expressions specified in `ordering`. When data is partitioned in this manner the following * two conditions are guaranteed to hold: - * - All row where the expressions in `ordering` evaluate to the same values will be in the same --- End diff -- nit: "row" -> "rows", "where... `ordering`" -> "whose `ordering` expressions"
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239693849 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -243,10 +248,19 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int) * Represents a partitioning where rows are split across partitions based on some total ordering of * the expressions specified in `ordering`. When data is partitioned in this manner the following --- End diff -- nit: add "," after "this manner".
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 Yes, it is. `UnsafeProjection` always normalizes NaN and -0.0, and Spark uses `UnsafeProjection` to produce output, so users can't distinguish them.
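To see why this normalization matters at all: under IEEE 754, `-0.0` compares equal to `0.0` even though their bit patterns differ, and `NaN` never equals itself. A small self-contained Java check, independent of Spark, illustrates the edge cases the unsafe-row code has to paper over:

```java
public class FloatEdgeCases {
    public static void main(String[] args) {
        // -0.0 and 0.0 are equal under IEEE 754 comparison...
        System.out.println(0.0 == -0.0);  // true
        // ...but their raw bit patterns differ, so byte-wise equality
        // (as used on binary row formats) would treat them as different keys.
        System.out.println(Double.doubleToRawLongBits(0.0)
                == Double.doubleToRawLongBits(-0.0));  // false
        // NaN never equals itself with ==, which is why grouping keys need
        // all NaN bit patterns collapsed to one canonical NaN.
        System.out.println(Double.NaN == Double.NaN);  // false
        // doubleToLongBits (unlike the raw variant) canonicalizes NaNs:
        System.out.println(Double.doubleToLongBits(Double.NaN)
                == Double.doubleToLongBits(0.0 / 0.0));  // true
    }
}
```

This is exactly the mismatch between value equality and bit-pattern equality that forces a binary row format to normalize NaN and -0.0 before comparing rows byte by byte.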
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5840/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23215 Thank you @cloud-fan @viirya @HyukjinKwon .
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99803/testReport)** for PR 23245 at commit [`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23239 The migration guide has been changed by another followup, https://github.com/apache/spark/pull/23141: > In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more. Is the above still correct after this change?
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99797/ Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99797/testReport)** for PR 23245 at commit [`15000a2`](https://github.com/apache/spark/commit/15000a2d548bf3785ba61aa01b2f6fb60e47ca71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239690226 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType} /** * Specifies how tuples that share common expressions will be distributed when a query is executed - * in parallel on many machines. Distribution can be used to refer to two distinct physical - * properties: - * - Inter-node partitioning of data: In this case the distribution describes how tuples are - *partitioned across physical machines in a cluster. Knowing this property allows some - *operators (e.g., Aggregate) to perform partition local operations instead of global ones. - * - Intra-partition ordering of data: In this case the distribution describes guarantees made - *about how tuples are distributed within a single partition. + * in parallel on many machines. + * + * Distribution here refers to inter-node partitioning of data: + * The distribution describes how tuples are partitioned across physical machines in a cluster. + * Knowing this property allows some operators (e.g., Aggregate) to perform partition local + * operations instead of global ones. */ --- End diff -- for ordering, I think people can look at `OrderedDistribution`?
[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23239#discussion_r239690045

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---

@@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
     Platform.putLong(getBuffer(), offset, value);
   }

+  // We need to take care of NaN and -0.0 in several places:
+  // 1. When comparing values, different NaNs should be treated as the same, and `-0.0` and `0.0`
+  //    should be treated as the same.
+  // 2. In the range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0

--- End diff --

As this is not a problem, we should update the PR description too.
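[Editor's note] The NaN/-0.0 semantics discussed in the patch above can be illustrated with a small standalone sketch. This is plain Java, not Spark's actual `UnsafeWriter` code, and the `normalize` helper is hypothetical — it only demonstrates why normalization is needed: `-0.0` and `0.0` compare equal numerically but have different bit patterns, and different NaN payloads have different bit patterns too.

```java
public class NormalizeNaN {
    // Hypothetical helper: fold every NaN into the canonical NaN bit pattern
    // and fold -0.0 into 0.0, so that bit-level comparison/hashing treats
    // them as the same value.
    static double normalize(double value) {
        if (Double.isNaN(value)) {
            return Double.NaN; // canonical NaN
        }
        if (value == -0.0d) {
            // Matches both -0.0 and +0.0 (they compare equal); returning
            // +0.0 makes the bit pattern uniform.
            return 0.0d;
        }
        return value;
    }

    public static void main(String[] args) {
        // -0.0 == 0.0 numerically, but the raw bit patterns differ,
        // so bit-level grouping would split them without normalization.
        System.out.println(Double.doubleToRawLongBits(-0.0d)
                != Double.doubleToRawLongBits(0.0d));            // prints true
        // After normalization the bit patterns agree.
        System.out.println(Double.doubleToRawLongBits(normalize(-0.0d))
                == Double.doubleToRawLongBits(0.0d));            // prints true
    }
}
```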
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239689874

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---

@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}

 /**
  * Specifies how tuples that share common expressions will be distributed when a query is executed
- * in parallel on many machines. Distribution can be used to refer to two distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution describes how tuples are
- *    partitioned across physical machines in a cluster. Knowing this property allows some
- *    operators (e.g., Aggregate) to perform partition local operations instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution describes guarantees made
- *    about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ * The distribution describes how tuples are partitioned across physical machines in a cluster.
+ * Knowing this property allows some operators (e.g., Aggregate) to perform partition local
+ * operations instead of global ones.
  */

--- End diff --

Yes, I understand that partitioning has nothing to do with intra-partition ordering at all, and it was wrong to include intra-partition ordering as part of the distribution properties. But I was thinking that mentioning ordering as a side note would probably help people better understand how some operators work. Or maybe this is not the best place to put it.
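[Editor's note] The inter-node partitioning property that the doc change above describes can be sketched in a few lines. This is an illustrative Java sketch, not Spark's `HashPartitioning` implementation: once rows sharing a grouping key are guaranteed to land in the same partition, a per-partition aggregate for that key is already a global aggregate, which is exactly why an operator like Aggregate can work partition-locally.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitioningSketch {
    // Assign each key to a partition by hashing. Equal keys always map to
    // the same partition, so per-partition aggregation over a key is
    // already complete for that key.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hashes
            int p = Math.floorMod(key.hashCode(), numPartitions);
            parts.get(p).add(key);
        }
        return parts;
    }
}
```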
[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239687264

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---

@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
           DecimalType(bigDecimal.precision, bigDecimal.scale)
         }
         decimalTry.getOrElse(StringType)
-      case VALUE_STRING => StringType
+      case VALUE_STRING =>
+        val stringValue = parser.getText

--- End diff --

Or does the order not matter?
[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239687213

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---

@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
           DecimalType(bigDecimal.precision, bigDecimal.scale)
         }
         decimalTry.getOrElse(StringType)
-      case VALUE_STRING => StringType
+      case VALUE_STRING =>
+        val stringValue = parser.getText

--- End diff --

I checked `PartitioningUtils.inferPartitionColumnValue`: we try timestamp first and then date. Shall we follow that?
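[Editor's note] The "try timestamp first, then date" ordering discussed above can be sketched as follows. This is an illustrative Java sketch using `java.time`, not the actual Scala code in `JsonInferSchema` or `PartitioningUtils`; the `infer` helper and its return labels are hypothetical. Trying the more specific format (timestamp) before the more general one (date) and falling back to string mirrors the ordering under discussion.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

public class InferTemporalType {
    // Hypothetical inference: attempt timestamp first, then date,
    // then fall back to string (the ordering used in
    // PartitioningUtils.inferPartitionColumnValue).
    static String infer(String text) {
        try {
            LocalDateTime.parse(text); // ISO timestamp, e.g. 2018-12-07T03:08:26
            return "timestamp";
        } catch (DateTimeParseException e) {
            // not a timestamp, fall through
        }
        try {
            LocalDate.parse(text);     // ISO date, e.g. 2018-12-07
            return "date";
        } catch (DateTimeParseException e) {
            // not a date either
        }
        return "string";
    }
}
```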
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Merged build finished. Test PASSed.
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5839/ Test PASSed.
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23251 **[Test build #99802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)** for PR 23251 at commit [`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
[GitHub] spark pull request #23251: [SPARK-26300][SS] The `checkForStreaming` method ...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/23251

[SPARK-26300][SS] The `checkForStreaming` method may be called twice in `createQuery`

## What changes were proposed in this pull request?

If `checkForContinuous` is called (`checkForStreaming` is called inside `checkForContinuous`), the `checkForStreaming` method will be called twice in `createQuery`. This is unnecessary, and since the `checkForStreaming` method contains a lot of statements, it is better to remove one of the calls.

## How was this patch tested?

Existing unit tests in `StreamingQueryManagerSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark isUnsupportedOperationCheckEnabled

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23251.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23251

commit b1e71ee7a723d63f1cf3c0754f2372eb185439d3
Author: liuxian
Date: 2018-12-07T03:08:26Z

    fix