[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355300447 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: OK, I will raise a JIRA for list jars and files, and will create a separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355299084 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: OK, I will raise one JIRA for this and will raise a separate PR.
[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355298350 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu // Invalid jar path will only print the error log, will not add to file server. sc.addJar("dummy.jar") - sc.addJar("") Review comment: We shouldn't change behavior. Can we try-catch the exception and log?
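One way to read the try-catch suggestion above, as a sketch only: catch the `IllegalArgumentException` raised for an empty path and log it, so that `sc.addJar("")` keeps its old "log and skip" behavior. Here `parsePath` is a stand-in that mimics only the empty-string rejection of `org.apache.hadoop.fs.Path` reported in this thread, and `AddJarGuard` and `tryAddJar` are hypothetical names, not Spark's implementation.

```scala
import java.io.File
import java.net.URI

object AddJarGuard {
  // Stand-in (assumption) for the check done by org.apache.hadoop.fs.Path:
  // it only reproduces the rejection of null or empty strings.
  def parsePath(path: String): URI = {
    if (path == null || path.isEmpty) {
      throw new IllegalArgumentException("Can not create a Path from an empty string")
    }
    new File(path).toURI
  }

  // Hypothetical wrapper: returns true if the path parsed, false if it was
  // rejected. The rejection is logged instead of propagating to the caller.
  def tryAddJar(path: String): Boolean = {
    try {
      val uri = parsePath(path)
      println(s"Would register $uri")
      true
    } catch {
      case e: IllegalArgumentException =>
        System.err.println(s"Failed to add '$path': ${e.getMessage}")
        false
    }
  }

  def main(args: Array[String]): Unit = {
    tryAddJar("dummy.jar") // parses; a real implementation would still check existence
    tryAddJar("")          // logged and skipped, no exception escapes
  }
}
```

With a guard of this shape, the removed test line `sc.addJar("")` could stay in the suite, since the call would log an error rather than throw.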
[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355298501 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: If it's hard to fix list jars, we can do it in another PR.
[GitHub] [spark] MaxGekk commented on issue #26797: [SPARK-30166][SQL] Eliminate compilation warnings in JSONOptions
MaxGekk commented on issue #26797: [SPARK-30166][SQL] Eliminate compilation warnings in JSONOptions URL: https://github.com/apache/spark/pull/26797#issuecomment-563105545 @srowen If such a problem exists, maybe it makes sense to shade Jackson 2.10 and use the shaded version in the JSON datasource?
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355293740 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: 1. addJar works fine even if the jar file name contains a space. 2. addFile does not work if the file name contains a space; it needs to be corrected (I will update the code). 3. listJars() issue: ``` scala> sc.listJars() res2: Seq[String] = Vector(spark://11.242.181.153:50811/jars/c6%20test.jar) ``` I think we should not display the file name in encoded form here.
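The encoded name in the `listJars()` output above comes from URI percent-encoding, where a space is written as `%20`. A minimal sketch, assuming only `java.net.URI` from the JDK, of how a decoded file name could be recovered for display; the `ListJarsDisplay` object and `displayName` helper are hypothetical and not part of Spark:

```scala
import java.net.URI

object ListJarsDisplay {
  // Hypothetical helper: returns the last path segment in decoded form.
  // URI.getPath decodes percent-escapes, while URI.getRawPath keeps them.
  def displayName(uriString: String): String = {
    val path = new URI(uriString).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  def main(args: Array[String]): Unit = {
    // The entry reported by sc.listJars() in the review comment.
    val listed = "spark://11.242.181.153:50811/jars/c6%20test.jar"
    println(new URI(listed).getRawPath) // /jars/c6%20test.jar
    println(displayName(listed))        // c6 test.jar
  }
}
```

The encoded form remains the one that is safe to use as a URI; decoding as sketched here would only be appropriate when showing the name to a user.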
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355293740 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: 1. addJar works fine even if the jar file name contains a space. 2. addFile does not work; it needs to be corrected (I will update the code).
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355293740 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: 1. addJar works fine even if the jar file name contains a space. 2. addFile does not work; it needs to be corrected.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563099026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115015/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563099021 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563099021 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563099026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115015/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080285 **[Test build #115015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)** for PR 26803 at commit [`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe).
[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563098688 **[Test build #115015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)** for PR 26803 at commit [`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355291295 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu // Invalid jar path will only print the error log, will not add to file server. sc.addJar("dummy.jar") - sc.addJar("") Review comment: Yes, we create a path with `val uri = new Path(path).toUri` to get the scheme; if we pass an empty string to create the path, it throws an exception: ``` Can not create a Path from an empty string java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126) at org.apache.hadoop.fs.Path.<init>(Path.java:134) at org.apache.spark.SparkContext.addJar(SparkContext.scala:1880) ``` Because of this, I have removed the code `sc.addJar("")`.
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355291295 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu // Invalid jar path will only print the error log, will not add to file server. sc.addJar("dummy.jar") - sc.addJar("") Review comment: Yes, we create a path with `val uri = new Path(path).toUri` to get the scheme; if we pass an empty string to create the path, it throws an exception: ``` Can not create a Path from an empty string java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126) at org.apache.hadoop.fs.Path.<init>(Path.java:134) at org.apache.spark.SparkContext.addJar(SparkContext.scala:1880) ```
[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#discussion_r355289542 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } } + test("add jar when path contains spaces") { +withTempDir { dir => + val sep = File.separator + val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space") + val tmpJar = File.createTempFile("test", ".jar", tmpDir) Review comment: OK, I will check and update you.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563094627 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563094631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115005/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563094631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115005/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-563094374 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19837/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-563094359 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563094031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115016/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563094627 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
SparkQA removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563093432 **[Test build #115016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)** for PR 26804 at commit [`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9).
[GitHub] [spark] AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-563094374 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19837/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
SparkQA removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563044570 **[Test build #115005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115005/testReport)** for PR 26766 at commit [`cf28837`](https://github.com/apache/spark/commit/cf288372346e1b1b7b4e8923361333d2b8af8104).
[GitHub] [spark] AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-563094359 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563094027 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563094027 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer
SparkQA commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer URL: https://github.com/apache/spark/pull/26766#issuecomment-563094080 **[Test build #115005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115005/testReport)** for PR 26766 at commit [`cf28837`](https://github.com/apache/spark/commit/cf288372346e1b1b7b4e8923361333d2b8af8104). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563094031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115016/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563094011 **[Test build #115016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)** for PR 26804 at commit [`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563090614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19836/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563090605 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563093432 **[Test build #115016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)** for PR 26804 at commit [`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9).
[GitHub] [spark] SparkQA commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
SparkQA commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-563093431 **[Test build #115017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115017/testReport)** for PR 26699 at commit [`5f809b7`](https://github.com/apache/spark/commit/5f809b72db0d0b2c18057e90bd35ab688716120d).
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355286904

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala ##

@@ -109,6 +115,14 @@ object GenerateColumnAccessor extends CodeGenerator[Seq[DataType], ColumnarItera
             rowWriter.write($index, (Decimal) null, $p, $s);
           }
         """
+      case CalendarIntervalType =>
+        // For CalendarInterval, it should have 16 bytes to store months(Int), days(Int),
+        // microseconds(Long) for future update even it's null now.
+        s"""
+          if (mutableRow.isNullAt($index)) {
+            rowWriter.write($index, (CalendarInterval) null);
+          }
+        """

Review comment:
On second thought, I think this change is unnecessary for CalendarInterval, which is a `DirectCopyColumnType`.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355286263

## File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala ##

@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.IOException
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN
+import org.apache.spark.scheduler._
+
+/**
+ * This class compacts the old event log files into one compact file, via two phases reading:
+ *
+ * 1) Initialize available [[EventFilterBuilder]] instances, and replay the old event log files with
+ * builders, so that these builders can gather the information to create [[EventFilter]] instances.
+ * 2) Initialize [[EventFilter]] instances from [[EventFilterBuilder]] instances, and replay the
+ * old event log files with filters. Rewrite the content to the compact file if the filters decide
+ * to filter in.
+ *
+ * This class assumes caller will provide the sorted list of files which are sorted by the index of
+ * event log file - caller should keep in mind that this class doesn't care about the semantic of
+ * ordering.
+ *
+ * When compacting the files, the range of compaction for given file list is determined as:
+ * (rightmost compact file ~ the file where there're `maxFilesToRetain` files on the right side)
+ *
+ * If there's no compact file in the list, it starts from the first file. If there're not enough
+ * files after rightmost compact file, compaction will be skipped.
+ */
+class EventLogFileCompactor(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    fs: FileSystem) extends Logging {
+
+  private val maxFilesToRetain: Int = sparkConf.get(EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN)
+
+  def compact(eventLogFiles: Seq[FileStatus]): Seq[FileStatus] = {
+    if (eventLogFiles.length <= maxFilesToRetain) {
+      return eventLogFiles
+    }
+
+    if (EventLogFileWriter.isCompacted(eventLogFiles.last.getPath)) {
+      return Seq(eventLogFiles.last)
+    }
+
+    val (filesToCompact, filesToRetain) = findFilesToCompact(eventLogFiles)
+    if (filesToCompact.isEmpty) {
+      filesToRetain
+    } else {
+      val builders = EventFilterBuilder.initializeBuilders(fs, filesToCompact.map(_.getPath))
+
+      val rewriter = new FilteredEventLogFileRewriter(sparkConf, hadoopConf, fs,
+        builders.map(_.createFilter()))
+      val compactedPath = rewriter.rewrite(filesToCompact)
+
+      cleanupCompactedFiles(filesToCompact)
+
+      fs.getFileStatus(new Path(compactedPath)) :: filesToRetain.toList
+    }
+  }
+
+  private def cleanupCompactedFiles(files: Seq[FileStatus]): Unit = {
+    files.foreach { file =>
+      var deleted = false
+      try {
+        deleted = fs.delete(file.getPath, true)
+      } catch {
+        case _: IOException =>
+      }
+      if (!deleted) {
+        logWarning(s"Failed to remove ${file.getPath} / skip removing.")
+      }
+    }
+  }
+
+  private def findFilesToCompact(
+      eventLogFiles: Seq[FileStatus]): (Seq[FileStatus], Seq[FileStatus]) = {
+    val files = RollingEventLogFilesFileReader.dropBeforeLastCompactFile(eventLogFiles)
+    if (files.length > maxFilesToRetain) {
+      (files.dropRight(maxFilesToRetain), files.takeRight(maxFilesToRetain))
+    } else {
+      (Seq.empty, files)
+    }
+  }
+}
+
+/**
+ * This class rewrites the event log files into one compact file: the compact file will only
+ * contain the events which pass the filters. Events will be filtered out only when all filters
+ * decide to filter out the event or don't mind about the event. Otherwise, the original line for
+ * the event is written to the compact file as it is.
+ */
+class FilteredEventLogFileRewriter(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    override val fs: FileSystem,
+    override val filters:
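The range selection in `findFilesToCompact` quoted above can be tried out in isolation. The following is only a minimal sketch, not the PR's code: `LogFile` is a hypothetical stand-in for `FileStatus`, and `dropBeforeLastCompact` is an assumption about what `RollingEventLogFilesFileReader.dropBeforeLastCompactFile` does, based on its name and the class comment.

```scala
object CompactionRangeSketch {
  // Hypothetical stand-in for FileStatus: a file index plus a "compact file" flag.
  final case class LogFile(index: Int, compacted: Boolean)

  // Assumption: keep the rightmost compact file and everything after it;
  // if there is no compact file, keep the whole list.
  def dropBeforeLastCompact(files: Seq[LogFile]): Seq[LogFile] = {
    val lastCompact = files.lastIndexWhere(_.compacted)
    if (lastCompact < 0) files else files.drop(lastCompact)
  }

  // Returns (filesToCompact, filesToRetain), mirroring the dropRight/takeRight
  // split in the quoted diff.
  def findFilesToCompact(
      files: Seq[LogFile],
      maxFilesToRetain: Int): (Seq[LogFile], Seq[LogFile]) = {
    val candidates = dropBeforeLastCompact(files)
    if (candidates.length > maxFilesToRetain) {
      (candidates.dropRight(maxFilesToRetain), candidates.takeRight(maxFilesToRetain))
    } else {
      // Not enough files after the rightmost compact file: skip compaction.
      (Seq.empty, candidates)
    }
  }
}
```

With six files where the third is a compact file and `maxFilesToRetain = 2`, this sketch selects files 3 and 4 for compaction and retains files 5 and 6.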
[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563090614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19836/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0 URL: https://github.com/apache/spark/pull/26804#issuecomment-563090605 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563090225 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115013/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563090225 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115013/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563090220 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563090220 Merged build finished. Test PASSed.
[GitHub] [spark] iRakson commented on a change in pull request #26779: [SPARK-30150][SQL]AddFile Command do not accept quoted path
iRakson commented on a change in pull request #26779: [SPARK-30150][SQL]AddFile Command do not accept quoted path
URL: https://github.com/apache/spark/pull/26779#discussion_r355285095

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ##

@@ -357,7 +357,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * }}}
    */
   override def visitManageResource(ctx: ManageResourceContext): LogicalPlan = withOrigin(ctx) {
-    val mayebePaths = remainder(ctx.identifier).trim
+    val mayebePaths = pathWrapper(remainder(ctx.identifier).trim)

Review comment:
I changed the grammar so that ADD/LIST also accept a string literal. Now `add file abc.txt`, `add file 'abc.txt'`, and `add file "abc.txt"` are all supported.
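To make the intended equivalence of the three spellings concrete, here is a minimal sketch. It is not how the PR implements it (the comment says the change was made in the grammar); `unquotePath` is a hypothetical helper that only illustrates the expected outcome, namely that a bare, single-quoted, or double-quoted path all resolve to the same string.

```scala
object QuotedPathSketch {
  // Hypothetical helper: strip one pair of matching single or double quotes.
  // Mismatched or missing quotes leave the input unchanged (apart from trimming).
  def unquotePath(raw: String): String = {
    val s = raw.trim
    val quoted = s.length >= 2 &&
      (s.head == '\'' || s.head == '"') &&
      s.last == s.head
    if (quoted) s.substring(1, s.length - 1) else s
  }
}
```

Under this sketch, `abc.txt`, `'abc.txt'`, and `"abc.txt"` all yield `abc.txt`, while a mismatched pair such as `'abc.txt"` is passed through untouched so that the error surfaces later instead of being silently repaired.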
[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563089907 **[Test build #115013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)** for PR 26803 at commit [`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] wangyum opened a new pull request #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
wangyum opened a new pull request #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804

### What changes were proposed in this pull request?
This PR upgrades Parquet to 1.11.0. Note: for now I am only verifying that all tests pass. I will run a benchmark later.

### Why are the changes needed?

### Does this PR introduce any user-facing change?
Unknown

### How was this patch tested?
[GitHub] [spark] SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072181 **[Test build #115013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)** for PR 26803 at commit [`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355284671

## File path: core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala ##

@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+import scala.io.{Codec, Source}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.json4s.jackson.JsonMethods.parse
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.util.{JsonProtocol, Utils}
+
+/**
+ * EventFilterBuilder provides the interface to gather the information from events being received
+ * by [[SparkListenerInterface]], and create a new [[EventFilter]] instance which leverages
+ * information gathered to decide whether the event should be filtered or not.
+ */
+private[spark] trait EventFilterBuilder extends SparkListenerInterface {
+  def createFilter(): EventFilter
+}
+
+object EventFilterBuilder {
+  /**
+   * Loads all available EventFilterBuilders in classloader via ServiceLoader, and initializes
+   * them via replaying events in given files.
+   */
+  def initializeBuilders(fs: FileSystem, files: Seq[Path]): Seq[EventFilterBuilder] = {
+    val bus = new ReplayListenerBus()
+
+    val builders = ServiceLoader.load(classOf[EventFilterBuilder],
+      Utils.getContextOrSparkClassLoader).asScala.toSeq
+    builders.foreach(bus.addListener)
+
+    files.foreach { log =>
+      Utils.tryWithResource(EventLogFileReader.openEventLog(log, fs)) { in =>
+        bus.replay(in, log.getName)
+      }
+    }
+
+    builders
+  }
+}
+
+/**
+ * [[EventFilter]] decides whether the given event should be filtered in, or filtered out when
+ * compacting event log files.
+ *
+ * The meaning of return values of each filterXXX method are following:
+ * - Some(true): Filter in this event.
+ * - Some(false): Filter out this event.
+ * - None: Don't mind about this event. No problem even other filters decide to filter out.
+ *
+ * Please refer [[FilteredEventLogFileRewriter]] for more details on how the filter will be used.
+ */
+private[spark] trait EventFilter {

Review comment:
> The code in BasicEventFilter.filterStageSubmitted seems to be "return true if the given event references a stage that is part of a live job", but "filter" to me means "should I remove whatever parameter I'm passing to this method", so they seem contradictory.

I agree that "filter" can be read in two opposite ways: in "spam filter" it means "filter out", whereas `filter` in Scala means "filter in". Here I use "filter" as "filter in" to be consistent with Scala, but I agree the word causes confusion.

> Java's predicate classes usually use accept which is clear in its intent.

"accept" vs "reject" is much clearer. Thanks for the great suggestion! Will address.

> Also, it might be cleaner to have this trait just have one method:
> `def accept(event: SparkListenerEvent): Boolean`
> Let the implementation match on the event type if needed

Yeah, that sounds OK. I thought it would be better to provide methods for all available events so that implementations don't forget to handle some event types, but implementations should know which event types they care about anyway, so that should be fine.

> It could even be a PartialFunction[SparkListenerEvent, Boolean] so you avoid the Option (no match = don't care).
> (I see you use inheritance later in the SQL code; partial functions make that a little more interesting, but still doable.)

I'm sorry, I don't fully understand your suggestion. Are you suggesting that I use `lift` to deal with the Option on the caller side, or do some composing? At least the two partial functions (the core-side implementation and the SQL-side implementation) can't be composed with `orElse`: that may work for "don't care", but we don't filter out an event unless ALL filters reject it. We'll have to
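The `lift` idea discussed in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code that ended up in the PR: the `Event` hierarchy, `coreFilter`, and `sqlFilter` are hypothetical, and treating "no filter has an opinion" as "keep the event" is an assumption made here because the thread does not settle that case.

```scala
object AcceptSketch {
  // Hypothetical event types standing in for SparkListenerEvent subclasses.
  sealed trait Event
  final case class JobStart(jobId: Int) extends Event
  final case class StageSubmitted(stageId: Int) extends Event
  final case class Other(name: String) extends Event

  // No match means "don't care", so no Option is needed in the filter itself.
  type EventFilter = PartialFunction[Event, Boolean]

  // Two hypothetical filters, e.g. one from core and one from SQL.
  val coreFilter: EventFilter = { case JobStart(id) => id == 1 }
  val sqlFilter: EventFilter = {
    case JobStart(id) => id == 2
    case StageSubmitted(_) => false
  }

  // Collect the opinions of all filters via lift; reject only when every filter
  // that has an opinion rejects. Assumption: no opinions at all keeps the event.
  def accept(filters: Seq[EventFilter], event: Event): Boolean = {
    val opinions = filters.flatMap(f => f.lift(event))
    opinions.isEmpty || opinions.contains(true)
  }
}
```

The sketch also shows why `orElse` is not enough: `coreFilter.orElse(sqlFilter)` stops at the first filter that is defined for the event, so `JobStart(2)` is rejected by the composed function even though `sqlFilter` accepts it, whereas `accept(Seq(coreFilter, sqlFilter), JobStart(2))` keeps it.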
[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563089272 OK let's discuss it later. For now let's focus on `ALTER TABLE/NAMESPACE SET OWNER`.
[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355283949

## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ##

@@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }
   }

+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)

Review comment:
Can we also test the case where the jar file name itself contains a space?
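As background for why both the directory and the file name are worth testing, here is a small sketch of how spaces interact with `java.net.URI`. It assumes that the failure addressed by this PR is related to URI parsing, which the thread itself does not spell out; the object and method names are hypothetical.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

object SpacePathSketch {
  // A raw string containing a space is not a valid URI, so the constructor throws.
  def parsesAsUri(raw: String): Boolean =
    try { new URI(raw); true } catch { case _: URISyntaxException => false }

  // Going through java.io.File percent-encodes the spaces instead.
  def toEncodedUri(path: String): URI = new File(path).toURI
}
```

A space anywhere in the path, whether in a parent directory or in the jar name, makes the raw string unparsable, while `File.toURI` yields an encoded form whose decoded `getPath` still contains the original spaces.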
[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355283875

## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ##

@@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     // Invalid jar path will only print the error log, will not add to file server.
     sc.addJar("dummy.jar")
-    sc.addJar("")

Review comment:
Are we changing the behavior for this case?
[GitHub] [spark] 07ARB commented on issue #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces
07ARB commented on issue #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces URL: https://github.com/apache/spark/pull/26773#issuecomment-563088309 @cloud-fan and @srowen, please review (I have addressed the previous review comments).
[GitHub] [spark] sandeep-katta commented on issue #26777: [SPARK-30134][SQL] Support DELETE JAR feature in SPARK
sandeep-katta commented on issue #26777: [SPARK-30134][SQL] Support DELETE JAR feature in SPARK URL: https://github.com/apache/spark/pull/26777#issuecomment-563086392 @srowen @cloud-fan could you please review this feature?
[GitHub] [spark] cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r355280785

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala ##

@@ -133,7 +133,11 @@ private[sql] trait LookupCatalog extends Logging {
       // For example, if the name of a custom catalog is the same with `GLOBAL_TEMP_DATABASE`,
       // this custom catalog can't be accessed.
       if (nameParts.head.equalsIgnoreCase(globalTempDB)) {

Review comment:
Ah, you are right. A single-part name like `abc` may mean a table `abc` under the default catalog, or it may mean the catalog `abc`. I think it's better to know whether it's allowed to return just a catalog; e.g., it's allowed in `SHOW TABLES`, but not in `DESCRIBE TABLE`. Shall we split `CatalogAndIdentifierParts` into `CatalogAndTable` and `CatalogAndNamespace`? For `CatalogAndTable`, an empty array is not allowed as the table name, so we shouldn't resolve a single-part name to a catalog. For `CatalogAndNamespace`, the root namespace's name is an empty array, so it's allowed.
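The proposed split can be sketched like this. It is a rough illustration of the rule described in the comment, not Spark's `LookupCatalog` code: the `catalogs` registry, the `defaultCatalog` name, and both method names are hypothetical.

```scala
object CatalogLookupSketch {
  // Hypothetical registry of known catalog names and a hypothetical default catalog.
  val catalogs: Set[String] = Set("testcat")
  val defaultCatalog: String = "default_catalog"

  // Table lookup: the table name must not be empty, so a single-part name
  // is never resolved to a catalog, even if a catalog of that name exists.
  def catalogAndTable(parts: Seq[String]): (String, Seq[String]) = parts match {
    case head +: tail if tail.nonEmpty && catalogs.contains(head) => (head, tail)
    case _ => (defaultCatalog, parts)
  }

  // Namespace lookup: the root namespace is the empty sequence, so a
  // single-part name may resolve to a catalog with an empty namespace.
  def catalogAndNamespace(parts: Seq[String]): (String, Seq[String]) = parts match {
    case head +: tail if catalogs.contains(head) => (head, tail)
    case _ => (defaultCatalog, parts)
  }
}
```

In this sketch the same single-part name `testcat` resolves to a table in the default catalog for a table lookup (as `DESCRIBE TABLE` would need) and to the catalog's root namespace for a namespace lookup (as `SHOW TABLES` would need).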
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#discussion_r355278111 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala ## @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends ColumnStats { Array[Any](null, null, nullCount, count, sizeInBytes) } +private[columnar] final class IntervalColumnStats extends ColumnStats { Review comment: Yes, I'd like to make a follow-up after this pull request. Thanks for pointing this out.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080716 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080722 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19835/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080716 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080722 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19835/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563080285 **[Test build #115015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)** for PR 26803 at commit [`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe).
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#discussion_r355276845 ## File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql ## @@ -264,3 +264,6 @@ select interval 'interval \t 1\tday'; select interval 'interval\t1\tday'; select interval '1\t' day; select interval '1 ' day; + +cache table interval_columnar as select i, cast(v as interval) from VALUES(1, '1 seconds'), (1, '2 seconds'), (2, NULL), (2, NULL) t(i,v); +select * from interval_columnar; Review comment: Maybe this should be moved to `CachedTableSuite` instead.
[GitHub] [spark] maropu commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
maropu commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#discussion_r355276851 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala ## @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends ColumnStats { Array[Any](null, null, nullCount, count, sizeInBytes) } +private[columnar] final class IntervalColumnStats extends ColumnStats { Review comment: Ah, I see. Thanks for the check. If you have time, do you wanna make a pr for that?
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#discussion_r355276430 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala ## @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends ColumnStats { Array[Any](null, null, nullCount, count, sizeInBytes) } +private[columnar] final class IntervalColumnStats extends ColumnStats { Review comment: yes, these can be set to true for intervals
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#discussion_r355275925 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala ## @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends ColumnStats { Array[Any](null, null, nullCount, count, sizeInBytes) } +private[columnar] final class IntervalColumnStats extends ColumnStats { + protected var upper: CalendarInterval = +new CalendarInterval(Int.MinValue, Int.MinValue, Long.MinValue) Review comment: this is nice
[GitHub] [spark] yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563077996 E.g. we currently don't support multi-tenancy in Spark's ThriftServer, but with such a flexible implementation we could implement `defaultOwner = session.getUser` to get the actual user from the client side and do authentication/authorization for it. In the current implementation of Spark's ThriftServer, the `sparkUser` can only work as a super user; with this, we can at least achieve metadata security.
[GitHub] [spark] cloud-fan commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell
cloud-fan commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell URL: https://github.com/apache/spark/pull/26791#issuecomment-563076392 What's wrong with the stack trace? If people are using the Scala APIs, don't they expect to see a detailed stack trace?
[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563075573 hmm, isn't it better to let Spark decide the default owner instead of catalog implementations?
[GitHub] [spark] 07ARB commented on issue #26795: [SPARK-30145][CORE]sparkContext.addJar fails when file path contains …
07ARB commented on issue #26795: [SPARK-30145][CORE]sparkContext.addJar fails when file path contains … URL: https://github.com/apache/spark/pull/26795#issuecomment-563075663 @srowen, please review https://github.com/apache/spark/pull/26773 (I have combined both PRs there).
[GitHub] [spark] yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563073181 For DataSource API developers: expose defaultOwner/OwnerType in `SupportsNamespaces` to define the catalog's default ownership. By default, we use our `sparkUser` as the default owner and `USER` as the default ownerType. This should be considered as how developers define `authentication`, a bit similar to Hive's `HiveAuthenticationProvider`. Currently, we use this only in `CreateNamespaceExec` to define the default ownership. When we support `authorization`, these can be used in privilege checking for queries, commands, etc. For end users: we don't allow users to set ownership, location, or comment directly in the create syntax; the ownership should be inherited from the catalog impl, and the comment and location should come from their specific clauses. Modifying these properties also needs specific commands.
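As a rough illustration of the design described in the comment above, here is a minimal sketch of a catalog supplying default ownership at namespace creation. Everything in it is hypothetical: `OwnershipDefaults`, `OwnerSketch`, and the property keys `ownerName` and `ownerType` are names invented for this example, not the actual `SupportsNamespaces` interface or Spark's reserved property names.

```scala
// Hypothetical trait; stands in for the defaultOwner/OwnerType hooks discussed above.
trait OwnershipDefaults {
  def defaultOwner: String
  def defaultOwnerType: String = "USER"
}

object OwnerSketch {
  // Assumed reserved keys that an end user may not set directly at creation time.
  val reserved: Set[String] = Set("ownerName", "ownerType")

  // Builds the final namespace properties: user properties plus catalog-provided ownership.
  def createNamespaceProps(
      userProps: Map[String, String],
      catalog: OwnershipDefaults): Map[String, String] = {
    val illegal = userProps.keySet.intersect(reserved)
    require(illegal.isEmpty,
      s"reserved properties cannot be set directly: ${illegal.mkString(", ")}")
    userProps ++ Map(
      "ownerName" -> catalog.defaultOwner,
      "ownerType" -> catalog.defaultOwnerType)
  }
}
```

Under this sketch a catalog implementation could return the session user from `defaultOwner`, which is the multi-tenancy use case raised elsewhere in the thread.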
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072506 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19833/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072554 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19834/ Test PASSed.
[GitHub] [spark] JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072595 @cloud-fan I have addressed the comments made online and offline. Please help review again. Thanks.
[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r355270674
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedPartitions.scala ##
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
+
+case class OptimizeSkewedPartitions(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private val supportedJoinTypes =
+    Inner :: Cross :: LeftSemi :: LeftAnti :: LeftOuter :: RightOuter :: Nil
+
+  /**
+   * A partition is considered as a skewed partition if its size is larger than the median
+   * partition size * spark.sql.adaptive.skewedPartitionFactor and also larger than
+   * spark.sql.adaptive.skewedPartitionSizeThreshold.
+   */
+  private def isSkewed(
+      stats: MapOutputStatistics,
+      partitionId: Int,
+      medianSize: Long): Boolean = {
+    val size = stats.bytesByPartitionId(partitionId)
+    size > medianSize * conf.adaptiveSkewedFactor &&
+      size > conf.adaptiveSkewedSizeThreshold
+  }
+
+  private def medianSize(stats: MapOutputStatistics): Long = {
+    val bytesLen = stats.bytesByPartitionId.length
+    val bytes = stats.bytesByPartitionId.sorted
+    if (bytes(bytesLen / 2) > 0) bytes(bytesLen / 2) else 1
+  }
+
+  /*
+   * Get all the map data size for specific reduce partitionId.
+   */
+  def getMapSizeForSpecificPartition(partitionId: Int, shuffleId: Int): Array[Long] = {
+    val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+    mapOutputTracker.shuffleStatuses.get(shuffleId).
+      get.mapStatuses.map{_.getSizeForBlock(partitionId)}
+  }
+
+  /*
+   * Split the mappers based on the map size of specific skewed reduce partitionId.
+   */
+  def splitMappersBasedDataSize(mapPartitionSize: Array[Long], numMappers: Int): Array[Int] = {
+    val advisoryTargetPostShuffleInputSize = conf.targetPostShuffleInputSize
+    val partitionStartIndices = ArrayBuffer[Int]()
+    var i = 0
+    var postMapPartitionSize: Long = mapPartitionSize(i)
+    partitionStartIndices += i
+    while (i < numMappers && i + 1 < numMappers) {
+      val nextIndex = if (i + 1 < numMappers) {
Review comment: @manuzhang Thanks for your review. After an offline discussion with wenchen, we decided to remove this method and instead split the skewed partition by the number of mappers.
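The skew test in the quoted diff can be tried in isolation with a small sketch like the one below. It is a standalone approximation, not the PR's code: `SkewSketch` is a made-up object, and the `factor` and `sizeThreshold` parameters are hypothetical stand-ins for the values the diff reads from `SQLConf`. Like the quoted `medianSize`, it assumes a non-empty array of partition sizes.

```scala
object SkewSketch {
  // Median partition size, floored at 1 so that a shuffle whose median
  // partition is empty does not produce a zero comparison base.
  def medianSize(bytesByPartitionId: Array[Long]): Long = {
    val sorted = bytesByPartitionId.sorted
    val mid = sorted(sorted.length / 2)
    if (mid > 0) mid else 1L
  }

  // A partition counts as skewed only if it exceeds both the relative bound
  // (median * factor) and the absolute bound (sizeThreshold).
  def isSkewed(size: Long, median: Long, factor: Long, sizeThreshold: Long): Boolean =
    size > median * factor && size > sizeThreshold
}
```

Requiring both bounds keeps small shuffles from being flagged: a partition ten times the median is not worth splitting if it is still only a few kilobytes.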
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19834/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072554 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19833/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072506 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-563072198 **[Test build #115014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115014/testReport)** for PR 26434 at commit [`d339287`](https://github.com/apache/spark/commit/d339287689f674227fe91a1291c8a24bc52f4676).
[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures URL: https://github.com/apache/spark/pull/26803#issuecomment-563072181 **[Test build #115013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)** for PR 26803 at commit [`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf).
[GitHub] [spark] amanomer commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell
amanomer commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell URL: https://github.com/apache/spark/pull/26791#issuecomment-563072000 cc @cloud-fan
[GitHub] [spark] amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()
amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() URL: https://github.com/apache/spark/pull/26712#issuecomment-563071621 Thanks all for reviewing and merging
[GitHub] [spark] amanomer commented on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()
amanomer commented on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() URL: https://github.com/apache/spark/pull/26712#issuecomment-563071621 Thanks @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563070737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19831/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563070731 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-563070773 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19832/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-563070767 Build finished. Test PASSed.
[GitHub] [spark] zhengruifeng opened a new pull request #26803: [SPARK-30178][ML] RobustScaler support bigger numFeatures
zhengruifeng opened a new pull request #26803: [SPARK-30178][ML] RobustScaler support bigger numFeatures
URL: https://github.com/apache/spark/pull/26803

### What changes were proposed in this pull request?
Compute the medians/ranges in a more distributed way.

### Why are the changes needed?
In Spark-Shell with default params, I processed a dataset with numFeatures=69,200, and the existing implementation failed with an OOM. After this PR, it successfully fits the model.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing test suites.
[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-563070767 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563070731 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-563070773 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19832/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563070737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19831/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
SparkQA commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#issuecomment-563070444 **[Test build #115012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115012/testReport)** for PR 26775 at commit [`dd1d52b`](https://github.com/apache/spark/commit/dd1d52bf3714fcfd37965b4fc54d1418b9e8be71).
[GitHub] [spark] imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r355269076

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala ##
@@ -133,7 +133,11 @@ private[sql] trait LookupCatalog extends Logging {
// For example, if the name of a custom catalog is the same with `GLOBAL_TEMP_DATABASE`,
// this custom catalog can't be accessed.
if (nameParts.head.equalsIgnoreCase(globalTempDB)) {

Review comment:

> ```
> if (nameParts.length == 1) {
>   Some((currentCatalog, currentNamespace ++ nameParts))
> }
> ```

Looks like we cannot use the table lookup logic, since we have a command like `SHOW TABLES FROM testcat` where `testcat` needs to be resolved as a catalog. Basically, we have a conflict when only one name part is given:

1. It needs to be resolved as a catalog:
```
CREATE TABLE testcat.table (id bigint, data string) USING foo
SHOW TABLES FROM testcat
```
2. It needs to be resolved as a non-catalog multi-part name:
```
CREATE TABLE testcat.testcat (id bigint, data string) USING foo
USE testcat
DESCRIBE TABLE testcat
```

One way to resolve the conflict is to check the one-part name against the current catalog. If they are the same, the one-part name is used as an `Identifier` (case 2); otherwise, it is used as a catalog (case 1). Does this sound reasonable?
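The rule proposed in the comment above can be sketched roughly as follows. This is only an illustrative model, not the actual `LookupCatalog` code: the object `OnePartNameResolution`, the types `AsCatalog` and `AsIdentifier`, and the function `resolveOnePart` are hypothetical names, and the case-insensitive comparison is an assumption borrowed from the `globalTempDB` check in the diff.

```scala
// Hypothetical, simplified model of the proposal; not Spark's real API.
object OnePartNameResolution {
  sealed trait Resolved

  // Case 1: the single name part is treated as a catalog name.
  final case class AsCatalog(catalog: String) extends Resolved

  // Case 2: the single name part is treated as an identifier
  // inside the current catalog and namespace.
  final case class AsIdentifier(
      catalog: String,
      namespace: Seq[String],
      name: String) extends Resolved

  def resolveOnePart(
      part: String,
      currentCatalog: String,
      currentNamespace: Seq[String]): Resolved = {
    // Assumption: names are compared case-insensitively.
    if (part.equalsIgnoreCase(currentCatalog)) {
      AsIdentifier(currentCatalog, currentNamespace, part)
    } else {
      AsCatalog(part)
    }
  }
}
```

Under this sketch, after `USE testcat` the name `testcat` matches the current catalog and falls into case 2, while with any other current catalog it falls into case 1.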
[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563068711

For the data source API, we do want to use properties to set owner/location/comment to make the API flexible. The problem is the end-user API: how do we expect end users to set them?

What I expect:
- For CREATE TABLE/NAMESPACE, location should be set by the LOCATION clause, and comment should be set by COMMENT or properties. I'm not sure about owner: does Hive support setting the owner during CREATE TABLE/DATABASE?
- For ALTER TABLE/NAMESPACE, it should be the same: location/owner should be set by special syntax instead of via properties.