[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355300447
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   OK, I will raise a JIRA for list jars and files and create a separate PR for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355299084
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   OK, I will raise a JIRA for this and open a separate PR.






[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
cloud-fan commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355298350
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 
   // Invalid jar path will only print the error log, will not add to file server.
   sc.addJar("dummy.jar")
-  sc.addJar("")
 
 Review comment:
   We shouldn't change behavior. Can we try-catch the exception and log?
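
A minimal sketch of the catch-and-log approach suggested here, under stated assumptions. `validatePath` and `addJarLenient` are hypothetical names used only for illustration, not Spark APIs; `validatePath` merely mimics the empty-string rejection that the stack trace elsewhere in this thread attributes to Hadoop's `Path` constructor. The real `SparkContext.addJar` logic is not shown in this thread and may differ.

```scala
import scala.util.control.NonFatal

object AddJarGuardSketch {
  // Hypothetical stand-in for path validation: rejects an empty string,
  // mirroring the IllegalArgumentException message quoted in this thread.
  def validatePath(path: String): String = {
    if (path == null || path.isEmpty) {
      throw new IllegalArgumentException("Can not create a Path from an empty string")
    }
    path
  }

  // Returns Some(path) when validation succeeds. On failure the error is
  // logged to stderr and None is returned, so the caller never sees the exception.
  def addJarLenient(path: String): Option[String] = {
    try {
      Some(validatePath(path))
    } catch {
      case NonFatal(e) =>
        Console.err.println(s"Failed to add '$path': ${e.getMessage}")
        None
    }
  }

  def main(args: Array[String]): Unit = {
    println(addJarLenient("dummy.jar"))
    println(addJarLenient(""))
  }
}
```

With this shape, a test that calls the method with `""` keeps passing because the invalid path only produces a log line, which is the behavior the existing test comment describes.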





[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
cloud-fan commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355298501
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   If it's hard to fix list jars, we can do it in another PR





[GitHub] [spark] MaxGekk commented on issue #26797: [SPARK-30166][SQL] Eliminate compilation warnings in JSONOptions

2019-12-08 Thread GitBox
MaxGekk commented on issue #26797: [SPARK-30166][SQL] Eliminate compilation 
warnings in JSONOptions
URL: https://github.com/apache/spark/pull/26797#issuecomment-563105545
 
 
   @srowen If such a problem exists, maybe it makes sense to shade Jackson 2.10 and use the shaded version in the JSON datasource?





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355293740
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   1. `addJar` works fine even if the jar file name contains a space.
   2. `addFile` does not work if the file name contains a space; this needs to be corrected (I will update the code).
   3. `listJars()` has an issue:
   ```
   scala> sc.listJars()
   res2: Seq[String] = Vector(spark://11.242.181.153:50811/jars/c6%20test.jar)
   ```
   I think we should not display the file name in encoded form here.
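
To illustrate the encoded versus decoded forms, here is a small sketch using only `java.net.URI` from the JDK. `decodedFileName` is a hypothetical helper introduced for this example; it is not how `listJars()` is implemented, and the thread does not say how the eventual fix should look.

```scala
import java.net.URI

object ListJarsDecodeSketch {
  // Hypothetical helper: returns the last path segment of a URI string with
  // percent-escapes decoded. URI.getPath decodes escaped octets, while
  // URI.getRawPath keeps them as written.
  def decodedFileName(uriString: String): String = {
    val path = new URI(uriString).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  def main(args: Array[String]): Unit = {
    val listed = "spark://11.242.181.153:50811/jars/c6%20test.jar"
    println(new URI(listed).getRawPath) // prints /jars/c6%20test.jar
    println(decodedFileName(listed))    // prints c6 test.jar
  }
}
```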





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355293740
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   1. `addJar` works fine even if the jar file name contains a space.
   2. `addFile` does not work; this needs to be corrected (I will update the code).





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355293740
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   1. `addJar` works fine even if the jar file name contains a space.
   2. `addFile` does not work; this needs to be corrected.
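
One common way a path containing a space can fail, sketched with JDK classes only. This is an assumption for illustration: the thread does not state which call inside `addFile` breaks, and `parseRaw` and `viaFile` are hypothetical helper names. The sketch shows that parsing the raw string as a URI is rejected, while going through `java.io.File` escapes the space.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

object PathWithSpacesSketch {
  // Parsing a raw path with a space directly as a URI fails with
  // URISyntaxException, because a space is not a legal URI character.
  def parseRaw(path: String): Try[URI] = Try(new URI(path))

  // File.toURI percent-encodes the space instead of failing.
  def viaFile(path: String): URI = new File(path).toURI

  def main(args: Array[String]): Unit = {
    val path = "/tmp/test space/a.txt"
    println(parseRaw(path).isFailure)  // prints true
    println(viaFile(path).getRawPath)  // encoded form, contains test%20space
    println(viaFile(path).getPath)     // decoded form, contains "test space"
  }
}
```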





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563099026
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115015/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563099021
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563099021
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563099026
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115015/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080285
 
 
   **[Test build #115015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)** for PR 26803 at commit [`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe).





[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large 
numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563098688
 
 
   **[Test build #115015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)** for PR 26803 at commit [`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355291295
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 
   // Invalid jar path will only print the error log, will not add to file server.
   sc.addJar("dummy.jar")
-  sc.addJar("")
 
 Review comment:
   Yes, we create the path with `val uri = new Path(path).toUri` to get the scheme. If an empty string is passed, creating the path throws an exception:
   
   ```
   Can not create a Path from an empty string
   java.lang.IllegalArgumentException: Can not create a Path from an empty string
     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
     at org.apache.hadoop.fs.Path.<init>(Path.java:134)
     at org.apache.spark.SparkContext.addJar(SparkContext.scala:1880)
   ```
   Because of this, I removed the `sc.addJar("")` call.
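
As a rough illustration of the scheme lookup mentioned above, the sketch below uses `java.net.URI` from the JDK rather than Hadoop's `Path`, so it is a stand-in and not the code in `SparkContext.addJar`. `schemeOf` is a hypothetical helper name. It shows that a bare file name carries no scheme, while a fully qualified location does.

```scala
import java.net.URI

object SchemeSketch {
  // Hypothetical helper: returns the URI scheme when one is present.
  // URI.getScheme returns null for a relative reference such as "dummy.jar",
  // which Option(...) turns into None.
  def schemeOf(path: String): Option[String] = Option(new URI(path).getScheme)

  def main(args: Array[String]): Unit = {
    println(schemeOf("hdfs://namenode:8020/jars/a.jar")) // prints Some(hdfs)
    println(schemeOf("file:/tmp/a.jar"))                 // prints Some(file)
    println(schemeOf("dummy.jar"))                       // prints None
  }
}
```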





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355291295
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 
   // Invalid jar path will only print the error log, will not add to file server.
   sc.addJar("dummy.jar")
-  sc.addJar("")
 
 Review comment:
   Yes, we create the path with `val uri = new Path(path).toUri` to get the scheme. If an empty string is passed, creating the path throws an exception:
   
   ```
   Can not create a Path from an empty string
   java.lang.IllegalArgumentException: Can not create a Path from an empty string
     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
     at org.apache.hadoop.fs.Path.<init>(Path.java:134)
     at org.apache.spark.SparkContext.addJar(SparkContext.scala:1880)
   ```





[GitHub] [spark] 07ARB commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355289542
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+    withTempDir { dir =>
+      val sep = File.separator
+      val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test space")
+      val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   OK, I will check and update you.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate 
configuration key of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563094627
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26766: [SPARK-30138][SQL]Separate 
configuration key of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563094631
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115005/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate 
configuration key of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563094631
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115005/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar 
execution support for interval types
URL: https://github.com/apache/spark/pull/26699#issuecomment-563094374
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19837/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26699: [SPARK-30066][SQL] Columnar 
execution support for interval types
URL: https://github.com/apache/spark/pull/26699#issuecomment-563094359
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563094031
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115016/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26766: [SPARK-30138][SQL]Separate 
configuration key of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563094627
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
SparkQA removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563093432
 
 
   **[Test build #115016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)** for PR 26804 at commit [`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9).





[GitHub] [spark] AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution 
support for interval types
URL: https://github.com/apache/spark/pull/26699#issuecomment-563094374
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19837/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
SparkQA removed a comment on issue #26766: [SPARK-30138][SQL]Separate 
configuration key of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563044570
 
 
   **[Test build #115005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115005/testReport)** for PR 26766 at commit [`cf28837`](https://github.com/apache/spark/commit/cf288372346e1b1b7b4e8923361333d2b8af8104).





[GitHub] [spark] AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26699: [SPARK-30066][SQL] Columnar execution 
support for interval types
URL: https://github.com/apache/spark/pull/26699#issuecomment-563094359
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563094027
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563094027
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA commented on issue #26766: [SPARK-30138][SQL]Separate configuration key of max iterations for analyzer and optimizer

2019-12-08 Thread GitBox
SparkQA commented on issue #26766: [SPARK-30138][SQL]Separate configuration key 
of max iterations for analyzer and optimizer
URL: https://github.com/apache/spark/pull/26766#issuecomment-563094080
 
 
   **[Test build #115005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115005/testReport)**
 for PR 26766 at commit 
[`cf28837`](https://github.com/apache/spark/commit/cf288372346e1b1b7b4e8923361333d2b8af8104).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563094031
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115016/
   Test FAILed.





[GitHub] [spark] SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563094011
 
 
   **[Test build #115016 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)**
 for PR 26804 at commit 
[`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563090614
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19836/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563090605
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
SparkQA commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563093432
 
 
   **[Test build #115016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115016/testReport)**
 for PR 26804 at commit 
[`4d12d7f`](https://github.com/apache/spark/commit/4d12d7f3ea2aaff09944d8872791f1321c03bfc9).





[GitHub] [spark] SparkQA commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
SparkQA commented on issue #26699: [SPARK-30066][SQL] Columnar execution 
support for interval types
URL: https://github.com/apache/spark/pull/26699#issuecomment-563093431
 
 
   **[Test build #115017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115017/testReport)**
 for PR 26699 at commit 
[`5f809b7`](https://github.com/apache/spark/commit/5f809b72db0d0b2c18057e90bd35ab688716120d).





[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355286904
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
 ##
 @@ -109,6 +115,14 @@ object GenerateColumnAccessor extends 
CodeGenerator[Seq[DataType], ColumnarItera
   rowWriter.write($index, (Decimal) null, $p, $s);
 }
"""
+case CalendarIntervalType =>
+  // For CalendarInterval, it should have 16 bytes to store months(Int), days(Int),
+  // microseconds(Long) for future update even if it's null now.
+  s"""
+if (mutableRow.isNullAt($index)) {
+  rowWriter.write($index, (CalendarInterval) null);
+}
+   """
 
 Review comment:
   On second thought, I think this change is unnecessary for CalendarInterval, 
which is a `DirectCopyColumnType`.





[GitHub] [spark] HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-08 Thread GitBox
HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] 
Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355286263
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
 ##
 @@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.IOException
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN
+import org.apache.spark.scheduler._
+
+/**
+ * This class compacts the old event log files into one compact file, via two phases reading:
+ *
+ * 1) Initialize available [[EventFilterBuilder]] instances, and replay the old event log files with
+ * builders, so that these builders can gather the information to create [[EventFilter]] instances.
+ * 2) Initialize [[EventFilter]] instances from [[EventFilterBuilder]] instances, and replay the
+ * old event log files with filters. Rewrite the content to the compact file if the filters decide
+ * to filter in.
+ *
+ * This class assumes caller will provide the sorted list of files which are sorted by the index of
+ * event log file - caller should keep in mind that this class doesn't care about the semantic of
+ * ordering.
+ *
+ * When compacting the files, the range of compaction for given file list is determined as:
+ * (rightmost compact file ~ the file where there're `maxFilesToRetain` files on the right side)
+ *
+ * If there's no compact file in the list, it starts from the first file. If there're not enough
+ * files after rightmost compact file, compaction will be skipped.
+ */
+class EventLogFileCompactor(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    fs: FileSystem) extends Logging {
+
+  private val maxFilesToRetain: Int = sparkConf.get(EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN)
+
+  def compact(eventLogFiles: Seq[FileStatus]): Seq[FileStatus] = {
+    if (eventLogFiles.length <= maxFilesToRetain) {
+      return eventLogFiles
+    }
+
+    if (EventLogFileWriter.isCompacted(eventLogFiles.last.getPath)) {
+      return Seq(eventLogFiles.last)
+    }
+
+    val (filesToCompact, filesToRetain) = findFilesToCompact(eventLogFiles)
+    if (filesToCompact.isEmpty) {
+      filesToRetain
+    } else {
+      val builders = EventFilterBuilder.initializeBuilders(fs, filesToCompact.map(_.getPath))
+
+      val rewriter = new FilteredEventLogFileRewriter(sparkConf, hadoopConf, fs,
+        builders.map(_.createFilter()))
+      val compactedPath = rewriter.rewrite(filesToCompact)
+
+      cleanupCompactedFiles(filesToCompact)
+
+      fs.getFileStatus(new Path(compactedPath)) :: filesToRetain.toList
+    }
+  }
+
+  private def cleanupCompactedFiles(files: Seq[FileStatus]): Unit = {
+    files.foreach { file =>
+      var deleted = false
+      try {
+        deleted = fs.delete(file.getPath, true)
+      } catch {
+        case _: IOException =>
+      }
+      if (!deleted) {
+        logWarning(s"Failed to remove ${file.getPath} / skip removing.")
+      }
+    }
+  }
+
+  private def findFilesToCompact(
+      eventLogFiles: Seq[FileStatus]): (Seq[FileStatus], Seq[FileStatus]) = {
+    val files = RollingEventLogFilesFileReader.dropBeforeLastCompactFile(eventLogFiles)
+    if (files.length > maxFilesToRetain) {
+      (files.dropRight(maxFilesToRetain), files.takeRight(maxFilesToRetain))
+    } else {
+      (Seq.empty, files)
+    }
+  }
+}
+
+/**
+ * This class rewrites the event log files into one compact file: the compact file will only
+ * contain the events which pass the filters. Events will be filtered out only when all filters
+ * decide to filter out the event or don't mind about the event. Otherwise, the original line for
+ * the event is written to the compact file as it is.
+ */
+class FilteredEventLogFileRewriter(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    override val fs: FileSystem,
+    override val filters: 
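
The range selection described in the `EventLogFileCompactor` scaladoc can be sketched in isolation as follows. This is only an illustration under stated assumptions: `LogFile` is a hypothetical stand-in for Hadoop's `FileStatus`, and `dropBeforeLastCompactFile` here is a guess at what `RollingEventLogFilesFileReader.dropBeforeLastCompactFile` does (keep the rightmost compact file and everything after it), since its body is not part of the quoted diff.

```scala
// Hypothetical stand-in for FileStatus: a name plus a "compacted" flag.
final case class LogFile(name: String, compacted: Boolean = false)

object CompactionRange {
  // Assumption: keep the rightmost compact file and everything after it;
  // if no file is compacted, keep the whole list.
  def dropBeforeLastCompactFile(files: Seq[LogFile]): Seq[LogFile] = {
    val idx = files.lastIndexWhere(_.compacted)
    if (idx < 0) files else files.drop(idx)
  }

  // Returns (filesToCompact, filesToRetain), mirroring findFilesToCompact in the diff.
  def findFilesToCompact(
      files: Seq[LogFile],
      maxFilesToRetain: Int): (Seq[LogFile], Seq[LogFile]) = {
    val candidates = dropBeforeLastCompactFile(files)
    if (candidates.length > maxFilesToRetain) {
      (candidates.dropRight(maxFilesToRetain), candidates.takeRight(maxFilesToRetain))
    } else {
      // Not enough files after the rightmost compact file: skip compaction.
      (Seq.empty, candidates)
    }
  }
}
```

With five files where the second one is already compacted and `maxFilesToRetain = 2`, the sketch selects the compact file and the file after it for compaction and retains the last two.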

[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563090614
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19836/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade 
parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-563090605
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563090225
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115013/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563090225
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115013/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563090220
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563090220
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] iRakson commented on a change in pull request #26779: [SPARK-30150][SQL]AddFile Command do not accept quoted path

2019-12-08 Thread GitBox
iRakson commented on a change in pull request #26779: [SPARK-30150][SQL]AddFile 
Command do not accept quoted path
URL: https://github.com/apache/spark/pull/26779#discussion_r355285095
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
 ##
 @@ -357,7 +357,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* }}}
*/
   override def visitManageResource(ctx: ManageResourceContext): LogicalPlan = 
withOrigin(ctx) {
-val mayebePaths = remainder(ctx.identifier).trim
+val mayebePaths = pathWrapper(remainder(ctx.identifier).trim)
 
 Review comment:
   I changed the grammar to accept string literals as well for ADD/LIST. 
   
   Now all of `add file abc.txt`, `add file 'abc.txt'` and `add file 
"abc.txt"` are supported.
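
For illustration only, the three accepted forms can be normalized to the same path by stripping one pair of matching quotes. The comment above says the PR handles this in the grammar, and the body of `pathWrapper` is not shown in the diff, so the `unquote` helper below is a hypothetical sketch and not the PR's implementation.

```scala
object PathQuotes {
  // Hypothetical helper: strips one pair of matching single or double quotes
  // around the path, if present; otherwise returns the trimmed input.
  def unquote(raw: String): String = {
    val s = raw.trim
    val quoted = s.length >= 2 &&
      (s.head == '\'' || s.head == '"') && s.last == s.head
    if (quoted) s.substring(1, s.length - 1) else s
  }
}
```

Mismatched quotes are left untouched, so a malformed argument would still surface as an error later instead of being silently rewritten.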





[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large 
numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563089907
 
 
   **[Test build #115013 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)**
 for PR 26803 at commit 
[`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] wangyum opened a new pull request #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0

2019-12-08 Thread GitBox
wangyum opened a new pull request #26804: [WIP][SPARK-26346][BUILD][SQL] 
Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804
 
 
   ### What changes were proposed in this pull request?
   
   This PR upgrades Parquet to 1.11.0.
   
   Note that:
   So far I have only verified that all tests pass. I will run a benchmark later.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce any user-facing change?
   Unknown
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072181
 
 
   **[Test build #115013 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)**
 for PR 26803 at commit 
[`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf).





[GitHub] [spark] HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-08 Thread GitBox
HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] 
Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355284671
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala
 ##
 @@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+import scala.io.{Codec, Source}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.json4s.jackson.JsonMethods.parse
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.util.{JsonProtocol, Utils}
+
+/**
+ * EventFilterBuilder provides the interface to gather the information from events being received
+ * by [[SparkListenerInterface]], and create a new [[EventFilter]] instance which leverages
+ * information gathered to decide whether the event should be filtered or not.
+ */
+private[spark] trait EventFilterBuilder extends SparkListenerInterface {
+  def createFilter(): EventFilter
+}
+
+object EventFilterBuilder {
+  /**
+   * Loads all available EventFilterBuilders in classloader via ServiceLoader, and initializes
+   * them via replaying events in given files.
+   */
+  def initializeBuilders(fs: FileSystem, files: Seq[Path]): Seq[EventFilterBuilder] = {
+    val bus = new ReplayListenerBus()
+
+    val builders = ServiceLoader.load(classOf[EventFilterBuilder],
+      Utils.getContextOrSparkClassLoader).asScala.toSeq
+    builders.foreach(bus.addListener)
+
+    files.foreach { log =>
+      Utils.tryWithResource(EventLogFileReader.openEventLog(log, fs)) { in =>
+        bus.replay(in, log.getName)
+      }
+    }
+
+    builders
+  }
+}
+
+/**
+ * [[EventFilter]] decides whether the given event should be filtered in, or filtered out when
+ * compacting event log files.
+ *
+ * The meaning of return values of each filterXXX method are following:
+ * - Some(true): Filter in this event.
+ * - Some(false): Filter out this event.
+ * - None: Don't mind about this event. No problem even other filters decide to filter out.
+ *
+ * Please refer [[FilteredEventLogFileRewriter]] for more details on how the filter will be used.
+ */
+private[spark] trait EventFilter {
 
 Review comment:
   > The code in BasicEventFilter.filterStageSubmitted seems to be "return true 
if the given event references a stage that is part of a live job", but "filter" 
to me means "should I remove whatever parameter I'm passing to this method", 
so they seem contradictory.
   
   I agree that "filter" can be read both ways: a "spam filter" filters things 
out, whereas `filter` in Scala filters things in. Here I use "filter" to mean 
"filter in", to be consistent with Scala, but I agree the word causes confusion.
   
   > Java's predicate classes usually use accept which is clear in its intent.
   
   "accept" vs "reject" is much clearer. Thanks for the great suggestion! Will 
address.
   
   > Also, it might be cleaner to have this trait just have one method:
   > `def accept(event: SparkListenerEvent): Boolean`
   > Let the implementation match on the event type if needed
   
   Yeah, that sounds OK. I thought it would be better to provide a method for 
every available event type so that implementations don't forget to handle any 
of them, but implementations should know which event types they care about 
anyway, so that should be fine.
   
   > It could even be a PartialFunction[SparkListenerEvent, Boolean] so you 
avoid the Option (no match = don't care).
   > (I see you use inheritance later in the SQL code; partial functions make 
that a little more interesting, but still doable.)
   
   Sorry, I'm not sure I understand your suggestion. Are you suggesting using 
`lift` to deal with the Option on the caller side, or doing some composition?
   
   At least the two partial functions (the core-side implementation and the 
SQL-side implementation) can't be composed with `orElse`: that may work for 
"don't care", but we don't filter out an event unless ALL filters reject it. 
We'll have to 

[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563089272
 
 
   OK let's discuss it later. For now let's focus on `ALTER TABLE/NAMESPACE SET 
OWNER`.





[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
cloud-fan commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355283949
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -294,6 +322,20 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("add jar when path contains spaces") {
+withTempDir { dir =>
+  val sep = File.separator
+  val tmpDir = Utils.createTempDir(dir.getAbsolutePath + sep + "test 
space")
+  val tmpJar = File.createTempFile("test", ".jar", tmpDir)
 
 Review comment:
   Can we also test the case where the jar file name itself contains a space?
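
As background for why both cases matter, the sketch below uses only `java.net.URI` and `java.io.File` to show that a raw path with a space, whether in a directory or in the file name, is not a valid URI string, while `File.toURI` percent-encodes it. The `SpacePaths` object and the example path are hypothetical, and the sketch makes no claim about how `SparkContext.addJar` resolves paths internally.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

object SpacePaths {
  // True when the raw string parses as a URI without any escaping.
  def parsesAsUri(path: String): Boolean =
    try { new URI(path); true } catch { case _: URISyntaxException => false }

  // File.toURI percent-encodes characters that are illegal in a URI, such as spaces.
  def toFileUri(path: String): URI = new File(path).toURI
}
```

A test path such as `/tmp/test space/my lib.jar` exercises a space in the directory and in the file name at the same time.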





[GitHub] [spark] cloud-fan commented on a change in pull request #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
cloud-fan commented on a change in pull request #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#discussion_r355283875
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
 ##
 @@ -303,7 +345,6 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 
   // Invalid jar path will only print the error log, will not add to file server.
   sc.addJar("dummy.jar")
-  sc.addJar("")
 
 Review comment:
   Does this PR change the behavior for this case?





[GitHub] [spark] 07ARB commented on issue #26773: [SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar fails when file path contains spaces

2019-12-08 Thread GitBox
07ARB commented on issue #26773: 
[SPARK-30126][SPARK-30145][CORE]sparkContext.addFile and sparkContext.addJar 
fails when file path contains spaces
URL: https://github.com/apache/spark/pull/26773#issuecomment-563088309
 
 
   @cloud-fan and @srowen, please review. I have addressed the previous review 
comments.





[GitHub] [spark] sandeep-katta commented on issue #26777: [SPARK-30134][SQL] Support DELETE JAR feature in SPARK

2019-12-08 Thread GitBox
sandeep-katta commented on issue #26777: [SPARK-30134][SQL]  Support DELETE JAR 
feature in SPARK
URL: https://github.com/apache/spark/pull/26777#issuecomment-563086392
 
 
   @srowen @cloud-fan could you please review this feature?





[GitHub] [spark] cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-08 Thread GitBox
cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix 
catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r355280785
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
 ##
 @@ -133,7 +133,11 @@ private[sql] trait LookupCatalog extends Logging {
 // For example, if the name of a custom catalog is the same with `GLOBAL_TEMP_DATABASE`,
 // this custom catalog can't be accessed.
 if (nameParts.head.equalsIgnoreCase(globalTempDB)) {
 
 Review comment:
   Ah, you are right. A single-part name like `abc` may mean a table `abc` 
under the default catalog, or it may mean the catalog `abc`.
   
   I think it's better to know whether resolving to a bare catalog is allowed, 
e.g. it's allowed in `SHOW TABLES`, but not in `DESCRIBE TABLE`.
   
   Shall we separate `CatalogAndIdentifierParts` into `CatalogAndTable` and 
`CatalogAndNamespace`? For `CatalogAndTable`, an empty array is not allowed as 
the table name, so we shouldn't resolve a single-part name to a catalog. For 
`CatalogAndNamespace`, the root namespace's name is an empty array, so it's allowed.
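
   A rough sketch of the proposed split, to make the difference concrete. The 
object, the `knownCatalogs` set and the `defaultCatalog` name are hypothetical 
stand-ins and not the actual `LookupCatalog` code; real resolution would go 
through the catalog manager:

```scala
object CatalogResolutionSketch {
  // Hypothetical registry of catalog names (assumption for illustration).
  val knownCatalogs: Set[String] = Set("abc", "testcat")
  val defaultCatalog: String = "spark_catalog"

  // Table lookup: the table name must be non-empty, so a single-part name
  // is always treated as a table under the default catalog.
  def catalogAndTable(nameParts: Seq[String]): (String, Seq[String]) =
    nameParts match {
      case head +: tail if tail.nonEmpty && knownCatalogs.contains(head) =>
        (head, tail)
      case _ =>
        (defaultCatalog, nameParts)
    }

  // Namespace lookup: the root namespace is the empty sequence, so a
  // single-part name may resolve to a catalog with an empty namespace.
  def catalogAndNamespace(nameParts: Seq[String]): (String, Seq[String]) =
    nameParts match {
      case head +: tail if knownCatalogs.contains(head) =>
        (head, tail)
      case _ =>
        (defaultCatalog, nameParts)
    }
}
```

   With these definitions, `Seq("abc")` resolves to table `abc` in the default 
catalog for a table command, but to the root namespace of catalog `abc` for a 
namespace command.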
   
   





[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355278111
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ##
 @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends 
ColumnStats {
 Array[Any](null, null, nullCount, count, sizeInBytes)
 }
 
+private[columnar] final class IntervalColumnStats extends ColumnStats {
 
 Review comment:
   Yes, I'd like to make a follow-up after this pull request. Thanks for 
pointing this out.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080716
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080722
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19835/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080716
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080722
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19835/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large 
numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563080285
 
 
   **[Test build #115015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115015/testReport)**
 for PR 26803 at commit 
[`3807027`](https://github.com/apache/spark/commit/38070271e7ebd04e4e43fcb7c0d3175eb2efa5fe).





[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355276845
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql
 ##
 @@ -264,3 +264,6 @@ select interval 'interval \t 1\tday';
 select interval 'interval\t1\tday';
 select interval '1\t' day;
 select interval '1 ' day;
+
+cache table interval_columnar as select i, cast(v as interval) from VALUES(1, '1 seconds'), (1, '2 seconds'), (2, NULL), (2, NULL) t(i,v);
+select * from interval_columnar;
 
 Review comment:
   Maybe this should be moved to `CachedTableSuite` instead.





[GitHub] [spark] maropu commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
maropu commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355276851
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ##
 @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends 
ColumnStats {
 Array[Any](null, null, nullCount, count, sizeInBytes)
 }
 
+private[columnar] final class IntervalColumnStats extends ColumnStats {
 
 Review comment:
   Ah, I see. Thanks for the check. If you have time, do you wanna make a pr 
for that?





[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355276430
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ##
 @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends 
ColumnStats {
 Array[Any](null, null, nullCount, count, sizeInBytes)
 }
 
+private[columnar] final class IntervalColumnStats extends ColumnStats {
 
 Review comment:
   yes, these can be set to true for intervals





[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Columnar execution support for interval types

2019-12-08 Thread GitBox
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] 
Columnar execution support for interval types
URL: https://github.com/apache/spark/pull/26699#discussion_r355275925
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ##
 @@ -295,6 +295,28 @@ private[columnar] final class BinaryColumnStats extends 
ColumnStats {
 Array[Any](null, null, nullCount, count, sizeInBytes)
 }
 
+private[columnar] final class IntervalColumnStats extends ColumnStats {
+  protected var upper: CalendarInterval =
+new CalendarInterval(Int.MinValue, Int.MinValue, Long.MinValue)
 
 Review comment:
   this is nice
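
   For readers unfamiliar with the pattern in the hunk above: starting `upper` at 
the smallest representable value (and, symmetrically, a lower bound at the 
largest) means the first non-null value always replaces both bounds. A 
self-contained sketch, where `Interval` is a hypothetical stand-in for 
`CalendarInterval` and the field-by-field ordering is an assumption made only 
for illustration:

```scala
object IntervalStatsSketch {
  // Hypothetical stand-in for CalendarInterval (months, days, microseconds).
  final case class Interval(months: Int, days: Int, microseconds: Long)

  // Assumed ordering: compare months, then days, then microseconds.
  val ordering: Ordering[Interval] =
    Ordering.by((i: Interval) => (i.months, i.days, i.microseconds))

  final class Stats {
    // Sentinels: any real value is >= upper's start and <= lower's start.
    var upper: Interval = Interval(Int.MinValue, Int.MinValue, Long.MinValue)
    var lower: Interval = Interval(Int.MaxValue, Int.MaxValue, Long.MaxValue)
    var count: Int = 0
    var nullCount: Int = 0

    // Record one value; None models a SQL NULL.
    def gather(value: Option[Interval]): Unit = {
      count += 1
      value match {
        case Some(v) =>
          if (ordering.gt(v, upper)) upper = v
          if (ordering.lt(v, lower)) lower = v
        case None =>
          nullCount += 1
      }
    }
  }
}
```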





[GitHub] [spark] yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563077996
 
 
   E.g. we currently don't support multi-tenancy in Spark's ThriftServer, but 
with such a flexible implementation we may be able to implement `defaultOwner = 
session.getUser` to get the actual user from the client side and do 
authentication/authorization for it. In the current implementation of Spark's 
ThriftServer, the `sparkUser` can only work as a super user, so this way we can 
at least achieve metadata security.





[GitHub] [spark] cloud-fan commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell

2019-12-08 Thread GitBox
cloud-fan commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis 
Exception in Spark-Shell
URL: https://github.com/apache/spark/pull/26791#issuecomment-563076392
 
 
   What's wrong with the stack trace? If people are using the Scala APIs, don't 
they expect to see a detailed stack trace?





[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563075573
 
 
   hmm, isn't it better to let Spark decide the default owner instead of 
catalog implementations?





[GitHub] [spark] 07ARB commented on issue #26795: [SPARK-30145][CORE]sparkContext.addJar fails when file path contains …

2019-12-08 Thread GitBox
07ARB commented on issue #26795: [SPARK-30145][CORE]sparkContext.addJar fails 
when file path contains …
URL: https://github.com/apache/spark/pull/26795#issuecomment-563075663
 
 
   @srowen, please review https://github.com/apache/spark/pull/26773 (I have 
combined both PRs there).





[GitHub] [spark] yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
yaooqinn commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563073181
 
 
   For DataSource API developers:
   Expose defaultOwner/OwnerType in `SupportsNamespaces` to define the 
catalog's default ownership. By default, we use our `sparkUser` as the default 
owner and `USER` as the default ownerType. This should be considered as how the 
developers define `authentication`, a bit similar to Hive's 
`HiveAuthenticationProvider`. Currently, we use this only in 
`CreateNamespaceExec` to define the default ownership. When we support 
`authorization`, these can be used in privilege checking for queries, commands, 
etc.
   
   For end users:
   We don't allow users to set ownership, location, or comment in the create 
syntax; the ownership should be inherited from the catalog implementation, and 
the comment and location should come from the specific clauses.
   
   Modifying these properties also needs specific commands.
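
   A minimal sketch of the idea, under loud assumptions: the trait, class and 
property keys below are hypothetical and are not the actual 
`SupportsNamespaces` API. It only illustrates a catalog supplying a default 
owner that a create command picks up, with a session-aware catalog overriding 
it:

```scala
object OwnershipSketch {
  // Hypothetical mix-in; the defaults mirror "sparkUser" and "USER" above.
  trait NamespaceOwnership {
    def defaultOwner: String = sys.props.getOrElse("user.name", "unknown")
    def defaultOwnerType: String = "USER"
  }

  // Hypothetical catalog that reports the connected session's user instead,
  // in the spirit of `defaultOwner = session.getUser`.
  final class SessionUserCatalog(sessionUser: String) extends NamespaceOwnership {
    override def defaultOwner: String = sessionUser
  }

  // What a create-namespace command might assemble: ownership comes from the
  // catalog, the comment comes only from its dedicated clause.
  def namespaceProperties(
      catalog: NamespaceOwnership,
      commentClause: Option[String]): Map[String, String] = {
    val ownership = Map(
      "owner" -> catalog.defaultOwner,
      "ownerType" -> catalog.defaultOwnerType)
    ownership ++ commentClause.map(c => "comment" -> c)
  }
}
```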





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072506
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26803: [SPARK-30178][ML] RobustScaler 
support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072513
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19833/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072554
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072558
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19834/
   Test PASSed.





[GitHub] [spark] JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition 
based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072595
 
 
   @cloud-fan I have addressed the comments made online and offline. Please help 
review again. Thanks.





[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] 
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r355270674
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedPartitions.scala
 ##
 @@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
+
+case class OptimizeSkewedPartitions(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private val supportedJoinTypes =
+    Inner :: Cross :: LeftSemi :: LeftAnti :: LeftOuter :: RightOuter :: Nil
+
+  /**
+   * A partition is considered as a skewed partition if its size is larger than the median
+   * partition size * spark.sql.adaptive.skewedPartitionFactor and also larger than
+   * spark.sql.adaptive.skewedPartitionSizeThreshold.
+   */
+  private def isSkewed(
+      stats: MapOutputStatistics,
+      partitionId: Int,
+      medianSize: Long): Boolean = {
+    val size = stats.bytesByPartitionId(partitionId)
+    size > medianSize * conf.adaptiveSkewedFactor &&
+      size > conf.adaptiveSkewedSizeThreshold
+  }
+
+  private def medianSize(stats: MapOutputStatistics): Long = {
+    val bytesLen = stats.bytesByPartitionId.length
+    val bytes = stats.bytesByPartitionId.sorted
+    if (bytes(bytesLen / 2) > 0) bytes(bytesLen / 2) else 1
+  }
+
+  /*
+  * Get all the map data size for specific reduce partitionId.
+  */
+  def getMapSizeForSpecificPartition(partitionId: Int, shuffleId: Int): Array[Long] = {
+    val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+    mapOutputTracker.shuffleStatuses.get(shuffleId).
+      get.mapStatuses.map{_.getSizeForBlock(partitionId)}
+  }
+
+  /*
+  * Split the mappers based on the map size of specific skewed reduce partitionId.
+  */
+  def splitMappersBasedDataSize(mapPartitionSize: Array[Long], numMappers: Int): Array[Int] = {
+    val advisoryTargetPostShuffleInputSize = conf.targetPostShuffleInputSize
+    val partitionStartIndices = ArrayBuffer[Int]()
+    var i = 0
+    var postMapPartitionSize: Long = mapPartitionSize(i)
+    partitionStartIndices += i
+    while (i < numMappers && i + 1 < numMappers) {
+      val nextIndex = if (i + 1 < numMappers) {
 
 Review comment:
   @manuzhang Thanks for your review. After an offline discussion with wenchen, 
we decided to remove this method and split the skewed partition by the number 
of mappers.
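
   To illustrate the two ideas in this thread without Spark on the classpath, 
here is a standalone sketch. The skew test mirrors `isSkewed` and `medianSize` 
from the hunk above, with the factor and threshold passed as plain parameters. 
`splitMapperRanges` is only one possible reading of "split by the number of 
mappers" (contiguous, near-equal mapper ranges) and is an assumption, not the 
PR's final code:

```scala
object SkewSketch {
  // Middle element of the sorted sizes, floored at 1 as in the hunk above.
  def medianSize(bytesByPartition: Array[Long]): Long = {
    val sorted = bytesByPartition.sorted
    math.max(sorted(sorted.length / 2), 1L)
  }

  // Skewed only if larger than median * factor AND larger than the threshold.
  def isSkewed(size: Long, median: Long, factor: Int, threshold: Long): Boolean =
    size > median * factor && size > threshold

  // Assumed interpretation: divide mapper indices [0, numMappers) into
  // contiguous [start, end) ranges of near-equal length.
  def splitMapperRanges(numMappers: Int, numSplits: Int): Seq[(Int, Int)] = {
    require(numMappers > 0 && numSplits > 0, "both arguments must be positive")
    val splits = math.min(numMappers, numSplits)
    (0 until splits).map { i =>
      (i * numMappers / splits, (i + 1) * numMappers / splits)
    }
  }
}
```

   For example, with partition sizes `Array(10L, 20L, 30L, 40L, 1000L)`, a 
factor of 10 and a threshold of 100, only the last partition counts as skewed.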





[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072558
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19834/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072554
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072513
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19833/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26803: [SPARK-30178][ML] RobustScaler support 
large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072506
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-08 Thread GitBox
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-563072198
 
 
   **[Test build #115014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115014/testReport)**
 for PR 26434 at commit 
[`d339287`](https://github.com/apache/spark/commit/d339287689f674227fe91a1291c8a24bc52f4676).





[GitHub] [spark] SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures

2019-12-08 Thread GitBox
SparkQA commented on issue #26803: [SPARK-30178][ML] RobustScaler support large 
numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-563072181
 
 
   **[Test build #115013 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115013/testReport)**
 for PR 26803 at commit 
[`9ddad92`](https://github.com/apache/spark/commit/9ddad925d443f209dc59b86fe1c7695b254f5baf).





[GitHub] [spark] amanomer commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis Exception in Spark-Shell

2019-12-08 Thread GitBox
amanomer commented on issue #26791: [SPARK-30161][SQL] Uncaught Analysis 
Exception in Spark-Shell
URL: https://github.com/apache/spark/pull/26791#issuecomment-563072000
 
 
   cc @cloud-fan 





[GitHub] [spark] amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()

2019-12-08 Thread GitBox
amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Implement a 
helper method for aliasing bool_and() and bool_or()
URL: https://github.com/apache/spark/pull/26712#issuecomment-563071621
 
 
   Thanks all for reviewing and merging





[GitHub] [spark] amanomer commented on issue #26712: [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or()

2019-12-08 Thread GitBox
amanomer commented on issue #26712: [SPARK-29883][SQL] Implement a helper 
method for aliasing bool_and() and bool_or()
URL: https://github.com/apache/spark/pull/26712#issuecomment-563071621
 
 
   Thanks @cloud-fan 





[GitHub] [spark] AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support 
ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563070737
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19831/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #26775: [SPARK-30018][SQL] Support 
ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563070731
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support 
passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-563070773
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19832/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-08 Thread GitBox
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support 
passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-563070767
 
 
   Build finished. Test PASSed.





[GitHub] [spark] zhengruifeng opened a new pull request #26803: [SPARK-30178][ML] RobustScaler support bigger numFeatures

2019-12-08 Thread GitBox
zhengruifeng opened a new pull request #26803: [SPARK-30178][ML] RobustScaler 
support bigger numFeatures
URL: https://github.com/apache/spark/pull/26803
 
 
   ### What changes were proposed in this pull request?
   Compute the medians/ranges in a more distributed way.
   
   ### Why are the changes needed?
   In Spark-Shell with default params, I processed a dataset with 
numFeatures=69,200, and the existing impl failed due to OOM.
   After this PR, it successfully fits the model.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites.
   





[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all 
Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-563070767
 
 
   Build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER 
DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563070731
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all 
Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-563070773
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19832/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
AmplabJenkins commented on issue #26775: [SPARK-30018][SQL] Support ALTER 
DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563070737
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19831/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
SparkQA commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563070444
 
 
   **[Test build #115012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115012/testReport)**
 for PR 26775 at commit 
[`dd1d52b`](https://github.com/apache/spark/commit/dd1d52bf3714fcfd37965b4fc54d1418b9e8be71).





[GitHub] [spark] imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-08 Thread GitBox
imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix 
catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r355269076
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
 ##
 @@ -133,7 +133,11 @@ private[sql] trait LookupCatalog extends Logging {
 // For example, if the name of a custom catalog is the same with 
`GLOBAL_TEMP_DATABASE`,
 // this custom catalog can't be accessed.
 if (nameParts.head.equalsIgnoreCase(globalTempDB)) {
 
 Review comment:
   > ``` 
   > if (nameParts.length == 1) {
   >   Some((currentCatalog, currentNamespace ++ nameParts))
   > }
   > ```
   Looks like we cannot use the table lookup logic, since we have commands like 
`SHOW TABLES FROM testcat` where `testcat` needs to be resolved as a catalog.
   
   Basically, we have a conflict when one part is given:
   1. It needs to be resolved as a catalog:
   ```
   CREATE TABLE testcat.table (id bigint, data string) USING foo
   SHOW TABLES FROM testcat
   ```
   2. It needs to be resolved as a non-catalog multi-part name:
   ```
   CREATE TABLE testcat.testcat (id bigint, data string) USING foo
   USE testcat
   DESCRIBE TABLE testcat
   ```
   One way to resolve the conflict is to check the one-part name against the 
current catalog. If they are the same, the one-part name is used as an 
`Identifier` (case 2); otherwise, it is used as a catalog (case 1). Does this 
sound reasonable?
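   
   A minimal sketch of that rule, only to make the proposal concrete. It is not 
the actual `LookupCatalog` code: `OnePartNameResolution`, `Resolved`, 
`AsCatalog`, `AsIdentifier` and `resolveOnePart` are hypothetical names, 
catalogs are modeled as plain strings, and the case-insensitive comparison is an 
assumption borrowed from the `globalTempDB` check above.
   
   ```scala
   // Hypothetical, simplified model; not Spark's real LookupCatalog API.
   object OnePartNameResolution {
     sealed trait Resolved
     // Case 1: the single name part refers to a catalog.
     final case class AsCatalog(catalogName: String) extends Resolved
     // Case 2: the single name part is an identifier in the current catalog/namespace.
     final case class AsIdentifier(
         catalogName: String,
         namespace: Seq[String],
         name: String) extends Resolved

     // Compare the one-part name with the current catalog's name (assumed case-insensitive).
     def resolveOnePart(
         part: String,
         currentCatalog: String,
         currentNamespace: Seq[String]): Resolved = {
       if (part.equalsIgnoreCase(currentCatalog)) {
         // Same as the current catalog: treat the part as an identifier (case 2).
         AsIdentifier(currentCatalog, currentNamespace, part)
       } else {
         // Different from the current catalog: treat the part as a catalog (case 1).
         AsCatalog(part)
       }
     }
   }
   ```
   
   Under these assumptions, after `USE testcat` the name `testcat` resolves to an 
identifier in the current catalog, while with any other current catalog it 
resolves to the catalog `testcat`.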





[GitHub] [spark] cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2019-12-08 Thread GitBox
cloud-fan commented on issue #26775: [SPARK-30018][SQL] Support ALTER DATABASE 
SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#issuecomment-563068711
 
 
   For the data source API, we do want to use properties to set 
owner/location/comment to keep the API flexible. The problem is the end-user 
API: how do we expect end users to set them?
   
   What I expect:
   - for CREATE TABLE/NAMESPACE, location should be set by the LOCATION clause, and 
comment should be set by COMMENT or properties. I'm not sure about owner: does 
Hive support setting owner during CREATE TABLE/DATABASE?
   - for ALTER TABLE/NAMESPACE, it should be the same: location/owner should be 
set by special syntax instead of via properties.




