[GitHub] [spark] SparkQA commented on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


SparkQA commented on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749995396


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37865/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30538: [SPARK-33591][SQL] Recognize `null` in partition spec values

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #30538:
URL: https://github.com/apache/spark/pull/30538#issuecomment-749988674


**[Test build #133273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133273/testReport)** for PR 30538 at commit [`343d15d`](https://github.com/apache/spark/commit/343d15d44c67f2139b9a555dbbaa10f684ddb094).






[GitHub] [spark] SparkQA removed a comment on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749968313


**[Test build #133265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133265/testReport)** for PR 30900 at commit [`7771bb2`](https://github.com/apache/spark/commit/7771bb2700b66d50bf3140f16142b56b13f95b83).






[GitHub] [spark] imback82 commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


imback82 commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749994396


   cc @cloud-fan 






[GitHub] [spark] AmplabJenkins commented on pull request #30865: [SPARK-33861][SQL] Simplify conditional in predicate

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30865:
URL: https://github.com/apache/spark/pull/30865#issuecomment-749994083


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37864/
   






[GitHub] [spark] SparkQA commented on pull request #30865: [SPARK-33861][SQL] Simplify conditional in predicate

2020-12-22 Thread GitBox


SparkQA commented on pull request #30865:
URL: https://github.com/apache/spark/pull/30865#issuecomment-749994064


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37864/
   






[GitHub] [spark] SparkQA commented on pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


SparkQA commented on pull request #30893:
URL: https://github.com/apache/spark/pull/30893#issuecomment-749993904


**[Test build #133279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133279/testReport)** for PR 30893 at commit [`ee4c365`](https://github.com/apache/spark/commit/ee4c36540f481afc98f90ea3e84e17d9d0033d77).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547771511



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowPartitionsSuite.scala
##
@@ -17,6 +17,31 @@
 
 package org.apache.spark.sql.hive.execution.command
 
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.execution.command.v1
 
-class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase
+class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase {
+  test("null and empty string as partition values") {

Review comment:
   Oh, shall we proceed with that NPE fix PR first, @MaxGekk?








[GitHub] [spark] AmplabJenkins commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749993486


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37863/
   






[GitHub] [spark] SparkQA commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


SparkQA commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749993474


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37863/
   






[GitHub] [spark] dongjoon-hyun commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-749992829


   Hi, @AngersZh. This PR looks mostly ready, as @maropu wrote. I left a few comments.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547767161



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -445,6 +445,29 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("SPARK-33084: add jar when path contains comma") {

Review comment:
   Thanks. Yes. Please remove this.








[GitHub] [spark] MaxGekk commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


MaxGekk commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547766855



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowPartitionsSuite.scala
##
@@ -17,6 +17,31 @@
 
 package org.apache.spark.sql.hive.execution.command
 
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.execution.command.v1
 
-class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase
+class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase {
+  test("null and empty string as partition values") {

Review comment:
   It fails with NPE :-(
   
   ```scala
   test("null and empty string as partition values") {
     import testImplicits._
     withNamespaceAndTable("ns", "tbl") { t =>
       val df = Seq((0, ""), (1, null)).toDF("a", "part")
       df.write
         .partitionBy("part")
         .format("parquet")
         .mode(SaveMode.Overwrite)
         .saveAsTable(t)

       runShowPartitionsSql(s"SHOW PARTITIONS $t", Row("part=null") :: Nil)
     }
   }
   ```
   ```
   java.lang.NullPointerException was thrown.
   java.lang.NullPointerException
       at org.apache.spark.sql.execution.datasources.v2.ShowPartitionsExec.$anonfun$run$3(ShowPartitionsExec.scala:58)
       at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
   ```
   Let me fix this separately since this PR is about tests. 
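   For context, this class of NPE comes from calling a method on a possibly-null partition value. The helper below is a hypothetical null-safe sketch (`renderPartition` and the `"null"` placeholder are illustrative assumptions, not the actual ShowPartitionsExec code):

   ```scala
   // Hypothetical sketch: render a partition spec entry without assuming the
   // value is non-null. Partition values written from a DataFrame may be null,
   // so guard before any method call on the value.
   def renderPartition(col: String, value: String): String = {
     val v = if (value == null) "null" else value // guard against the NPE
     s"$col=$v"
   }
   ```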








[GitHub] [spark] AmplabJenkins commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749991501


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37872/
   






[GitHub] [spark] Ngone51 commented on pull request #30716: [SPARK-33747][CORE] Avoid calling unregisterMapOutput when the map stage is being rerunning.

2020-12-22 Thread GitBox


Ngone51 commented on pull request #30716:
URL: https://github.com/apache/spark/pull/30716#issuecomment-749991166


   After taking a look at the updated UT, I think the problem mentioned by @weixiuli is:
   
   1) Task X of Stage B encounters a FetchFailure while fetching the shuffle data of Stage A at executor-0; Stage A unregisters the MapOutput for mapIndex-0.
   
   2) Stage A and Stage B are both marked as failed, and Stage A starts to rerun after a while.
   
   3) Task Y (also running at executor-0) of the rerun Stage A succeeds and registers its MapOutput for the same mapIndex-0.
   
   4) Task Z of Stage B encounters a FetchFailure at executor-0 too, and Stage A unregisters the MapOutput of mapIndex-1 again.
   
   I think what @weixiuli is trying to do here is to avoid unregistering the MapOutput in step 4.
   
   However, I wonder whether this could happen in a real job, or whether it's really just a corner case. I think tasks like task Z usually fail early, before the rerun task Y succeeds.
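   The scenario in the numbered steps above can be modeled with a toy registry. The class and member names below are hypothetical, sketching only the proposed guard (skip unregistering while the map stage is being rerun), not the DAGScheduler's actual code:

   ```scala
   import scala.collection.mutable

   // Toy model of the race described above. A FetchFailure normally removes the
   // map output for its map index; the guard skips the removal while the map
   // stage is rerunning, so the output registered by the rerun task (step 3)
   // survives a stale failure report (step 4).
   class MapOutputRegistry {
     private val outputs = mutable.Map[Int, String]()
     var mapStageRerunning = false

     def register(mapIndex: Int, location: String): Unit =
       outputs(mapIndex) = location

     def onFetchFailure(mapIndex: Int): Unit =
       if (!mapStageRerunning) outputs.remove(mapIndex) // the proposed guard

     def hasOutput(mapIndex: Int): Boolean = outputs.contains(mapIndex)
   }
   ```

   With the guard, a failure arriving during the rerun no longer wipes out a freshly registered output.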
   
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #30538: [SPARK-33591][SQL] Recognize `null` in partition spec values

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30538:
URL: https://github.com/apache/spark/pull/30538#issuecomment-749991113


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133273/
   






[GitHub] [spark] SparkQA commented on pull request #30538: [SPARK-33591][SQL] Recognize `null` in partition spec values

2020-12-22 Thread GitBox


SparkQA commented on pull request #30538:
URL: https://github.com/apache/spark/pull/30538#issuecomment-749991086


**[Test build #133273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133273/testReport)** for PR 30538 at commit [`343d15d`](https://github.com/apache/spark/commit/343d15d44c67f2139b9a555dbbaa10f684ddb094).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2020-12-22 Thread GitBox


SparkQA commented on pull request #30763:
URL: https://github.com/apache/spark/pull/30763#issuecomment-749990686


**[Test build #133278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133278/testReport)** for PR 30763 at commit [`b468869`](https://github.com/apache/spark/commit/b46886998e5161484cac5bad17c79fdd7e91d91b).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547763450



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils {
   coordinates: String,
   ivySettings: IvySettings,
   exclusions: Seq[String] = Nil,
+  transitive: Boolean = true,

Review comment:
   What happens if we remove this default value? If it doesn't require many changes, let's remove it.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547761656



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
##
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends 
FunctionResourceLoade
 }
   }
 
+  protected def resolveJars(path: URI): Seq[String] = {

Review comment:
   Although this class is `SessionResourceLoader`, `resolveJars` looks more like a utility function. Do we need to add it here as a `protected def`?








[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749989168


**[Test build #133277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133277/testReport)** for PR 29087 at commit [`8df104b`](https://github.com/apache/spark/commit/8df104beb258fe02a0a9f7ad01bf71bf8309a5f8).






[GitHub] [spark] SparkQA commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


SparkQA commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749988991


**[Test build #133274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133274/testReport)** for PR 30387 at commit [`4101603`](https://github.com/apache/spark/commit/4101603d92ea1ffe92e4205777c7fdb8492c2969).






[GitHub] [spark] SparkQA commented on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


SparkQA commented on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749988953


**[Test build #133275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133275/testReport)** for PR 29982 at commit [`5cbd308`](https://github.com/apache/spark/commit/5cbd30850b1403e34e489ba1adbbafc8cf82894a).






[GitHub] [spark] AmplabJenkins commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749988933


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133265/
   






[GitHub] [spark] SparkQA commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


SparkQA commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749988765


**[Test build #133265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133265/testReport)** for PR 30900 at commit [`7771bb2`](https://github.com/apache/spark/commit/7771bb2700b66d50bf3140f16142b56b13f95b83).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30538: [SPARK-33591][SQL] Recognize `null` in partition spec values

2020-12-22 Thread GitBox


SparkQA commented on pull request #30538:
URL: https://github.com/apache/spark/pull/30538#issuecomment-749988674


**[Test build #133273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133273/testReport)** for PR 30538 at commit [`343d15d`](https://github.com/apache/spark/commit/343d15d44c67f2139b9a555dbbaa10f684ddb094).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547759415



##
File path: docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md
##
@@ -33,15 +33,30 @@ ADD JAR file_name
 
 * **file_name**
 
-The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system.
+The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system or an ivy URL.
+Apache Ivy is a popular dependency manager focusing on flexibility and 
simplicity. Now we support two parameter in URL query string:
 
+ * transitive: whether to download dependent jars related to your ivy URL. 
It is case-sensitive and only take last one if multiple transitive parameters 
are specified.
+ * exclude: exclusion list when download ivy URL jar and dependent jars.
+
+User can write ivy URL such as:
+
+  ivy://group:module:version
+  ivy://group:module:version?transitive=[true|false]
+  ivy://group:module:version?exclude=group:module,group:module
+  
ivy://group:module:version?exclude=group:module,group:module&transitive=[true|false]
+
 ### Examples
 
 ```sql
 ADD JAR /tmp/test.jar;
 ADD JAR "/path/to/some.jar";
 ADD JAR '/some/other.jar';
 ADD JAR "/path with space/abc.jar";
+ADD JAR "ivy://org.apache.hive:hive-contrib:2.3.7";
+ADD JAR "ivy://org.apache.hive:hive-contrib:2.3.7?transitive=false"

Review comment:
   Just a question. Without `transitive=true`, does `hive-contrib` work correctly?








[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-749988240


**[Test build #133276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133276/testReport)** for PR 29966 at commit [`6bd41cd`](https://github.com/apache/spark/commit/6bd41cd85462a6bef0e8be61e9a61c2d628bc8fe).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547758123



##
File path: docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md
##
@@ -33,15 +33,30 @@ ADD JAR file_name
 
 * **file_name**
 
-The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system.
+The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system or an ivy URL.
+Apache Ivy is a popular dependency manager focusing on flexibility and 
simplicity. Now we support two parameter in URL query string:
 
+ * transitive: whether to download dependent jars related to your ivy URL. 
It is case-sensitive and only take last one if multiple transitive parameters 
are specified.
+ * exclude: exclusion list when download ivy URL jar and dependent jars.
+
+User can write ivy URL such as:
+
+  ivy://group:module:version
+  ivy://group:module:version?transitive=[true|false]
+  ivy://group:module:version?exclude=group:module,group:module

Review comment:
   `exclude` seems to be misleading here because `transitive` is false by default. Maybe we should revise it like the following?
   
   ```
   ivy://group:module:version
   ivy://group:module:version?transitive=[true|false]
   ivy://group:module:version?transitive=[true|false]&exclude=group:module,group:module
   ```
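   The query-string rules described in the reviewed doc (case-sensitive `transitive` where only the last occurrence counts, comma-separated `exclude` list) can be sketched with a small parser. The function name `parseIvyParams` and the false default for `transitive` are illustrative assumptions, not Spark's actual API:

   ```scala
   // Hypothetical parser for the ivy:// URL parameters described above.
   // Returns (transitive, excludes); transitive defaults to false here.
   def parseIvyParams(url: String): (Boolean, Seq[String]) = {
     // Take everything after the first '?', if any.
     val query = url.split("\\?", 2) match {
       case Array(_, q) => q
       case _           => ""
     }
     val params = query.split("&").filter(_.contains("=")).map { kv =>
       val Array(k, v) = kv.split("=", 2)
       k -> v
     }
     // "transitive" is case-sensitive; only the last occurrence is honored.
     val transitive = params.filter(_._1 == "transitive").lastOption.exists(_._2 == "true")
     // "exclude" is a comma-separated list of group:module pairs.
     val excludes = params.filter(_._1 == "exclude").flatMap(_._2.split(",")).toSeq
     (transitive, excludes)
   }
   ```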








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749987581










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30899: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #30899:
URL: https://github.com/apache/spark/pull/30899#issuecomment-749987586


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133261/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749987584










[GitHub] [spark] AmplabJenkins commented on pull request #30902: [SPARK-33888][SQL] Add support for TimeMillis logicalType to TimestampType

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30902:
URL: https://github.com/apache/spark/pull/30902#issuecomment-749987729


   Can one of the admins verify this patch?






[GitHub] [spark] SparkQA commented on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


SparkQA commented on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749987717


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37867/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30901: [SPARK-33887][SQL] Allow insert overwrite same table with static partition if using dynamic partition overwrite mode

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30901:
URL: https://github.com/apache/spark/pull/30901#issuecomment-749987753


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #30899: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30899:
URL: https://github.com/apache/spark/pull/30899#issuecomment-749987586


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133261/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749987581










[GitHub] [spark] AmplabJenkins commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749987584










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547755940



##
File path: docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md
##
@@ -33,15 +33,30 @@ ADD JAR file_name
 
 * **file_name**
 
-The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system.
+The name of the JAR file to be added. It could be either on a local file 
system or a distributed file system or an ivy URL.
+Apache Ivy is a popular dependency manager focusing on flexibility and 
simplicity. Now we support two parameter in URL query string:
 
+ * transitive: whether to download dependent jars related to your ivy URL. 
It is case-sensitive and only take last one if multiple transitive parameters 
are specified.
+ * exclude: exclusion list when download ivy URL jar and dependent jars.

Review comment:
   `when download` -> `during downloading`
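
   For context, an Ivy URI that combines both query parameters would be used 
along these lines (the coordinates and exclusions below are illustrative, not 
taken from the patch):

   ```sql
   -- Add a jar resolved through Ivy, pulling transitive dependencies but
   -- excluding two group:module pairs (all values here are made up).
   ADD JAR ivy://org.apache.logging.log4j:log4j-core:2.13.0?transitive=true&exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http;
   ```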








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547752608



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -25,12 +25,140 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  private def isInvalidQueryString(tokens: Array[String]): Boolean = {
+tokens.length != 2 || StringUtils.isBlank(tokens(0)) || 
StringUtils.isBlank(tokens(1))
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of Ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treated as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  
excludeorg.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(isInvalidQueryString)) {
+throw new IllegalArgumentException(
+  s"Invalid query string in Ivy URI ${uri.toString}: $uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+
+  // Parse transitive parameters (e.g., transitive=true) in an Ivy URI, 
default value is false
+  val transitiveParams = groupedParams.get("transitive")
+  if (transitiveParams.map(_.size).getOrElse(0) > 1) {
+logWarning("It's best to specify `transitive` parameter in ivy URI 
query only once." +
+  " If there are multiple `transitive` parameter, we will select the 
last one")
+  }
+  val transitive =
+transitiveParams.flatMap(_.takeRight(1).map(_._2 == 
"true").headOption).getOrElse(false)

Review comment:
   Oh, never mind. I found the test cases.
   ```
   test("SPARK-33084: Add jar support Ivy URI -- test param case sensitive") {
   test("SPARK-33084: Add jar support Ivy URI -- test transitive value case 
sensitive") {
   ```

##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -25,12 +25,140 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  private def isInvalidQueryString(tokens: Array[String]): Boolean = {
+tokens.length != 2 || StringUtils.isBlank(tokens(0)) || 
StringUtils.isBlank(tokens(1))
+  }
+
+  /**

[GitHub] [spark] saikocat opened a new pull request #30902: [SPARK-33888][SQL] Add support for TimeMillis logicalType to TimestampType

2020-12-22 Thread GitBox


saikocat opened a new pull request #30902:
URL: https://github.com/apache/spark/pull/30902


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] SparkQA commented on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


SparkQA commented on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749985874


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37865/
   






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZh commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547752259



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -25,12 +25,140 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  private def isInvalidQueryString(tokens: Array[String]): Boolean = {
+tokens.length != 2 || StringUtils.isBlank(tokens(0)) || 
StringUtils.isBlank(tokens(1))
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of Ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treated as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(isInvalidQueryString)) {
+throw new IllegalArgumentException(
+  s"Invalid query string in Ivy URI ${uri.toString}: $uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+
+  // Parse transitive parameters (e.g., transitive=true) in an Ivy URI, 
default value is false
+  val transitiveParams = groupedParams.get("transitive")
+  if (transitiveParams.map(_.size).getOrElse(0) > 1) {
+logWarning("It's best to specify `transitive` parameter in ivy URI 
query only once." +
+  " If there are multiple `transitive` parameter, we will select the 
last one")
+  }
+  val transitive =
+transitiveParams.flatMap(_.takeRight(1).map(_._2 == 
"true").headOption).getOrElse(false)

Review comment:
   > This `true` looks like case sensitive. Shall we do this in a 
case-insensitive way?
   
   We discussed this before: since Hive is case-sensitive and we always write Ivy 
URIs in lowercase, we made it case-sensitive
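
   For readers following the thread, the behaviour being discussed (last 
`transitive` wins, only the literal lowercase `true` enables it, `exclude` 
values collected as `group:module` pairs) can be sketched in Python; this is an 
illustrative re-creation, not the Scala code from the patch:

   ```python
   def parse_query_params(query):
       """Return (transitive, exclude) parsed from an Ivy URI query string.

       Mirrors the documented behaviour: transitive defaults to False, the
       comparison with "true" is case-sensitive, and when the parameter is
       repeated the last occurrence wins.
       """
       if not query:
           return False, ""
       tokens = [kv.split("=") for kv in query.split("&")]
       if any(len(t) != 2 or not t[0].strip() or not t[1].strip() for t in tokens):
           raise ValueError("Invalid query string in Ivy URI: %s" % query)
       transitive = False
       excludes = []
       for key, value in tokens:
           if key == "transitive":
               # Each occurrence overwrites the previous one, so the last wins.
               transitive = value == "true"
           elif key == "exclude":
               excludes.append(value)
       return transitive, ",".join(excludes)
   ```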








[GitHub] [spark] SparkQA commented on pull request #30852: [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

2020-12-22 Thread GitBox


SparkQA commented on pull request #30852:
URL: https://github.com/apache/spark/pull/30852#issuecomment-749985416


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37866/
   






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547750850



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -25,12 +25,140 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  private def isInvalidQueryString(tokens: Array[String]): Boolean = {
+tokens.length != 2 || StringUtils.isBlank(tokens(0)) || 
StringUtils.isBlank(tokens(1))
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of Ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treated as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(isInvalidQueryString)) {
+throw new IllegalArgumentException(
+  s"Invalid query string in Ivy URI ${uri.toString}: $uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+
+  // Parse transitive parameters (e.g., transitive=true) in an Ivy URI, 
default value is false
+  val transitiveParams = groupedParams.get("transitive")
+  if (transitiveParams.map(_.size).getOrElse(0) > 1) {
+logWarning("It's best to specify `transitive` parameter in ivy URI 
query only once." +
+  " If there are multiple `transitive` parameter, we will select the 
last one")
+  }
+  val transitive =
+transitiveParams.flatMap(_.takeRight(1).map(_._2 == 
"true").headOption).getOrElse(false)

Review comment:
   This `true` looks like case sensitive. Shall we do this in a 
case-insensitive way?








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZh commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547749884



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -445,6 +445,29 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("SPARK-33084: add jar when path contains comma") {

Review comment:
   > Do we really need to support this `comma`?
   
   This is a UT added earlier.
   With the current code it seems we don't need it?
   Just remove it?








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZh commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547749884



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -445,6 +445,29 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("SPARK-33084: add jar when path contains comma") {

Review comment:
   > Do we really need to support this `comma`?
   
   This is a UT added earlier.
   With the current code it seems we don't need it?








[GitHub] [spark] SparkQA commented on pull request #30865: [SPARK-33861][SQL] Simplify conditional in predicate

2020-12-22 Thread GitBox


SparkQA commented on pull request #30865:
URL: https://github.com/apache/spark/pull/30865#issuecomment-749984196


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37864/
   






[GitHub] [spark] SparkQA commented on pull request #30900: [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

2020-12-22 Thread GitBox


SparkQA commented on pull request #30900:
URL: https://github.com/apache/spark/pull/30900#issuecomment-749984098


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37863/
   






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547747201



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -445,6 +445,29 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("SPARK-33084: add jar when path contains comma") {

Review comment:
   Do we really need to support this `comma`?








[GitHub] [spark] beliefer commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


beliefer commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749983136


   retest this please






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547745481



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -445,6 +445,29 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 }
   }
 
+  test("SPARK-33084: add jar when path contains comma") {

Review comment:
   For a negative test case, could you be more specific? Otherwise, this 
test case name can be misleading.
   ```scala
   - test("SPARK-33084: add jar when path contains comma") {
   + test("SPARK-33084: add jar should fail when path contains comma") {
   ```








[GitHub] [spark] yaooqinn commented on a change in pull request #30888: [SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list

2020-12-22 Thread GitBox


yaooqinn commented on a change in pull request #30888:
URL: https://github.com/apache/spark/pull/30888#discussion_r547743509



##
File path: docs/sql-ref-syntax-dml-insert-into.md
##
@@ -40,11 +40,20 @@ INSERT INTO [ TABLE ] table_identifier [ partition_spec ]
 
 * **partition_spec**
 
-An optional parameter that specifies a comma separated list of key and 
value pairs
+An optional parameter that specifies a comma-separated list of key and 
value pairs
 for partitions.
 
 **Syntax:** `PARTITION ( partition_col_name  = partition_col_val [ , ... ] 
)`
 
+* **column_list**
+
+An optional parameter that specifies a comma-separated list of columns 
belonging to the `table_identifier` table.
+
+**Note:**The current behaviour has some limitations:
+- All specified columns should exist in the table and not be duplicated 
from each other. It includes all columns except the static partition columns.
+- The size of the column list should be exactly the size of the data from 
`VALUES` clause or query.
+- The order of the column list is alterable and determines how the data 
from `VALUES` clause or query to be inserted by position.

Review comment:
   how about removing: `The current behavior has some limitations:`








[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-74998


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37862/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30899: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #30899:
URL: https://github.com/apache/spark/pull/30899#issuecomment-749928251


   **[Test build #133261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133261/testReport)**
 for PR 30899 at commit 
[`03c7aac`](https://github.com/apache/spark/commit/03c7aac25282f2d75ed9bd4b82f45babc308f2d7).






[GitHub] [spark] yaooqinn commented on pull request #30887: [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

2020-12-22 Thread GitBox


yaooqinn commented on pull request #30887:
URL: https://github.com/apache/spark/pull/30887#issuecomment-749980684


   thanks for merging and reviewing






[GitHub] [spark] SparkQA commented on pull request #30899: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-12-22 Thread GitBox


SparkQA commented on pull request #30899:
URL: https://github.com/apache/spark/pull/30899#issuecomment-749980617


   **[Test build #133261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133261/testReport)**
 for PR 30899 at commit 
[`03c7aac`](https://github.com/apache/spark/commit/03c7aac25282f2d75ed9bd4b82f45babc308f2d7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class ContextAwareIterator[IN](iter: Iterator[IN], context: 
TaskContext) extends Iterator[IN] `
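   The wrapper class added by this patch can be illustrated with a minimal plain-Scala sketch. This is not the actual Spark implementation: the real class consults `TaskContext`, which is replaced below by a hypothetical `isCompleted` callback so the sketch stays self-contained.

   ```scala
   // Sketch of the context-aware iterator pattern: stop yielding elements
   // once the owning task is flagged as completed, so a consumer cannot keep
   // pulling from resources the task has already released.
   class ContextAwareIterator[T](iter: Iterator[T], isCompleted: () => Boolean)
       extends Iterator[T] {
     // Report exhaustion as soon as the task ends, even if `iter` has more items.
     override def hasNext: Boolean = !isCompleted() && iter.hasNext
     override def next(): T = iter.next()
   }
   ```

   Once the flag flips to true, `hasNext` reports false on the next check, which is the early-stop behavior the PR title describes.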






[GitHub] [spark] HyukjinKwon closed pull request #30887: [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

2020-12-22 Thread GitBox


HyukjinKwon closed pull request #30887:
URL: https://github.com/apache/spark/pull/30887


   






[GitHub] [spark] HyukjinKwon commented on pull request #30887: [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

2020-12-22 Thread GitBox


HyukjinKwon commented on pull request #30887:
URL: https://github.com/apache/spark/pull/30887#issuecomment-749980442


   Merged to master and branch-3.1.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547739630



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3771,6 +3792,40 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   }
 })
   }
+
+  test("SPARK-33084: Add jar support Ivy URI in SQL -- jar contains udf 
class") {
+val sumFuncClass = "org.apache.spark.examples.sql.Spark33084"
+val functionName = "test_udf"
+withTempDir { dir => {

Review comment:
   Do we need this ending `{`? I guess we can remove this.
   ```scala
   - withTempDir { dir => {
   + withTempDir { dir =>
   ```








[GitHub] [spark] SparkQA removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749968936


   **[Test build #133269 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133269/testReport)**
 for PR 29087 at commit 
[`1a4262b`](https://github.com/apache/spark/commit/1a4262b40d711aff02e457661d755d4d84beb9a3).






[GitHub] [spark] turboFei commented on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


turboFei commented on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749978769


   > **[Test build #133268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133268/testReport)**
 for PR 29982 at commit 
[`5a752d4`](https://github.com/apache/spark/commit/5a752d4a988a927323f70eab0b2d3e6b4720a3f5).
   > 
   > * This patch **fails to build**.
   > * This patch merges cleanly.
   > * This patch adds no public classes.
   
   
   Jenkins error:
   [error] java.io.IOException: No space left on device






[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749978475


   **[Test build #133269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133269/testReport)**
 for PR 29087 at commit 
[`1a4262b`](https://github.com/apache/spark/commit/1a4262b40d711aff02e457661d755d4d84beb9a3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547735935



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -304,8 +304,8 @@ private[spark] class SparkSubmit extends Logging {
   // Resolve maven dependencies if there are any and add classpath to 
jars. Add them to py-files
   // too for packages that include Python code
   val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
-args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
-args.ivySettingsPath)
+true, args.packagesExclusions, args.packages,

Review comment:
   > nit. `true,` -> `transitive = true,`.
   
   Done








[GitHub] [spark] wangshuo128 edited a comment on pull request #30901: [SPARK-33887][SQL] Allow insert overwrite same table with static partition if using dynamic partition overwrite mode

2020-12-22 Thread GitBox


wangshuo128 edited a comment on pull request #30901:
URL: https://github.com/apache/spark/pull/30901#issuecomment-749975050


   AFAIK, when `spark.sql.sources.partitionOverwriteMode` is set to `DYNAMIC`, 
Spark replaces partitions at runtime instead of deleting the relevant 
partition paths ahead of time, so we can use `DYNAMIC` overwrite mode when we 
overwrite a static partition.
   Gentle ping @cloud-fan @viirya, would you help to review this? Thanks.
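   The behavior described above can be sketched as follows, assuming an existing `SparkSession` named `spark` and a table `t` partitioned by a column `p` (both hypothetical names, not taken from the PR):

   ```scala
   // Illustrative sketch only. Under DYNAMIC overwrite mode, partitions are
   // replaced at commit time with the query's output, rather than having their
   // paths deleted before the query runs, so overwriting a static partition of
   // the table being read does not clobber the query's own input.
   spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")
   spark.sql("INSERT OVERWRITE TABLE t PARTITION (p = 1) SELECT a FROM t WHERE p = 1")
   ```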






[GitHub] [spark] SparkQA removed a comment on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749973781


   **[Test build #133272 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133272/testReport)**
 for PR 30387 at commit 
[`4101603`](https://github.com/apache/spark/commit/4101603d92ea1ffe92e4205777c7fdb8492c2969).






[GitHub] [spark] SparkQA commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


SparkQA commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749976551


   **[Test build #133272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133272/testReport)**
 for PR 30387 at commit 
[`4101603`](https://github.com/apache/spark/commit/4101603d92ea1ffe92e4205777c7fdb8492c2969).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547732346



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -304,8 +304,8 @@ private[spark] class SparkSubmit extends Logging {
   // Resolve maven dependencies if there are any and add classpath to 
jars. Add them to py-files
   // too for packages that include Python code
   val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
-args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
-args.ivySettingsPath)
+true, args.packagesExclusions, args.packages,

Review comment:
   nit. `true,` -> `transitive = true,`.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547732346



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -304,8 +304,8 @@ private[spark] class SparkSubmit extends Logging {
   // Resolve maven dependencies if there are any and add classpath to 
jars. Add them to py-files
   // too for packages that include Python code
   val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
-args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
-args.ivySettingsPath)
+true, args.packagesExclusions, args.packages,

Review comment:
   nit. `true` -> `transitive = true`.








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547732189



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {

Review comment:
   > If we want to use `plural`, it's `schemes`.
   
   Here scheme is singular.








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547731808



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {
 // For local paths with backslashes on Windows, URI throws an exception
-addLocalJarFile(new File(path))
+(addLocalJarFile(new File(path)), "local")
   } else {
 val uri = new Path(path).toUri
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
-uri.getScheme match {
+val uriSchema = uri.getScheme

Review comment:
   Done

##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {

Review comment:
   > `schema` -> `scheme` because `scheme` is different from `schema`.
   
   Very embarrassed, updated
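   The scheme handling under review reduces to reading the URI scheme of each path and branching on it. A rough sketch using `java.net.URI` (the diff uses Hadoop's `Path.toUri`, so this is only an approximation, and `uriScheme` is an illustrative helper, not a name from the PR):

   ```scala
   import java.net.URI

   // Default a missing scheme to "file", mirroring how addJar treats bare paths.
   def uriScheme(path: String): String =
     Option(new URI(path).getScheme).getOrElse("file")

   // e.g. uriScheme("ivy://group:artifact:version") gives "ivy",
   //      uriScheme("/tmp/a.jar") gives "file"
   ```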








[GitHub] [spark] cloud-fan commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547730493



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowPartitionsSuite.scala
##
@@ -17,6 +17,31 @@
 
 package org.apache.spark.sql.hive.execution.command
 
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.execution.command.v1
 
-class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase
+class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase {
+  test("null and empty string as partition values") {

Review comment:
   shall we add the same test for v2?








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547730366



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {

Review comment:
   If we want to use `plural`, it's `schemes`.








[GitHub] [spark] wangshuo128 commented on pull request #30901: [SPARK-33887][SQL] Allow insert overwrite same table with static partition if using dynamic partition overwrite mode

2020-12-22 Thread GitBox


wangshuo128 commented on pull request #30901:
URL: https://github.com/apache/spark/pull/30901#issuecomment-749975050


   AFAIK, when `spark.sql.sources.partitionOverwriteMode` is set to `DYNAMIC`, 
Spark replaces partitions at runtime instead of deleting the relevant 
partition paths ahead of time, so we can use `DYNAMIC` overwrite mode when we 
overwrite a static partition.
   Gentle ping @cloud-fan @viirya, would you help to review this?






[GitHub] [spark] cloud-fan commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547729365



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowPartitionsSuite.scala
##
@@ -17,6 +17,31 @@
 
 package org.apache.spark.sql.hive.execution.command
 
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.execution.command.v1
 
-class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase
+class ShowPartitionsSuite extends v1.ShowPartitionsSuiteBase with 
CommandSuiteBase {
+  test("null and empty string as partition values") {
+import testImplicits._
+withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
+  withTable("t") {
+val df = Seq(
+  (0, ""),
+  (1, null)
+).toDF("a", "part")
+df.write
+  .partitionBy("part")
+  .format("hive")
+  .mode(SaveMode.Overwrite)
+  .saveAsTable("t")
+
+runShowPartitionsSql(
+  "SHOW PARTITIONS t",
+  Row("part=__HIVE_DEFAULT_PARTITION__") :: Nil)
+checkAnswer(spark.table("t"),
+  Row(0, "__HIVE_DEFAULT_PARTITION__") ::
+  Row(1, "__HIVE_DEFAULT_PARTITION__") :: Nil)

Review comment:
   this is inconsistent with the in-memory catalog. Let's fix this later.








[GitHub] [spark] cloud-fan commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547728220



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala
##
@@ -93,4 +93,26 @@ class ShowPartitionsSuite extends ShowPartitionsSuiteBase 
with CommandSuiteBase
   assert(sql("SHOW PARTITIONS part_datasrc").count() == 3)
 }
   }
+
+  test("null and empty string as partition values") {
+import testImplicits._
+withTable("t") {
+  val df = Seq(
+(0, ""),
+(1, null)
+  ).toDF("a", "part")
+  df.write
+.partitionBy("part")
+.format("parquet")
+.mode(SaveMode.Overwrite)
+.saveAsTable("t")
+
+  runShowPartitionsSql(
+"SHOW PARTITIONS t",
+Row("part=__HIVE_DEFAULT_PARTITION__") :: Nil)

Review comment:
   Ah, if the behavior is the same as Hive's, we may need to keep it.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547727882



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {
 // For local paths with backslashes on Windows, URI throws an exception
-addLocalJarFile(new File(path))
+(addLocalJarFile(new File(path)), "local")
   } else {
 val uri = new Path(path).toUri
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
-uri.getScheme match {
+val uriSchema = uri.getScheme

Review comment:
   `uriSchema` -> `uriScheme`








[GitHub] [spark] cloud-fan commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547727845



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala
##
@@ -93,4 +93,26 @@ class ShowPartitionsSuite extends ShowPartitionsSuiteBase 
with CommandSuiteBase
   assert(sql("SHOW PARTITIONS part_datasrc").count() == 3)
 }
   }
+
+  test("null and empty string as partition values") {
+import testImplicits._
+withTable("t") {
+  val df = Seq(
+(0, ""),
+(1, null)
+  ).toDF("a", "part")
+  df.write
+.partitionBy("part")
+.format("parquet")
+.mode(SaveMode.Overwrite)
+.saveAsTable("t")
+
+  runShowPartitionsSql(
+"SHOW PARTITIONS t",
+Row("part=__HIVE_DEFAULT_PARTITION__") :: Nil)

Review comment:
   This is a bug... We shouldn't expose the internal stuff. We should 
return null as `df.show` did. Let's fix it later.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547727612



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {

Review comment:
   `schema` -> `scheme` because `scheme` is different from `schema`.








[GitHub] [spark] HyukjinKwon commented on pull request #30887: [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

2020-12-22 Thread GitBox


HyukjinKwon commented on pull request #30887:
URL: https://github.com/apache/spark/pull/30887#issuecomment-749974030


   ```
   org.scalatest.exceptions.TestFailedException: transform.sql
   Expected "...h status 127. Error:[ /bin/bash: some_non_existent_command: 
command not found]", but got "...h status 127. Error:[]" Result did not match 
for query #2
   SELECT TRANSFORM(a)
   USING 'some_non_existent_command' AS (a)
   FROM t
   ```
   
   is being fixed in 
https://github.com/apache/spark/pull/30896#issuecomment-749954219






[GitHub] [spark] SparkQA commented on pull request #30387: [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-22 Thread GitBox


SparkQA commented on pull request #30387:
URL: https://github.com/apache/spark/pull/30387#issuecomment-749973781


   **[Test build #133272 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133272/testReport)**
 for PR 30387 at commit 
[`4101603`](https://github.com/apache/spark/commit/4101603d92ea1ffe92e4205777c7fdb8492c2969).






[GitHub] [spark] cloud-fan commented on a change in pull request #30893: [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30893:
URL: https://github.com/apache/spark/pull/30893#discussion_r547726487



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala
##
@@ -93,4 +93,26 @@ class ShowPartitionsSuite extends ShowPartitionsSuiteBase 
with CommandSuiteBase
   assert(sql("SHOW PARTITIONS part_datasrc").count() == 3)
 }
   }
+
+  test("null and empty string as partition values") {
+import testImplicits._
+withTable("t") {
+  val df = Seq(
+(0, ""),
+(1, null)
+  ).toDF("a", "part")

Review comment:
   +1





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r547726508



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1959,47 +1959,58 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  val (keys, schema) = if (path.contains("\\") && Utils.isWindows) {

Review comment:
   `schema` -> `scheme`
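The nit above sits next to a real pitfall in this code path: on Windows, a bare drive letter in a plain path ("C:\...") is indistinguishable from a URI scheme, which is why the backslash check runs before scheme detection. A minimal Python sketch of the idea — `path_scheme` is a hypothetical helper for illustration, not Spark's code:

```python
# Why the Windows-path check precedes scheme detection: a bare drive letter
# ("C:\...") would otherwise parse as a URI scheme.
from urllib.parse import urlparse

def path_scheme(path, is_windows=False):
    if "\\" in path and is_windows:
        return "file"  # plain Windows path; "C:" is a drive, not a scheme
    return urlparse(path).scheme or "file"

assert path_scheme("ivy://org:mod:ver") == "ivy"
assert path_scheme("C:\\tmp\\a.jar", is_windows=True) == "file"
```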





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749971157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133268/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749971573


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133266/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749968433


   **[Test build #133268 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133268/testReport)**
 for PR 29982 at commit 
[`5a752d4`](https://github.com/apache/spark/commit/5a752d4a988a927323f70eab0b2d3e6b4720a3f5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749971532


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37868/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


SparkQA removed a comment on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749967997


   **[Test build #133266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133266/testReport)**
 for PR 30864 at commit 
[`8d39f5a`](https://github.com/apache/spark/commit/8d39f5a264ef8a6bebd1d62b3ff6ae082b6deee7).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangshuo128 opened a new pull request #30901: [SPARK-33887][SQL] Allow insert overwrite same table with static partition if using dynamic partition overwrite mode

2020-12-22 Thread GitBox


wangshuo128 opened a new pull request #30901:
URL: https://github.com/apache/spark/pull/30901


   
   
   
   ### What changes were proposed in this pull request?
   ```sql
   
   CREATE TABLE insertTable(i int, part int) USING PARQUET PARTITIONED BY 
(part);
   
   INSERT INTO TABLE insertTable PARTITION(part=1) SELECT 1;
   
   SET spark.sql.sources.partitionOverwriteMode=DYNAMIC;
   
   INSERT OVERWRITE TABLE insertTable PARTITION(part=1)
   SELECT i + 1 FROM insertTable WHERE part = 1;
   ```
   The SQL will fail because Spark complains "Cannot overwrite a path that is 
also being read from."
   
   
   ### Why are the changes needed?
   Allow users to insert overwrite the same table with static partition if 
`spark.sql.sources.partitionOverwriteMode` is set to `DYNAMIC`.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   
   ### How was this patch tested?
   New UT.
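   The requested behavior can be modeled in a few lines: under dynamic partition 
overwrite, the query's output is staged per partition first and only the touched 
partitions are swapped in, so the old contents of `part=1` can still be read 
while its replacement is computed. A hedged Python sketch (names are made up; 
this is not Spark's implementation):

```python
# Illustrative model of dynamic partition overwrite: stage new rows per
# partition, then replace only the partitions present in the new data.

def insert_overwrite_dynamic(table, new_rows):
    """table: dict mapping partition value -> list of rows.
    new_rows: iterable of (row, partition) pairs produced by the query."""
    staged = {}
    for row, part in new_rows:
        staged.setdefault(part, []).append(row)
    for part, rows in staged.items():
        table[part] = rows  # replace only partitions present in the new data
    return table

table = {1: [1], 2: [9]}
# INSERT OVERWRITE ... PARTITION(part=1) SELECT i + 1 FROM insertTable WHERE part = 1
insert_overwrite_dynamic(table, [(i + 1, 1) for i in table[1]])
assert table == {1: [2], 2: [9]}  # partition 2 is untouched
```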
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #30852: [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30852:
URL: https://github.com/apache/spark/pull/30852#discussion_r547723985



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
##
@@ -215,4 +215,24 @@ class SimplifyConditionalSuite extends PlanTest with 
ExpressionEvalHelper with P
   If(GreaterThan(Rand(0), UnresolvedAttribute("a")), FalseLiteral, 
TrueLiteral),
   LessThanOrEqual(Rand(0), UnresolvedAttribute("a")))
   }
+
+  test("SPARK-33847: Remove the CaseWhen if elseValue is empty and other 
outputs are null") {
+assertEquivalent(
+  CaseWhen((GreaterThan('a, 1), Literal.create(null, IntegerType)) :: Nil,

Review comment:
   ditto, `Seq(GreaterThan('a, 1), GreaterThan(Rand(0), 1)).foreach...`
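   For context, the optimization under test can be modeled simply: a `CASE WHEN` 
with no `ELSE` and only `NULL` branch outputs can never evaluate to anything but 
`NULL`, so the whole expression folds to a `NULL` literal. A Python sketch of 
the rule (illustrative only, not Catalyst code):

```python
# Illustrative model of the SPARK-33847 rule: branch values of None stand for
# NULL literals, and a missing ELSE also yields NULL.

def simplify_case_when(branches, else_value=None):
    """branches: list of (condition, value) pairs.

    If there is no ELSE and every branch outputs NULL, the expression can
    only ever evaluate to NULL, so fold it to a NULL literal (None here).
    """
    if else_value is None and all(value is None for _, value in branches):
        return None
    return ("CaseWhen", branches, else_value)

assert simplify_case_when([("a > 1", None), ("rand() > 1", None)]) is None
assert simplify_case_when([("a > 1", 1)]) == ("CaseWhen", [("a > 1", 1)], None)
```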





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #30852: [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30852:
URL: https://github.com/apache/spark/pull/30852#discussion_r547722720



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
##
@@ -380,6 +382,52 @@ class ReplaceNullWithFalseInPredicateSuite extends 
PlanTest {
 testProjection(originalExpr = column, expectedExpr = column)
   }
 
+  test("replace None of elseValue inside CaseWhen if all branches are 
FalseLiteral") {
+val allFalseBranches = Seq(
+  (UnresolvedAttribute("i") < Literal(10)) -> FalseLiteral,
+  (UnresolvedAttribute("i") > Literal(40)) -> FalseLiteral)
+val allFalseCond = CaseWhen(allFalseBranches)
+
+val nonAllFalseBranches = Seq(
+  (UnresolvedAttribute("i") < Literal(10)) -> FalseLiteral,
+  (UnresolvedAttribute("i") > Literal(40)) -> TrueLiteral)
+val nonAllFalseCond = CaseWhen(nonAllFalseBranches, FalseLiteral)
+
+testFilter(allFalseCond, FalseLiteral)
+testJoin(allFalseCond, FalseLiteral)
+testDelete(allFalseCond, FalseLiteral)
+testUpdate(allFalseCond, FalseLiteral)
+
+testFilter(nonAllFalseCond, nonAllFalseCond)
+testJoin(nonAllFalseCond, nonAllFalseCond)
+testDelete(nonAllFalseCond, nonAllFalseCond)
+testUpdate(nonAllFalseCond, nonAllFalseCond)
+  }
+
+  test("replace None of elseValue inside CaseWhen if all branches are null") {
+val allFalseBranches = Seq(

Review comment:
   `allNullBranches`
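   The rule these tests cover rests on SQL filter semantics: a row passes a 
`WHERE` clause only when the predicate is exactly `TRUE`, so `NULL` and `FALSE` 
are interchangeable in that position. A Python model (illustrative only, not 
Catalyst code):

```python
# Why ReplaceNullWithFalseInPredicate is sound: a WHERE clause keeps a row
# only when the predicate is exactly TRUE, so a NULL (None) result and FALSE
# are indistinguishable in that position.

def keep_row(pred):
    return pred is True  # FALSE and NULL both drop the row

def gt_zero(x):
    # three-valued comparison: None models an unknown (NULL) input
    return None if x is None else x > 0

rows = [-1, None, 1]
kept = [x for x in rows if keep_row(gt_zero(x))]
assert kept == [1]  # the NULL-producing row behaves the same as FALSE
```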





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749971573


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133266/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #30852: [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30852:
URL: https://github.com/apache/spark/pull/30852#discussion_r547721686



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
##
@@ -222,10 +223,14 @@ class ReplaceNullWithFalseInPredicateSuite extends 
PlanTest {
 Literal(null, IntegerType),
 Literal(3)),
   FalseLiteral)
-testFilter(originalCond = condition, expectedCond = condition)
-testJoin(originalCond = condition, expectedCond = condition)
-testDelete(originalCond = condition, expectedCond = condition)
-testUpdate(originalCond = condition, expectedCond = condition)
+val expectedCond = If(

Review comment:
   or we change this test to preserve the original purpose.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


SparkQA commented on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749971548


   **[Test build #133266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133266/testReport)**
 for PR 30864 at commit 
[`8d39f5a`](https://github.com/apache/spark/commit/8d39f5a264ef8a6bebd1d62b3ff6ae082b6deee7).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-749971532


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37868/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


AmplabJenkins commented on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749971157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133268/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29982: [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

2020-12-22 Thread GitBox


SparkQA commented on pull request #29982:
URL: https://github.com/apache/spark/pull/29982#issuecomment-749971136


   **[Test build #133268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133268/testReport)**
 for PR 29982 at commit 
[`5a752d4`](https://github.com/apache/spark/commit/5a752d4a988a927323f70eab0b2d3e6b4720a3f5).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
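
   The behavior named in the PR title — not treating a `;` inside a bracketed 
`/* ... */` comment as a statement separator — can be sketched as follows. This 
is a simplified Python model only; the real spark-sql driver also has to handle 
quotes, `--` line comments, and escapes:

```python
def split_statements(text):
    """Split a SQL script on ';' while ignoring semicolons inside bracketed
    /* ... */ comments (simplified model of the SPARK-33100 behavior)."""
    statements, buf, i, in_comment = [], [], 0, False
    while i < len(text):
        if not in_comment and text.startswith("/*", i):
            in_comment = True
            buf.append("/*")
            i += 2
        elif in_comment and text.startswith("*/", i):
            in_comment = False
            buf.append("*/")
            i += 2
        elif not in_comment and text[i] == ";":
            statements.append("".join(buf).strip())  # statement boundary
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements

assert split_statements("SELECT /* a;b */ 1; SELECT 2") == \
    ["SELECT /* a;b */ 1", "SELECT 2"]
```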



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #30852: [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30852:
URL: https://github.com/apache/spark/pull/30852#discussion_r547719655



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushFoldableIntoBranchesSuite.scala
##
@@ -258,4 +258,22 @@ class PushFoldableIntoBranchesSuite
   EqualTo(CaseWhen(Seq((a, Literal(1)), (c, Literal(2))), 
None).cast(StringType), Literal("4")),
   CaseWhen(Seq((a, FalseLiteral), (c, FalseLiteral)), None))
   }
+
+  test("SPARK-33847: Remove the CaseWhen if elseValue is empty and other 
outputs are null") {
+assertEquivalent(

Review comment:
   nit: to make the test more readable
   ```
   Seq(a, LessThan(Rand(1), Literal(0.5))).foreach { condition =>
 assertEquivalent(EqualTo(CaseWhen(Seq((condition, ...
 assertEquivalent(// with cast ...
   }
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org






[GitHub] [spark] SparkQA commented on pull request #30864: [SPARK-33857][SQL] Unify the default seed of random functions

2020-12-22 Thread GitBox


SparkQA commented on pull request #30864:
URL: https://github.com/apache/spark/pull/30864#issuecomment-749970193


   **[Test build #133271 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133271/testReport)**
 for PR 30864 at commit 
[`edf0b03`](https://github.com/apache/spark/commit/edf0b03bc59c7aa9d2838391ff197d51a649d6cf).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


