[GitHub] [spark] LuciferYang edited a comment on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


LuciferYang edited a comment on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737057477


   All failed cases in Jenkins belong to the kafka-0-10-sql module, but the local test succeeds:
   ```
   mvn clean install -pl external/kafka-0-10-sql 
   
   Run completed in 11 minutes, 42 seconds.
   Total number of tests run: 260
   Suites: completed 26, aborted 0
   Tests: succeeded 260, failed 0, canceled 0, ignored 4, pending 0
   ```
   
   Let me merge with master and retest these cases.






[GitHub] [spark] aokolnychyi commented on a change in pull request #30562: [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete

2020-12-01 Thread GitBox


aokolnychyi commented on a change in pull request #30562:
URL: https://github.com/apache/spark/pull/30562#discussion_r533962113



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##
@@ -28,6 +28,25 @@
  */
 @Evolving
 public interface SupportsDelete {
+
+  /**
+   * Checks whether it is possible to delete data from a data source table that matches filter
+   * expressions.
+   * <p>
+   * Rows should be deleted from the data source iff all of the filter expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are ANDed together.
+   * <p>
+   * Spark will call this method to check if the delete is possible without significant effort.
+   * Otherwise, Spark will try to rewrite the delete operation and produce row-level changes
+   * if the data source table supports deleting individual records.
+   *
+   * @param filters filter expressions, used to select rows to delete when all expressions match
+   * @return true if the delete operation can be performed
+   */
+  default boolean canDeleteWhere(Filter[] filters) {
+    return true;

Review comment:
   That's correct and the method returns `true` to keep the old behavior by 
default.
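
   For illustration, here is a minimal sketch of how a connector might override the new method. The `PartitionColumnTable` class and its partition-column check are hypothetical, not part of this PR:
   ```scala
   import org.apache.spark.sql.connector.catalog.SupportsDelete
   import org.apache.spark.sql.sources.Filter

   // Hypothetical connector: metadata-only deletes are possible only when every
   // ANDed filter references partition columns. Returning false lets Spark fall
   // back to rewriting the delete as row-level changes (if the table supports it).
   class PartitionColumnTable extends SupportsDelete {
     private val partitionColumns = Set("date", "region")

     override def canDeleteWhere(filters: Array[Filter]): Boolean =
       filters.forall(_.references.forall(partitionColumns.contains))

     override def deleteWhere(filters: Array[Filter]): Unit = {
       // Drop the matching partitions here (stubbed out in this sketch).
     }
   }
   ```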








[GitHub] [spark] LuciferYang edited a comment on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


LuciferYang edited a comment on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737030101


   4 test cases of the yarn module failed in the GitHub Actions run:
   
   ```
   YarnClusterSuite.run Spark in yarn-client mode with different configurations, ensuring redaction
   YarnClusterSuite.run Spark in yarn-cluster mode with different configurations, ensuring redaction
   YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
   YarnClusterSuite.run Spark in yarn-client mode with additional jar
   ```
   
   but the local test succeeds:
   
   ```
   mvn clean install -pl resource-managers/yarn -Pyarn
   
   Run completed in 8 minutes, 56 seconds.
   Total number of tests run: 137
   Suites: completed 18, aborted 0
   Tests: succeeded 137, failed 0, canceled 1, ignored 0, pending 0
   All tests passed.
   ```
   Let me check the kafka-sql tests that failed in Jenkins.






[GitHub] [spark] LuciferYang commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


LuciferYang commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737057477


   All failed cases in Jenkins belong to the kafka-0-10-sql module, but the local test succeeds:
   ```
   mvn clean install -pl external/kafka-0-10-sql 
   
   Run completed in 11 minutes, 42 seconds.
   Total number of tests run: 260
   Suites: completed 26, aborted 0
   Tests: succeeded 260, failed 0, canceled 0, ignored 4, pending 0
   ```






[GitHub] [spark] SparkQA removed a comment on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-737042079


   **[Test build #132028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).






[GitHub] [spark] SparkQA commented on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model

2020-12-01 Thread GitBox


SparkQA commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-737055896


   **[Test build #132028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737041757


   **[Test build #132024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).






[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-736996100


   **[Test build #132021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132021/testReport)** for PR 29966 at commit [`9c22882`](https://github.com/apache/spark/commit/9c228823e0be56a87ebc498c254c627babc9db45).






[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737054001


   **[Test build #132021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132021/testReport)** for PR 29966 at commit [`9c22882`](https://github.com/apache/spark/commit/9c228823e0be56a87ebc498c254c627babc9db45).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


SparkQA commented on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737053937


   **[Test build #132024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-12-01 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003


   My comments remain the same. If we can address them (full support for v2 create table; don't provide "create table if not exists" as the only option) in DataStreamWriter without making it complicated, I'm OK with it. (Though the complication looks worth splitting out.) Both must be addressed - I don't think case 2 is so rare that it can be ignored.






[GitHub] [spark] leanken commented on a change in pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


leanken commented on a change in pull request #30560:
URL: https://github.com/apache/spark/pull/30560#discussion_r533952098



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
##
@@ -160,9 +159,19 @@ trait ExpressionEvalHelper extends ScalaCheckDrivenPropertyChecks with PlanTestB
       expectedErrMsg: String): Unit = {
 
     def checkException(eval: => Unit, testMode: String): Unit = {
+      val modes = if (testMode == "non-codegen mode") {
+        Seq(CodegenObjectFactoryMode.NO_CODEGEN)
+      } else {
+        Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)

Review comment:
   Should be OK, since setting the mode does not affect `evaluateWithoutCodegen`.
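
   For context, a minimal sketch of what running a check under both factory modes looks like. It assumes the `withSQLConf` helper available to catalyst test suites; `runUnderBothModes` and `runAssertion` are illustrative stand-ins for the eval block passed to `checkException`:
   ```scala
   import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
   import org.apache.spark.sql.internal.SQLConf

   // Pin spark.sql.codegen.factoryMode per mode and re-run the same assertion:
   // NO_CODEGEN exercises the interpreted path, CODEGEN_ONLY the generated one.
   def runUnderBothModes(runAssertion: => Unit): Unit = {
     Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN).foreach { mode =>
       withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> mode.toString) {
         runAssertion
       }
     }
   }
   ```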








[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-12-01 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003


   My comments remain the same. If we can address them (full support for v2 create table; don't provide "create table if not exists" as the only option) in DataStreamWriter without making it complicated, I'm OK with it. (Though the complication looks worth splitting out.) Both must be addressed - I don't think case 2 is a rare case that can be ignored.






[GitHub] [spark] cloud-fan commented on a change in pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


cloud-fan commented on a change in pull request #30560:
URL: https://github.com/apache/spark/pull/30560#discussion_r533951436



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
##
@@ -160,9 +159,19 @@ trait ExpressionEvalHelper extends ScalaCheckDrivenPropertyChecks with PlanTestB
       expectedErrMsg: String): Unit = {
 
     def checkException(eval: => Unit, testMode: String): Unit = {
+      val modes = if (testMode == "non-codegen mode") {
+        Seq(CodegenObjectFactoryMode.NO_CODEGEN)
+      } else {
+        Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)

Review comment:
   For simplicity, can we always test it with 2 modes?








[GitHub] [spark] HeartSaVioR commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-12-01 Thread GitBox


HeartSaVioR commented on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003


   My comments remain the same. If we can address them (full support for v2 create table; don't provide "create table if not exists" as the only option) in DataStreamWriter without making it complicated, I'm OK with it. Both must be addressed - I don't think case 2 is a rare case that can be ignored.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048806










[GitHub] [spark] AmplabJenkins commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048806










[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533948414



##
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##
@@ -955,6 +978,121 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         .set(EXECUTOR_ALLOW_SPARK_CONTEXT, true)).stop()
     }
   }
+
+  test("SPARK-33084: Add jar support ivy url -- default transitive = false") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
+    assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- invalid transitive use default false") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=foo")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(!sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- transitive=true will download dependency jars") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
+    assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- test exclude param when transitive=true") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0" +
+      "?exclude=commons-lang:commons-lang&transitive=true")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- test different version") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.6.0")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.6.0.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- test invalid param") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?invalidParam=foo")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+  }
+
+  test("SPARK-33084: Add jar support ivy url -- test multiple transitive params") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?" +
+      "transitive=true&transitive=false&transitive=invalidValue")

Review comment:
   > Could you add tests for ?transitive=true&transitive=invalidValue, too?
   #29966 (comment)
   
   Where?
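
   For reference, the ivy:// URI shapes exercised by the tests above, as a short usage sketch (coordinates taken from the tests; behavior as described in this PR):
   ```scala
   // transitive defaults to false: only the named artifact is added.
   sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")

   // transitive=true also pulls the resolved dependency jars.
   sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")

   // exclude skips the listed group:module pairs during transitive resolution.
   sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0" +
     "?exclude=commons-lang:commons-lang&transitive=true")
   ```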








[GitHub] [spark] SparkQA removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-736960627


   **[Test build #132014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132014/testReport)** for PR 30563 at commit [`fef2403`](https://github.com/apache/spark/commit/fef24030b31ebdff15fe3ee003da8c8bc0e6d564).






[GitHub] [spark] SparkQA commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


SparkQA commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048008


   **[Test build #132014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132014/testReport)** for PR 30563 at commit [`fef2403`](https://github.com/apache/spark/commit/fef24030b31ebdff15fe3ee003da8c8bc0e6d564).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work

2020-12-01 Thread GitBox


dongjoon-hyun commented on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737047712


   Also, cc @viirya, @dbtsai, @sunchao, @srowen, @AngersZh, @mridulm, @tgravescs.






[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


SparkQA commented on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737044137


   **[Test build #132030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132030/testReport)** for PR 30243 at commit [`918e222`](https://github.com/apache/spark/commit/918e222eb45f49845b14f1689e9d606ff414b03a).






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work

2020-12-01 Thread GitBox


dongjoon-hyun edited a comment on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195


   Hi, @HyukjinKwon.
   Could you review this PR, please? I will reopen SPARK-33212 after merging this PR.
   This will recover `hadoop-aws` functionality in Apache Spark 3.1.






[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533942826



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+    packagesExclusions: String,
+    packages: String,
+    repositories: String,
+    ivyRepoPath: String,
+    ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+    val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq(
+      "spark.jars.excludes",
+      "spark.jars.packages",
+      "spark.jars.repositories",
+      "spark.jars.ivy",
+      "spark.jars.ivySettings"
+    ).map(sys.props.get(_).orNull)
+    IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath)
+  }
+
+  /**
+   * Parse the `transitive` and `exclude` parameter values from a URI query string.
+   * Other, invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI that needs to be downloaded.
+   * @return Tuple of the `transitive` and `exclude` parameter values.
+   *
+   *         1. transitive: whether to download the dependency jars of the ivy URI; the default
+   *            is false and the value is case-sensitive. An invalid value is treated as false.
+   *            Example: Input:  exclude=org.mortbay.jetty:jetty&transitive=true
+   *                     Output: true
+   *
+   *         2. exclude: comma-separated exclusions to apply when resolving transitive
+   *            dependencies, consisting of `group:module` pairs separated by commas.
+   *            Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *                     Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+    val uriQuery = uri.getQuery
+    if (uriQuery == null) {
+      (false, "")
+    } else {
+      val mapTokens = uriQuery.split("&").map(_.split("="))
+      if (mapTokens.exists(token =>
+          token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+        throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery")
+      }
+      val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+      // Parse the transitive parameter (e.g., transitive=true) in an ivy URL; default is false.
+      var transitive: Boolean = false
+      groupedParams.get("transitive").foreach { params =>
+        if (params.length > 1) {
+          logWarning("It's best to specify the `transitive` parameter in an ivy URL query" +
+            " only once. If there are multiple `transitive` parameters, we will select the last one")
+        }
+        params.map(_._2).foreach {
+          case "true" => transitive = true
+          case _ => transitive = false
+        }
+      }
+      // Parse the exclusion list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+      // in an ivy URL. When downloading an ivy URL jar, Spark won't download the transitive jars
+      // in the exclusion list.
+      val exclusionList = groupedParams.get("exclude").map { params =>
+        params.map(_._2).flatMap { excludeString =>
+          val excludes = excludeString.split(",")
+          if (excludes.map(_.split(":")).exists(token =>
+              token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+            throw new URISyntaxException(uri.toString, "Invalid exclude string: " +
+              "expected 'org:module,org:module,..', found " + excludeString)
+          }
+          excludes
+        }.mkString(",")
+      }.getOrElse("")
+
+      val invalidParams = groupedParams
+        .filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+        .keys.toArray.sorted
+      if (invalidParams.nonEmpty) {
+        logWarning(
+          s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
+      }
+
+      groupedParams.foreach { case (key: String, values: Array[(String, String)]) =>
+        if (key != "transitive" || key != "exclude") {
+          logWarning("Invalid parameter")
+        }
+      }
+
+      (transitive, exclusionList)
+    }
+  }
+
+  /**
+   * Download Ivy URIs dependency jars.
+   *

[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737042534


   **[Test build #132029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132029/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737042348










[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737042348










[GitHub] [spark] viirya removed a comment on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


viirya removed a comment on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737029373


   The codegen change is ready for review. I need to write some benchmark code too.






[GitHub] [spark] SparkQA commented on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model

2020-12-01 Thread GitBox


SparkQA commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-737042079


   **[Test build #132028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).






[GitHub] [spark] SparkQA removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737000928


   **[Test build #132022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132022/testReport)** for PR 30560 at commit [`16940ce`](https://github.com/apache/spark/commit/16940ce89477f1c3839fed13e67b39792cb55fa6).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737041582










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737041584










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583










[GitHub] [spark] SparkQA commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

2020-12-01 Thread GitBox


SparkQA commented on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-737041925


   **[Test build #132027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132027/testReport)** for PR 30517 at commit [`26badc4`](https://github.com/apache/spark/commit/26badc4cc4ea6d68c8c5d50cf2c83e4904aacc0d).






[GitHub] [spark] SparkQA commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


SparkQA commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737041907


   **[Test build #132022 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132022/testReport)** for PR 30560 at commit [`16940ce`](https://github.com/apache/spark/commit/16940ce89477f1c3839fed13e67b39792cb55fa6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


SparkQA commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737041808


   **[Test build #132026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132026/testReport)** for PR 30547 at commit [`9088635`](https://github.com/apache/spark/commit/908863543372655091b8f6114b8ec5ec0290763e).






[GitHub] [spark] SparkQA commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


SparkQA commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737041791


   **[Test build #132025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132025/testReport)** for PR 30560 at commit [`6570d70`](https://github.com/apache/spark/commit/6570d70fd5416e5d39bbcf9b68e552ac66ebc8ff).






[GitHub] [spark] SparkQA commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


SparkQA commented on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737041757


   **[Test build #132024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).






[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583










[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737041584










[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737041582










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30558: [SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer

2020-12-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #30558:
URL: https://github.com/apache/spark/pull/30558#discussion_r533940558



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -185,6 +185,9 @@ abstract class Optimizer(catalogManager: CatalogManager)
       RemoveLiteralFromGroupExpressions,
       RemoveRepetitionFromGroupExpressions) :: Nil ++
     operatorOptimizationBatch) :+
+    // This batch rewrites data source plans and should be run after the operator
+    // optimization batch and before any batches that depend on stats.
+    Batch("Data Source Rewrite Rules", Once, dataSourceRewriteRules: _*) :+

Review comment:
   Could you propose a name then, @gatorsmile?








[GitHub] [spark] dongjoon-hyun commented on pull request #30556: [WIP][SPARK-33212][BUILD] Provide hadoop-aws-shaded jar in hadoop-cloud module

2020-12-01 Thread GitBox


dongjoon-hyun commented on pull request #30556:
URL: https://github.com/apache/spark/pull/30556#issuecomment-737040013


   Ya, I'll proceed with https://github.com/apache/spark/pull/30508 first since the Apache Spark 3.1 branch cut is this Friday.
   We can revisit this later during the QA period.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work

2020-12-01 Thread GitBox


dongjoon-hyun edited a comment on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195


   Hi, @HyukjinKwon.
   Could you review this PR, please? I will reopen SPARK-33212 after merging this PR.






[GitHub] [spark] dongjoon-hyun commented on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work

2020-12-01 Thread GitBox


dongjoon-hyun commented on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195


   Hi, @HyukjinKwon.
   Could you review this PR?






[GitHub] [spark] cloud-fan commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-12-01 Thread GitBox


cloud-fan commented on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737038157


   `DataFrameWriterV2` is very powerful for describing the table writing behavior (CREATE, CREATE IF NOT EXISTS, CREATE OR REPLACE, REPLACE, append, overwrite where, etc.), and I don't think the current streaming framework can support all of these at the current stage.
   
   Ideally we need to handle these cases:
   1. the table exists and users want to write to it
   2. the table does not exist and users want to fail
   3. the table does not exist and users want to create it
   
   The current PR can't cover case 2, but I don't know how common that is for streaming users. Adding a `DataStreamWriterV2` just to cover case 2 looks like overkill to me. One possible solution is to add 2 methods, `insertTable` and `createAndInsertTable`. If we think case 2 is rare, adding only `toTable`, which works as `createAndInsertTable`, is also fine with me.
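
   A sketch of what the two-method shape could look like - the names come from the comment above, but the signatures are hypothetical, not an agreed API:
   ```scala
   // Hypothetical DataStreamWriter additions (illustrative only):
   //   insertTable          - cases 1/2: write to an existing table, fail if it is missing
   //   createAndInsertTable - case 3: create the table first if it does not exist
   val q1 = df.writeStream.insertTable("catalog.db.events")
   val q2 = df.writeStream.createAndInsertTable("catalog.db.events")
   ```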






[GitHub] [spark] cloud-fan commented on a change in pull request #30562: [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete

2020-12-01 Thread GitBox


cloud-fan commented on a change in pull request #30562:
URL: https://github.com/apache/spark/pull/30562#discussion_r533936837



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##
@@ -28,6 +28,25 @@
  */
 @Evolving
 public interface SupportsDelete {
+
+  /**
+   * Checks whether it is possible to delete data from a data source table that matches filter
+   * expressions.
+   * <p>
+   * Rows should be deleted from the data source iff all of the filter expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are ANDed together.
+   * <p>
+   * Spark will call this method to check if the delete is possible without significant effort.
+   * Otherwise, Spark will try to rewrite the delete operation and produce row-level changes
+   * if the data source table supports deleting individual records.
+   *
+   * @param filters filter expressions, used to select rows to delete when all expressions match
+   * @return true if the delete operation can be performed
+   */
+  default boolean canDeleteWhere(Filter[] filters) {
+    return true;
+  }
+
   /**
    * Delete data from a data source table that matches filter expressions.

Review comment:
   Yea I have the same question.








[GitHub] [spark] SparkQA removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-736995917


   **[Test build #132020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132020/testReport)** for PR 30243 at commit [`563622f`](https://github.com/apache/spark/commit/563622f2f38d75e88c53611bf662fcc793afc860).






[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


SparkQA commented on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737036131


   **[Test build #132020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132020/testReport)** for PR 30243 at commit [`563622f`](https://github.com/apache/spark/commit/563622f2f38d75e88c53611bf662fcc793afc860).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737033196


   retest this please.






[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533931068



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treat as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, s"Invalid query string: 
$uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+  // Parse transitive parameters (e.g., transitive=true) in an ivy URL, 
default value is false
+  var transitive: Boolean = false
+  groupedParams.get("transitive").foreach { params =>
+if (params.length > 1) {
+  logWarning("It's best to specify `transitive` parameter in ivy URL 
query only once." +
+" If there are multiple `transitive` parameter, we will select the 
last one")
+}
+params.map(_._2).foreach {
+  case "true" => transitive = true
+  case _ => transitive = false
+}
+  }
+  // Parse an excluded list (e.g., 
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+  // in an ivy URL. When download ivy URL jar, Spark won't download 
transitive jar
+  // in a excluded list.
+  val exclusionList = groupedParams.get("exclude").map { params =>
+params.map(_._2).flatMap { excludeString =>
+  val excludes = excludeString.split(",")
+  if (excludes.map(_.split(":")).exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, "Invalid exclude 
string: " +
+  "expected 'org:module,org:module,..', found " + excludeString)
+  }
+  excludes
+}.mkString(",")
+  }.getOrElse("")
+
+  val invalidParams = groupedParams
+.filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+.keys.toArray.sorted
+  if (invalidParams.nonEmpty) {
+logWarning(
+  s"Invalid parameters `${invalidParams.mkString(",")}` found in URI 
query `$uriQuery`.")
+  }
+
+  groupedParams.foreach { case (key: String, values: Array[(String, 
String)]) =>
+if (key != "transitive" || key != "exclude") {
+  logWarning("Invalid parameter")
+}
+  }
+
+  (transitive, exclusionList)
+}
+  }
+
+  /**
+   * Download Ivy URIs dependency jars.
+   *

[GitHub] [spark] LuciferYang commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


LuciferYang commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737030101


   Four test cases in the yarn module failed in the GitHub Actions run:
   
   ```
   YarnClusterSuite.run Spark in yarn-client mode with different 
configurations, ensuring redaction
   YarnClusterSuite.run Spark in yarn-cluster mode with different 
configurations, ensuring redaction
   YarnClusterSuite.yarn-cluster should respect conf overrides in 
SparkHadoopUtil (SPARK-16414, SPARK-23630)
   YarnClusterSuite.run Spark in yarn-client mode with additional jar
   ```
   
   but the local tests succeed:
   
   ```
   Run completed in 8 minutes, 56 seconds.
   Total number of tests run: 137
   Suites: completed 18, aborted 0
   Tests: succeeded 137, failed 0, canceled 1, ignored 0, pending 0
   All tests passed.
   ```
   Let me check the kafka-sql test failures in Jenkins.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #30556: [WIP][SPARK-33212][BUILD] Provide hadoop-aws-shaded jar in hadoop-cloud module

2020-12-01 Thread GitBox


sunchao commented on pull request #30556:
URL: https://github.com/apache/spark/pull/30556#issuecomment-737029894


   Thanks @dongjoon-hyun. This is bad news, and it means we'd have to abandon the approach in this PR. The only solution seems to be on the Hadoop side. I've opened a [Hadoop PR](https://github.com/apache/hadoop/pull/2510) and tested it successfully with the code snippet you pasted above. @steveloughran could you take a look there? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


viirya commented on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737029373


   The codegen change is ready for review. I still need to write some benchmark code too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya opened a new pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter

2020-12-01 Thread GitBox


viirya opened a new pull request #30565:
URL: https://github.com/apache/spark/pull/30565


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch proposes to enable whole-stage subexpression elimination for 
Filter.
   
   ### Why are the changes needed?
   
   
   We made subexpression elimination available for whole-stage codegen in ProjectExec. Another operator that frequently runs into repeated subexpressions is Filter, so we should enable whole-stage codegen subexpression elimination in FilterExec too.
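   
   For illustration, here is a minimal sketch of a Filter predicate that would benefit (assumptions: a local SparkSession and a stand-in UDF; this is illustrative code, not code from this PR):
   
   ```
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.udf
   
   object FilterSubexprExample {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("filter-subexpr")
         .master("local[*]")
         .getOrCreate()
       import spark.implicits._
   
       // Hypothetical stand-in for an expensive function.
       val expensive = udf((s: String) => s.trim.toLowerCase)
   
       val df = Seq("  Spark  ", "  SQL ").toDF("value")
       // `expensive($"value")` appears twice in a single Filter predicate; with
       // whole-stage subexpression elimination it is evaluated once per row.
       df.filter(expensive($"value") === "spark" || expensive($"value") === "sql").show()
   
       spark.stop()
     }
   }
   ```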
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533928892



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treat as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, s"Invalid query string: 
$uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+  // Parse transitive parameters (e.g., transitive=true) in an ivy URL, 
default value is false
+  var transitive: Boolean = false
+  groupedParams.get("transitive").foreach { params =>
+if (params.length > 1) {
+  logWarning("It's best to specify `transitive` parameter in ivy URL 
query only once." +
+" If there are multiple `transitive` parameter, we will select the 
last one")
+}
+params.map(_._2).foreach {
+  case "true" => transitive = true
+  case _ => transitive = false
+}
+  }
+  // Parse an excluded list (e.g., 
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+  // in an ivy URL. When download ivy URL jar, Spark won't download 
transitive jar
+  // in a excluded list.
+  val exclusionList = groupedParams.get("exclude").map { params =>
+params.map(_._2).flatMap { excludeString =>
+  val excludes = excludeString.split(",")
+  if (excludes.map(_.split(":")).exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, "Invalid exclude 
string: " +
+  "expected 'org:module,org:module,..', found " + excludeString)
+  }
+  excludes
+}.mkString(",")
+  }.getOrElse("")
+
+  val invalidParams = groupedParams
+.filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+.keys.toArray.sorted
+  if (invalidParams.nonEmpty) {
+logWarning(
+  s"Invalid parameters `${invalidParams.mkString(",")}` found in URI 
query `$uriQuery`.")
+  }

Review comment:
   nit format:
   ```
 val validParams = Set("transitive", "exclude")
 val invalidParams = groupedParams.keys.filterNot(validParams.contains).toSeq.sorted
 if (invalidParams.nonEmpty) {
   logWarning(s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
 }
   ```

[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533928805



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treat as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, s"Invalid query string: 
$uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+  // Parse transitive parameters (e.g., transitive=true) in an ivy URL, 
default value is false
+  var transitive: Boolean = false
+  groupedParams.get("transitive").foreach { params =>
+if (params.length > 1) {
+  logWarning("It's best to specify `transitive` parameter in ivy URL 
query only once." +
+" If there are multiple `transitive` parameter, we will select the 
last one")
+}
+params.map(_._2).foreach {
+  case "true" => transitive = true
+  case _ => transitive = false
+}
+  }
+  // Parse an excluded list (e.g., 
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+  // in an ivy URL. When download ivy URL jar, Spark won't download 
transitive jar
+  // in a excluded list.
+  val exclusionList = groupedParams.get("exclude").map { params =>
+params.map(_._2).flatMap { excludeString =>
+  val excludes = excludeString.split(",")
+  if (excludes.map(_.split(":")).exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, "Invalid exclude 
string: " +
+  "expected 'org:module,org:module,..', found " + excludeString)
+  }
+  excludes
+}.mkString(",")
+  }.getOrElse("")
+
+  val invalidParams = groupedParams
+.filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+.keys.toArray.sorted
+  if (invalidParams.nonEmpty) {
+logWarning(
+  s"Invalid parameters `${invalidParams.mkString(",")}` found in URI 
query `$uriQuery`.")
+  }
+
+  groupedParams.foreach { case (key: String, values: Array[(String, 
String)]) =>
+if (key != "transitive" || key != "exclude") {
+  logWarning("Invalid parameter")
+}
+  }

Review comment:
   What's this?





[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736978616


   **[Test build #132018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)**
 for PR 28781 at commit 
[`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-12-01 Thread GitBox


SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737025797


   **[Test build #132018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)**
 for PR 28781 at commit 
[`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


HyukjinKwon commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737025493







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737024489







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

2020-12-01 Thread GitBox


HyukjinKwon commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737025282


   Hm, the test failures look consistent?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533927237



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treat as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || 
StringUtils.isBlank(token(1 {
+throw new URISyntaxException(uri.toString, s"Invalid query string: 
$uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+  // Parse transitive parameters (e.g., transitive=true) in an ivy URL, 
default value is false
+  var transitive: Boolean = false
+  groupedParams.get("transitive").foreach { params =>
+if (params.length > 1) {
+  logWarning("It's best to specify `transitive` parameter in ivy URL 
query only once." +
+" If there are multiple `transitive` parameter, we will select the 
last one")
+}
+params.map(_._2).foreach {
+  case "true" => transitive = true
+  case _ => transitive = false
+}
+  }
+  // Parse an excluded list (e.g., 
exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+  // in an ivy URL. When download ivy URL jar, Spark won't download 
transitive jar
+  // in a excluded list.
+  val exclusionList = groupedParams.get("exclude").map { params =>
+params.map(_._2).flatMap { excludeString =>
+  val excludes = excludeString.split(",")
+  if (excludes.map(_.split(":")).exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {

Review comment:
   How about making a helper function for this check? 
https://github.com/apache/spark/pull/29966/files#diff-3e9f71e7d80c1dc7d02b0edef611de280f219789f0d2b282887f07e999020024R74-R75
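   
   For illustration, the helper might look something like this (a hedged sketch; the name `checkInvalidQueryString` and its exact placement are assumptions, not code from the PR):
   
   ```
   import org.apache.commons.lang3.StringUtils
   
   // True when a `key=value` or `group:module` token pair is malformed.
   private def checkInvalidQueryString(tokens: Array[String]): Boolean = {
     tokens.length != 2 || StringUtils.isBlank(tokens(0)) || StringUtils.isBlank(tokens(1))
   }
   
   // Both validation sites could then share it:
   //   if (mapTokens.exists(checkInvalidQueryString)) { ... }
   //   if (excludes.map(_.split(":")).exists(checkInvalidQueryString)) { ... }
   ```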





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737024489







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR closed pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


HeartSaVioR closed pull request #30563:
URL: https://github.com/apache/spark/pull/30563


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


HeartSaVioR commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737022204


   Thanks! Merging to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737021958







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737021958







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533924963



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  var schema = ""
+  val keys = if (path.contains("\\") && Utils.isWindows) {
 // For local paths with backslashes on Windows, URI throws an exception
 addLocalJarFile(new File(path))
   } else {
 val uri = new Path(path).toUri
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
-uri.getScheme match {
+schema = uri.getScheme
+schema match {
   // A JAR file which exists only on the driver node
   case null =>
 // SPARK-22585 path without schema is not url encoded
 addLocalJarFile(new File(uri.getPath))
   // A JAR file which exists only on the driver node
   case "file" => addLocalJarFile(new File(uri.getPath))
   // A JAR file which exists locally on every worker node
-  case "local" => "file:" + uri.getPath
+  case "local" => Seq("file:" + uri.getPath)
+  case "ivy" =>
+// Since `new Path(path).toUri` will lose query information,
+// so here we use `URI.create(path)`
+DependencyUtils.resolveMavenDependencies(URI.create(path))
   case _ => checkRemoteJarFile(path)
 }
   }
-  if (key != null) {
+  if (keys.nonEmpty) {
 val timestamp = if (addedOnSubmit) startTime else 
System.currentTimeMillis
-if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
-  logInfo(s"Added JAR $path at $key with timestamp $timestamp")
+val (added, existed) = keys.partition(addedJars.putIfAbsent(_, 
timestamp).isEmpty)
+if (added.nonEmpty) {
+  if (schema != "ivy") {
+logInfo(s"Added JAR $path at ${added.mkString(",")} with timestamp 
$timestamp")
+  } else {
+logInfo(s"Added dependency jars of ivy uri $path at 
${added.mkString(",")}" +
+  s" with timestamp $timestamp")
+  }
   postEnvironmentUpdate()
-} else {
-  logWarning(s"The jar $path has been added already. Overwriting of 
added jars " +
-"is not supported in the current version.")
+}
+if (existed.nonEmpty) {
+  if (schema != "ivy") {
+logWarning(s"The jar $path has been added already. Overwriting of 
added jars " +
+  "is not supported in the current version.")
+  } else {
+logWarning(s"The dependency jars of ivy URI with $path at" +
+  s" ${existed.mkString(",")} has been added already." +
+  s" Overwriting of added jars is not supported in the current 
version.")

Review comment:
   nit: remove `s`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737021222







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737021222







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533924173



##
File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
##
@@ -15,22 +15,158 @@
  * limitations under the License.
  */
 
-package org.apache.spark.deploy
+package org.apache.spark.util
 
 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}
 
 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}
 
-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+packagesExclusions: String,
+packages: String,
+repositories: String,
+ivyRepoPath: String,
+ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+val Seq(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath) = Seq(
+  "spark.jars.excludes",
+  "spark.jars.packages",
+  "spark.jars.repositories",
+  "spark.jars.ivy",
+  "spark.jars.ivySettings"
+).map(sys.props.get(_).orNull)
+IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, 
ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter value of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Tuple value of parameter `transitive` and `exclude` value.
+   *
+   * 1. transitive: whether to download dependency jar of ivy URI, 
default value is false
+   *and this parameter value is case-sensitive. Invalid value will 
be treat as false.
+   *Example: Input:  
exclude=org.mortbay.jetty:jetty&transitive=true
+   *Output:  true
+   *
+   * 2. exclude: comma separated exclusions to apply when resolving 
transitive dependencies,
+   *consists of `group:module` pairs separated by commas.
+   *Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *Output:  [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+val uriQuery = uri.getQuery
+if (uriQuery == null) {
+  (false, "")
+} else {
+  val mapTokens = uriQuery.split("&").map(_.split("="))
+  if (mapTokens.exists(token =>
+token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+throw new URISyntaxException(uri.toString, s"Invalid query string: 
$uriQuery")
+  }
+  val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+  // Parse transitive parameters (e.g., transitive=true) in an ivy URL, 
default value is false
+  var transitive: Boolean = false

Review comment:
   nit: we don't need the explicit type annotation: `var transitive = false`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533923652



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  var schema = ""
+  val keys = if (path.contains("\\") && Utils.isWindows) {
 // For local paths with backslashes on Windows, URI throws an exception
 addLocalJarFile(new File(path))
   } else {
 val uri = new Path(path).toUri
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
-uri.getScheme match {
+schema = uri.getScheme
+schema match {
   // A JAR file which exists only on the driver node
   case null =>
 // SPARK-22585 path without schema is not url encoded
 addLocalJarFile(new File(uri.getPath))
   // A JAR file which exists only on the driver node
   case "file" => addLocalJarFile(new File(uri.getPath))
   // A JAR file which exists locally on every worker node
-  case "local" => "file:" + uri.getPath
+  case "local" => Seq("file:" + uri.getPath)
+  case "ivy" =>
+// Since `new Path(path).toUri` will lose query information,
+// so here we use `URI.create(path)`
+DependencyUtils.resolveMavenDependencies(URI.create(path))
   case _ => checkRemoteJarFile(path)
 }
   }
-  if (key != null) {
+  if (keys.nonEmpty) {
 val timestamp = if (addedOnSubmit) startTime else 
System.currentTimeMillis
-if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
-  logInfo(s"Added JAR $path at $key with timestamp $timestamp")
+val (added, existed) = keys.partition(addedJars.putIfAbsent(_, 
timestamp).isEmpty)
+if (added.nonEmpty) {
+  if (schema != "ivy") {
+logInfo(s"Added JAR $path at ${added.mkString(",")} with timestamp 
$timestamp")
+  } else {
+logInfo(s"Added dependency jars of ivy uri $path at 
${added.mkString(",")}" +
+  s" with timestamp $timestamp")

Review comment:
   nit:
   ```
 val jarMessage = if (schema != "ivy") "JAR" else "dependency jars of ivy uri"
 logInfo(s"Added $jarMessage $path at ${added.mkString(",")} with timestamp $timestamp")
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737020328







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737020328







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-736941564


   **[Test build #132011 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132011/testReport)**
 for PR 30563 at commit 
[`27b43f4`](https://github.com/apache/spark/commit/27b43f4e51d5223ab0ac2b52a969eb71e39fcf94).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

2020-12-01 Thread GitBox


SparkQA commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737019934


   **[Test build #132011 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132011/testReport)**
 for PR 30563 at commit 
[`27b43f4`](https://github.com/apache/spark/commit/27b43f4e51d5223ab0ac2b52a969eb71e39fcf94).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30564: [SPARK-32670][SQL][FOLLOWUP] Group exception messages in Catalyst Analyzer in one file

2020-12-01 Thread GitBox


SparkQA commented on pull request #30564:
URL: https://github.com/apache/spark/pull/30564#issuecomment-737019310


   **[Test build #132023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132023/testReport)**
 for PR 30564 at commit 
[`086bba3`](https://github.com/apache/spark/commit/086bba330ffeba1856e90962872b87a710326631).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken edited a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


leanken edited a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737018589


   > ```
   >  org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions   16 ms   1
   >  org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution   49 ms   1
   >  org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution   0.64 sec   1
   >  org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract   0.25 sec   1
   >  org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
   > ```
   > 
   > It seems the five errors? If so, IMO its okay to fix all of them at once. Anyway, nice catch!
   
   It should be zero errors after the final patch 16940ce. Please ignore the earlier test results ^_^



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer opened a new pull request #30564: [SPARK-32670][SQL][FOLLOWUP] Group exception messages in Catalyst Analyzer in one file

2020-12-01 Thread GitBox


beliefer opened a new pull request #30564:
URL: https://github.com/apache/spark/pull/30564


   ### What changes were proposed in this pull request?
   This PR follows up https://github.com/apache/spark/pull/29497, which gave us an example of grouping the `AnalysisException`s thrown in the Analyzer into QueryCompilationErrors.
   This PR groups the remaining `AnalysisException`s in the Analyzer into QueryCompilationErrors, following the same pattern (sketched below).
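   
   For illustration, a minimal sketch of the pattern (the method shown and its message are illustrative assumptions; the real helpers live in QueryCompilationErrors.scala under sql/catalyst):
   
   ```
   import org.apache.spark.sql.AnalysisException
   import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
   
   object QueryCompilationErrors {
     // Each helper centralizes one error message, so call sites in the
     // Analyzer reduce to `throw QueryCompilationErrors.xxxError(...)`.
     def unresolvedUsingColForJoinError(
         colName: String, plan: LogicalPlan, side: String): Throwable = {
       new AnalysisException(
         s"USING column '$colName' cannot be resolved on the $side side of the join. " +
           s"The $side-side columns: [${plan.output.map(_.name).mkString(", ")}]")
     }
   }
   ```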
   
   
   ### Why are the changes needed?
   It will largely help with the standardization of error messages and their maintenance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. Error messages remain unchanged.
   
   
   ### How was this patch tested?
   No new tests; passing all the original tests makes sure it doesn't break any existing behavior.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


leanken commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737018589


   > ```
   >  org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions   16 ms   1
   >  org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution   49 ms   1
   >  org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution   0.64 sec   1
   >  org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract   0.25 sec   1
   >  org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
   > ```
   > 
   > It seems the five errors? If so, IMO its okay to fix all of them at once. Anyway, nice catch!
   
   It should be zero errors after the final patch. Please ignore the earlier test results ^_^



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533920784



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging {
 throw new IllegalArgumentException(
   s"Directory ${path} is not allowed for addJar")
   }
-  path
+  Seq(path)
 } catch {
   case NonFatal(e) =>
 logError(s"Failed to add $path to Spark environment", e)
-null
+Nil
 }
   } else {
-path
+Seq(path)
   }
 }
 
 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  var schema = ""

Review comment:
   `val schema = uri.getScheme`? Please avoid using `var` where possible.
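   
   For illustration, one `var`-free shape might be (a hedged sketch only; the surrounding control flow is simplified and this is not the PR's final code):
   
   ```
   // Compute the URI up front (when the path is not a Windows backslash path),
   // then derive the scheme once as a val.
   val uriOpt =
     if (path.contains("\\") && Utils.isWindows) None
     else Some(new Path(path).toUri)
   val schema = uriOpt.flatMap(u => Option(u.getScheme)).getOrElse("")
   ```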





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737016158







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737016161







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

2020-12-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-737016159







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-737016159










[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737016161










[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


AmplabJenkins commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737016158










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28036: [SPARK-26341][CORE]Expose executor memory metrics at the stage level, in the Stages tab

2020-12-01 Thread GitBox


AngersZhuuuu commented on a change in pull request #28036:
URL: https://github.com/apache/spark/pull/28036#discussion_r533919517



##
File path: core/src/main/resources/org/apache/spark/ui/static/stagepage.js
##
@@ -288,14 +298,18 @@ $(document).ready(function () {
 "" +
 "" +
 " Select 
All" +
-" Scheduler 
Delay" +
-" Task Deserialization 
Time" +
-" Shuffle Read Blocked 
Time" +
-" Shuffle Remote Reads" +
-" Shuffle Write Time" +
-" Result Serialization 
Time" +
-" Getting Result Time" +
-" Peak Execution Memory" +
+" Scheduler Delay" +
+" Task 
Deserialization Time" +
+" Shuffle 
Read Blocked Time" +
+" Shuffle 
Remote Reads" +
+" Shuffle 
Write Time" +
+" Result 
Serialization Time" +
+" Getting 
Result Time" +
+" Peak 
Execution Memory" +
+" Executor  JVMOnHeapMemory / JVMOffHeapMemory" +

Review comment:
   > @ron8hu @AngersZh If you see the data-column and the id 
attributes, the value in those attributes corresponds to the column placement 
in the tables present in the stages page. Currently, the values for the four 
new columns are overlapping with the other columns, thus, creating errors while 
loading the page. Try fixing the values and testing the same. If it is still 
not working, feel free to ping me and I can look into it with you more.
   
   They are two different table selectors, so the current `data-column` values do not conflict.








[GitHub] [spark] maropu commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


maropu commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737014904


   ```
   org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions  16 ms  1
   org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution  49 ms  1
   org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution  0.64 sec  1
   org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract  0.25 sec  1
   org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
   ```
   It seems there are five errors? If so, IMO it's okay to fix all of them at once. Anyway, nice catch!






[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-736966702


   **[Test build #132017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132017/testReport)**
 for PR 29966 at commit 
[`bdc5035`](https://github.com/apache/spark/commit/bdc50356077c9c9db1cbfab249083053dbe1a7dd).






[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-12-01 Thread GitBox


SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737014366


   **[Test build #132017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132017/testReport)**
 for PR 29966 at commit 
[`bdc5035`](https://github.com/apache/spark/commit/bdc50356077c9c9db1cbfab249083053dbe1a7dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class IvyProperties(`






[GitHub] [spark] maropu commented on a change in pull request #30559: [SPARK-33617][SQL] spark.sql.files.minPartitionNum effective for LocalTableScan

2020-12-01 Thread GitBox


maropu commented on a change in pull request #30559:
URL: https://github.com/apache/spark/pull/30559#discussion_r533917581



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala
##
@@ -49,7 +49,9 @@ case class LocalTableScanExec(
 if (rows.isEmpty) {
   sqlContext.sparkContext.emptyRDD
 } else {
-  val numSlices = math.min(unsafeRows.length, 
sqlContext.sparkContext.defaultParallelism)
+  val numSlices = math.min(
+unsafeRows.length,
+
conf.filesMinPartitionNum.getOrElse(sqlContext.sparkContext.defaultParallelism))

Review comment:
   hm, how about adding a new config to control each partition size? I'm not sure why `numSlices` depends on the length of `unsafeRows`.
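   A runnable illustration (standalone, not the PR's code) of why the slice count is capped at the row count: asking `parallelize` for more slices than elements just yields empty partitions.
   ```scala
   import org.apache.spark.sql.SparkSession

   // Splitting 3 rows into 8 slices leaves 5 partitions empty, which is
   // why LocalTableScanExec caps numSlices at unsafeRows.length.
   val spark = SparkSession.builder().master("local[*]").appName("slices-demo").getOrCreate()
   val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 8)
   println(rdd.getNumPartitions)                  // 8
   println(rdd.glom().collect().count(_.isEmpty)) // 5
   spark.stop()
   ```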








[GitHub] [spark] SparkQA removed a comment on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-736928681


   **[Test build #132010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132010/testReport)**
 for PR 30517 at commit 
[`c617757`](https://github.com/apache/spark/commit/c617757ec165163b85e284304340ee0c37ae12e5).






[GitHub] [spark] SparkQA commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

2020-12-01 Thread GitBox


SparkQA commented on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-737007775


   **[Test build #132010 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132010/testReport)**
 for PR 30517 at commit 
[`c617757`](https://github.com/apache/spark/commit/c617757ec165163b85e284304340ee0c37ae12e5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

2020-12-01 Thread GitBox


SparkQA removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-736960640


   **[Test build #132015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132015/testReport)**
 for PR 30560 at commit 
[`6d4fb1d`](https://github.com/apache/spark/commit/6d4fb1d287965b1123d204f9a98b31dd249c1617).






[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog

2020-12-01 Thread GitBox


cloud-fan commented on a change in pull request #30473:
URL: https://github.com/apache/spark/pull/30473#discussion_r533911399



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCNamespaceTest.scala
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc.v2
+
+import scala.collection.JavaConverters._
+
+import org.apache.log4j.Level
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.connector.catalog.NamespaceChange
+import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.DockerTest
+
+@DockerTest
+private[v2] trait V2JDBCNamespaceTest extends SharedSparkSession {
+  val catalog = new JDBCTableCatalog()
+
+  test("listNamespaces: basic behavior") {
+catalog.createNamespace(Array("foo"), Map("comment" -> "test 
comment").asJava)
+assert(catalog.listNamespaces() ===
+  Array(Array("foo"), Array("information_schema"), Array("pg_catalog"), 
Array("public")))

Review comment:
   this seems pgsql specific.
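   One way to make the assertion engine-agnostic, sketched against the `catalog` value defined in the quoted test (only the namespace created above is assumed to exist):
   ```scala
   // Check containment of the created namespace instead of matching
   // PostgreSQL's full built-in schema list verbatim.
   val namespaces = catalog.listNamespaces().map(_.toSeq)
   assert(namespaces.contains(Seq("foo")))
   ```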








[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog

2020-12-01 Thread GitBox


cloud-fan commented on a change in pull request #30473:
URL: https://github.com/apache/spark/pull/30473#discussion_r533910906



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala
##
@@ -171,6 +175,130 @@ class JDBCTableCatalog extends TableCatalog with Logging {
 }
   }
 
+  override def namespaceExists(namespace: Array[String]): Boolean = namespace 
match {
+case Array(db) =>
+  withConnection { conn =>
+val rs = conn.getMetaData.getSchemas(null, db)
+while (rs.next()) {
+  if (rs.getString(1) == db) return true;
+}
+false
+  }
+case _ => false
+  }
+
+  override def listNamespaces(): Array[Array[String]] = {
+withConnection { conn =>
+  val schemaBuilder = ArrayBuilder.make[Array[String]]
+  val rs = conn.getMetaData.getSchemas()
+  while (rs.next()) {
+schemaBuilder += Array(rs.getString(1))
+  }
+  schemaBuilder.result
+}
+  }
+
+  override def listNamespaces(namespace: Array[String]): Array[Array[String]] 
= {
+namespace match {
+  case Array() =>
+listNamespaces()
+  case Array(db) if namespaceExists(namespace) =>
+Array()
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def loadNamespaceMetadata(namespace: Array[String]): 
util.Map[String, String] = {
+namespace match {
+  case Array(db) =>
+if (!namespaceExists(namespace)) throw new NoSuchNamespaceException(db)
+mutable.HashMap[String, String]().asJava
+
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def createNamespace(
+  namespace: Array[String],
+  metadata: util.Map[String, String]): Unit = namespace match {
+case Array(db) if !namespaceExists(namespace) =>
+  var comment = ""
+  if (!metadata.isEmpty) {
+metadata.asScala.map {
+  case (k, v) => k match {
+case SupportsNamespaces.PROP_COMMENT => comment = v
+case SupportsNamespaces.PROP_OWNER => // ignore
+case SupportsNamespaces.PROP_LOCATION =>
+  throw new AnalysisException("CREATE NAMESPACE ... LOCATION ... 
is not supported in" +
+" JDBC catalog.")
+case _ => // ignore all the other properties for now
+  }
+}
+  }
+  withConnection { conn =>
+classifyException(s"Failed create name space: $db") {
+  JdbcUtils.createNamespace(conn, options, db, comment)
+}
+  }
+
+case Array(_) =>
+  throw new NamespaceAlreadyExistsException(namespace)
+
+case _ =>
+  throw new IllegalArgumentException(s"Invalid namespace name: 
${namespace.quoted}")
+  }
+
+  override def alterNamespace(namespace: Array[String], changes: 
NamespaceChange*): Unit = {
+namespace match {
+  case Array(db) =>
+changes.foreach {
+  case set: NamespaceChange.SetProperty =>
+// ignore changes other than comments
+if (set.property() == SupportsNamespaces.PROP_COMMENT) {
+  withConnection { conn =>
+JdbcUtils.createNamespaceComment(conn, options, db, set.value)
+  }
+}
+
+  case unset: NamespaceChange.RemoveProperty =>
+// ignore changes other than comments
+if (unset.property() == SupportsNamespaces.PROP_COMMENT) {
+  withConnection { conn =>
+JdbcUtils.removeNamespaceComment(conn, options, db)
+  }
+}
+
+  case _ =>
+throw new SQLFeatureNotSupportedException(s"Unsupported 
NamespaceChange $changes")
+}
+
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def dropNamespace(namespace: Array[String]): Boolean = namespace 
match {
+case Array(db) if namespaceExists(namespace) =>
+  if (listTables(Array(db)).nonEmpty) {
+throw new IllegalStateException(s"Namespace ${namespace.quoted} is not 
empty")
+  }
+  withConnection { conn =>
+classifyException(s"Failed drop name space: $db") {
+  JdbcUtils.dropNamespace(conn, options, db)
+  true
+}
+  }
+
+case Array(_) =>
+  // exists returned false

Review comment:
   and shall we fail?
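   A hypothetical sketch of the failing alternative, reusing the `NoSuchNamespaceException` the PR already throws elsewhere; `namespaceExists` and `doDrop` are stand-ins for the catalog's real logic:
   ```scala
   import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException

   // Fail loudly when the namespace is absent instead of silently
   // returning false from dropNamespace.
   def dropNamespace(
       namespace: Array[String],
       namespaceExists: Array[String] => Boolean,
       doDrop: String => Boolean): Boolean = namespace match {
     case Array(db) if namespaceExists(namespace) => doDrop(db)
     case _ => throw new NoSuchNamespaceException(namespace)
   }
   ```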








[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog

2020-12-01 Thread GitBox


cloud-fan commented on a change in pull request #30473:
URL: https://github.com/apache/spark/pull/30473#discussion_r533910609



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala
##
@@ -171,6 +175,130 @@ class JDBCTableCatalog extends TableCatalog with Logging {
 }
   }
 
+  override def namespaceExists(namespace: Array[String]): Boolean = namespace 
match {
+case Array(db) =>
+  withConnection { conn =>
+val rs = conn.getMetaData.getSchemas(null, db)
+while (rs.next()) {
+  if (rs.getString(1) == db) return true;
+}
+false
+  }
+case _ => false
+  }
+
+  override def listNamespaces(): Array[Array[String]] = {
+withConnection { conn =>
+  val schemaBuilder = ArrayBuilder.make[Array[String]]
+  val rs = conn.getMetaData.getSchemas()
+  while (rs.next()) {
+schemaBuilder += Array(rs.getString(1))
+  }
+  schemaBuilder.result
+}
+  }
+
+  override def listNamespaces(namespace: Array[String]): Array[Array[String]] 
= {
+namespace match {
+  case Array() =>
+listNamespaces()
+  case Array(db) if namespaceExists(namespace) =>
+Array()
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def loadNamespaceMetadata(namespace: Array[String]): 
util.Map[String, String] = {
+namespace match {
+  case Array(db) =>
+if (!namespaceExists(namespace)) throw new NoSuchNamespaceException(db)
+mutable.HashMap[String, String]().asJava
+
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def createNamespace(
+  namespace: Array[String],
+  metadata: util.Map[String, String]): Unit = namespace match {
+case Array(db) if !namespaceExists(namespace) =>
+  var comment = ""
+  if (!metadata.isEmpty) {
+metadata.asScala.map {
+  case (k, v) => k match {
+case SupportsNamespaces.PROP_COMMENT => comment = v
+case SupportsNamespaces.PROP_OWNER => // ignore
+case SupportsNamespaces.PROP_LOCATION =>
+  throw new AnalysisException("CREATE NAMESPACE ... LOCATION ... 
is not supported in" +
+" JDBC catalog.")
+case _ => // ignore all the other properties for now
+  }
+}
+  }
+  withConnection { conn =>
+classifyException(s"Failed create name space: $db") {
+  JdbcUtils.createNamespace(conn, options, db, comment)
+}
+  }
+
+case Array(_) =>
+  throw new NamespaceAlreadyExistsException(namespace)
+
+case _ =>
+  throw new IllegalArgumentException(s"Invalid namespace name: 
${namespace.quoted}")
+  }
+
+  override def alterNamespace(namespace: Array[String], changes: 
NamespaceChange*): Unit = {
+namespace match {
+  case Array(db) =>
+changes.foreach {
+  case set: NamespaceChange.SetProperty =>
+// ignore changes other than comments
+if (set.property() == SupportsNamespaces.PROP_COMMENT) {
+  withConnection { conn =>
+JdbcUtils.createNamespaceComment(conn, options, db, set.value)
+  }
+}
+
+  case unset: NamespaceChange.RemoveProperty =>
+// ignore changes other than comments
+if (unset.property() == SupportsNamespaces.PROP_COMMENT) {
+  withConnection { conn =>
+JdbcUtils.removeNamespaceComment(conn, options, db)
+  }
+}
+
+  case _ =>
+throw new SQLFeatureNotSupportedException(s"Unsupported 
NamespaceChange $changes")
+}
+
+  case _ =>
+throw new NoSuchNamespaceException(namespace)
+}
+  }
+
+  override def dropNamespace(namespace: Array[String]): Boolean = namespace 
match {
+case Array(db) if namespaceExists(namespace) =>
+  if (listTables(Array(db)).nonEmpty) {
+throw new IllegalStateException(s"Namespace ${namespace.quoted} is not 
empty")
+  }
+  withConnection { conn =>
+classifyException(s"Failed drop name space: $db") {
+  JdbcUtils.dropNamespace(conn, options, db)
+  true
+}
+  }
+
+case Array(_) =>
+  // exists returned false

Review comment:
   not exists







