[GitHub] [spark] LuciferYang edited a comment on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
LuciferYang edited a comment on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737057477

All failed cases in Jenkins belong to the kafka10-sql module, but local tests succeed:

```
mvn clean install -pl external/kafka-0-10-sql

Run completed in 11 minutes, 42 seconds.
Total number of tests run: 260
Suites: completed 26, aborted 0
Tests: succeeded 260, failed 0, canceled 0, ignored 4, pending 0
```

Let me merge with master and retest these.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] aokolnychyi commented on a change in pull request #30562: [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete
aokolnychyi commented on a change in pull request #30562:
URL: https://github.com/apache/spark/pull/30562#discussion_r533962113

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java

```java
@@ -28,6 +28,25 @@
  */
 @Evolving
 public interface SupportsDelete {
+
+  /**
+   * Checks whether it is possible to delete data from a data source table that matches filter
+   * expressions.
+   *
+   * Rows should be deleted from the data source iff all of the filter expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are ANDed together.
+   *
+   * Spark will call this method to check if the delete is possible without significant effort.
+   * Otherwise, Spark will try to rewrite the delete operation and produce row-level changes
+   * if the data source table supports deleting individual records.
+   *
+   * @param filters filter expressions, used to select rows to delete when all expressions match
+   * @return true if the delete operation can be performed
+   */
+  default boolean canDeleteWhere(Filter[] filters) {
+    return true;
```

Review comment: That's correct; the method returns `true` to keep the old behavior by default.
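For illustration only (not part of the PR), the contract described in the Javadoc above can be sketched as follows. This assumes a hypothetical source whose only cheap deletes are equality filters on partition columns; `EqualTo` and the class name below are stand-ins, not the real Spark connector API.

```java
// Illustrative sketch: a source that can delete cheaply (metadata-only)
// iff every ANDed filter is an equality on a partition column.
import java.util.Set;

public class PartitionDeleteCheck {
    // Stand-in for Spark's org.apache.spark.sql.sources.EqualTo filter.
    record EqualTo(String column, Object value) {}

    static boolean canDeleteWhere(Object[] filters, Set<String> partitionCols) {
        for (Object f : filters) {
            // Any filter that is not a partition-column equality means the
            // delete would need a row-level rewrite, so report "not cheap".
            if (!(f instanceof EqualTo eq) || !partitionCols.contains(eq.column())) {
                return false;
            }
        }
        // No disqualifying filters: matches the default behavior of returning true.
        return true;
    }

    public static void main(String[] args) {
        Set<String> parts = Set.of("dt");
        System.out.println(canDeleteWhere(
            new Object[] { new EqualTo("dt", "2020-12-01") }, parts));
        System.out.println(canDeleteWhere(
            new Object[] { new EqualTo("id", 42) }, parts));
    }
}
```

A real implementation would pattern-match on Spark's `Filter` subclasses instead of the stand-in record, but the shape of the decision is the same.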
[GitHub] [spark] LuciferYang edited a comment on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
LuciferYang edited a comment on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737030101

4 test cases of the yarn module failed in the GitHub Actions run:

```
YarnClusterSuite.run Spark in yarn-client mode with different configurations, ensuring redaction
YarnClusterSuite.run Spark in yarn-cluster mode with different configurations, ensuring redaction
YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
YarnClusterSuite.run Spark in yarn-client mode with additional jar
```

but the local test succeeds:

```
mvn clean install -pl resource-managers/yarn -Pyarn

Run completed in 8 minutes, 56 seconds.
Total number of tests run: 137
Suites: completed 18, aborted 0
Tests: succeeded 137, failed 0, canceled 1, ignored 0, pending 0
All tests passed.
```

Let me check the kafka-sql tests that failed in Jenkins.
[GitHub] [spark] LuciferYang commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
LuciferYang commented on pull request #30547:
URL: https://github.com/apache/spark/pull/30547#issuecomment-737057477

All failed cases in Jenkins belong to the kafka10-sql module, but local tests succeed:

```
mvn clean install -pl external/kafka-0-10-sql

Run completed in 11 minutes, 42 seconds.
Total number of tests run: 260
Suites: completed 26, aborted 0
Tests: succeeded 260, failed 0, canceled 0, ignored 4, pending 0
```
[GitHub] [spark] SparkQA removed a comment on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model
SparkQA removed a comment on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-737042079

**[Test build #132028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).
[GitHub] [spark] SparkQA commented on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model
SparkQA commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-737055896

**[Test build #132028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
SparkQA removed a comment on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737041757

**[Test build #132024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).
[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-736996100

**[Test build #132021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132021/testReport)** for PR 29966 at commit [`9c22882`](https://github.com/apache/spark/commit/9c228823e0be56a87ebc498c254c627babc9db45).
[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-737054001

**[Test build #132021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132021/testReport)** for PR 29966 at commit [`9c22882`](https://github.com/apache/spark/commit/9c228823e0be56a87ebc498c254c627babc9db45).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
SparkQA commented on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737053937

**[Test build #132024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003

My comments remain the same. If we can address them (full support for v2 create table; don't provide the "only" option "create table if exist") in DataStreamWriter without making it complicated, I'm OK with it. (Though the complication looks worth splitting out.) Both must be addressed - I don't think case 2 is so rare that it can be ignored.
[GitHub] [spark] leanken commented on a change in pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
leanken commented on a change in pull request #30560:
URL: https://github.com/apache/spark/pull/30560#discussion_r533952098

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala

```scala
@@ -160,9 +159,19 @@ trait ExpressionEvalHelper extends ScalaCheckDrivenPropertyChecks with PlanTestB
      expectedErrMsg: String): Unit = {
    def checkException(eval: => Unit, testMode: String): Unit = {
+     val modes = if (testMode == "non-codegen mode") {
+       Seq(CodegenObjectFactoryMode.NO_CODEGEN)
+     } else {
+       Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)
```

Review comment: Should be OK, since setting the mode does not affect `evaluateWithoutCodegen`.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003

My comments remain the same. If we can address them (full support for v2 create table; don't provide the "only" option "create table if exist") in DataStreamWriter without making it complicated, I'm OK with it. (Though the complication looks worth splitting out.) Both must be addressed - I don't think case 2 is so rare a case that it can be ignored.
[GitHub] [spark] cloud-fan commented on a change in pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
cloud-fan commented on a change in pull request #30560:
URL: https://github.com/apache/spark/pull/30560#discussion_r533951436

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala

```scala
@@ -160,9 +159,19 @@ trait ExpressionEvalHelper extends ScalaCheckDrivenPropertyChecks with PlanTestB
      expectedErrMsg: String): Unit = {
    def checkException(eval: => Unit, testMode: String): Unit = {
+     val modes = if (testMode == "non-codegen mode") {
+       Seq(CodegenObjectFactoryMode.NO_CODEGEN)
+     } else {
+       Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)
```

Review comment: For simplicity, can we always test it with 2 modes?
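The pattern under discussion - running the same exception check once per factory mode so a codegen-only regression cannot slip through - can be sketched generically. This is a hedged illustration, not Spark's test helper: the `Mode` enum and `checkException` signature below are stand-ins for `CodegenObjectFactoryMode` and the real `ExpressionEvalHelper` method.

```java
// Hypothetical sketch: exercise an evaluation under every codegen factory
// mode and require the expected error in each, so failures report which
// mode broke.
import java.util.List;
import java.util.function.Function;

public class DualModeExceptionCheck {
    enum Mode { CODEGEN_ONLY, NO_CODEGEN }

    // `evalUnder` stands in for evaluating the expression with the given
    // mode set in the session config; it returns the exception it produced.
    static void checkException(Function<Mode, RuntimeException> evalUnder, String expectedMsg) {
        for (Mode m : List.of(Mode.CODEGEN_ONLY, Mode.NO_CODEGEN)) {
            RuntimeException e = evalUnder.apply(m);
            if (e == null || !e.getMessage().contains(expectedMsg)) {
                throw new AssertionError("mode " + m + ": expected error '" + expectedMsg + "'");
            }
        }
    }

    public static void main(String[] args) {
        // A fake evaluator that fails identically in both modes passes.
        checkException(m -> new RuntimeException("Key 5 does not exist"), "does not exist");
        System.out.println("both modes checked");
    }
}
```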
[GitHub] [spark] HeartSaVioR commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR commented on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737051003

My comments remain the same. If we can address them (full support for v2 create table; don't provide the "only" option "create table if exist") in DataStreamWriter without making it complicated, I'm OK with it. Both must be addressed - I don't think case 2 is so rare a case that it can be ignored.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
AmplabJenkins removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048806
[GitHub] [spark] AmplabJenkins commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
AmplabJenkins commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048806
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533948414

## File path: core/src/test/scala/com/../org/apache/spark/SparkContextSuite.scala

```scala
@@ -955,6 +978,121 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
      .set(EXECUTOR_ALLOW_SPARK_CONTEXT, true)).stop()
    }
  }

  test("SPARK-33084: Add jar support ivy url -- default transitive = false") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))

    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
    assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- invalid transitive use default false") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=foo")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
    assert(!sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- transitive=true will download dependency jars") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
    assert(sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
    assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- test exclude param when transitive=true") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0" +
      "?exclude=commons-lang:commons-lang&transitive=true")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
    assert(sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- test different version") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.6.0")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.6.0.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- test invalid param") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?invalidParam=foo")
    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
  }

  test("SPARK-33084: Add jar support ivy url -- test multiple transitive params") {
    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?" +
      "transitive=true&transitive=false&transitive=invalidValue")
```

Review comment:
> Could you add tests for `?transitive=true&transitive=invalidValue`, too? #29966 (comment)

Where?
[GitHub] [spark] SparkQA removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
SparkQA removed a comment on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-736960627

**[Test build #132014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132014/testReport)** for PR 30563 at commit [`fef2403`](https://github.com/apache/spark/commit/fef24030b31ebdff15fe3ee003da8c8bc0e6d564).
[GitHub] [spark] SparkQA commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
SparkQA commented on pull request #30563:
URL: https://github.com/apache/spark/pull/30563#issuecomment-737048008

**[Test build #132014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132014/testReport)** for PR 30563 at commit [`fef2403`](https://github.com/apache/spark/commit/fef24030b31ebdff15fe3ee003da8c8bc0e6d564).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
dongjoon-hyun commented on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737047712

Also, cc @viirya, @dbtsai, @sunchao, @srowen, @AngersZh, @mridulm, @tgravescs.
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243:
URL: https://github.com/apache/spark/pull/30243#issuecomment-737044137

**[Test build #132030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132030/testReport)** for PR 30243 at commit [`918e222`](https://github.com/apache/spark/commit/918e222eb45f49845b14f1689e9d606ff414b03a).
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
dongjoon-hyun edited a comment on pull request #30508:
URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195

Hi, @HyukjinKwon. Could you review this PR, please? I will reopen SPARK-33212 after merging this PR. This will recover `hadoop-aws` functionality in Apache Spark 3.1.
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r533942826

## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala

```
@@ -15,22 +15,158 @@
  * limitations under the License.
  */

-package org.apache.spark.deploy
+package org.apache.spark.util

 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}

 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}

 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}

-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+    packagesExclusions: String,
+    packages: String,
+    repositories: String,
+    ivyRepoPath: String,
+    ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+    val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq(
+      "spark.jars.excludes",
+      "spark.jars.packages",
+      "spark.jars.repositories",
+      "spark.jars.ivy",
+      "spark.jars.ivySettings"
+    ).map(sys.props.get(_).orNull)
+    IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath)
+  }
+
+  /**
+   * Parse URI query string's parameter values of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Tuple of the `transitive` and `exclude` parameter values.
+   *
+   * 1. transitive: whether to download the dependency jars of the ivy URI; the default value is
+   *    false and the value is case-sensitive. An invalid value is treated as false.
+   *    Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true
+   *    Output: true
+   *
+   * 2. exclude: comma-separated exclusions to apply when resolving transitive dependencies,
+   *    consisting of `group:module` pairs separated by commas.
+   *    Example: Input: exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *    Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+    val uriQuery = uri.getQuery
+    if (uriQuery == null) {
+      (false, "")
+    } else {
+      val mapTokens = uriQuery.split("&").map(_.split("="))
+      if (mapTokens.exists(token =>
+          token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+        throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery")
+      }
+      val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+      // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false
+      var transitive: Boolean = false
+      groupedParams.get("transitive").foreach { params =>
+        if (params.length > 1) {
+          logWarning("It's best to specify the `transitive` parameter in an ivy URL query only" +
+            " once. If there are multiple `transitive` parameters, we will select the last one")
+        }
+        params.map(_._2).foreach {
+          case "true" => transitive = true
+          case _ => transitive = false
+        }
+      }
+      // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+      // in an ivy URL. When downloading ivy URL jars, Spark won't download the transitive jars
+      // in the excluded list.
+      val exclusionList = groupedParams.get("exclude").map { params =>
+        params.map(_._2).flatMap { excludeString =>
+          val excludes = excludeString.split(",")
+          if (excludes.map(_.split(":")).exists(token =>
+              token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+            throw new URISyntaxException(uri.toString, "Invalid exclude string: " +
+              "expected 'org:module,org:module,..', found " + excludeString)
+          }
+          excludes
+        }.mkString(",")
+      }.getOrElse("")
+
+      val invalidParams = groupedParams
+        .filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+        .keys.toArray.sorted
+      if (invalidParams.nonEmpty) {
+        logWarning(
+          s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
+      }
+
+      groupedParams.foreach { case (key: String, values: Array[(String, String)]) =>
+        if (key != "transitive" || key != "exclude") {
+          logWarning("Invalid parameter")
+        }
+      }

+      (transitive, exclusionList)
+    }
+  }
+
+  /**
+   * Download Ivy URIs dependency jars.
+   *
```
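The query-parameter behavior described in the diff above (last `transitive` value wins; anything other than the case-sensitive `true` is treated as false; `exclude` is a comma-separated `org:module` list; unknown parameters are warned about and ignored) can be illustrated with a small, self-contained re-implementation. The class and record names below are mine, not Spark's, and error handling is simplified.

```java
// Illustrative sketch of parsing `transitive`/`exclude` from an ivy:// URI,
// mirroring the rules described in the reviewed code (not Spark's actual code).
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class IvyParamSketch {
    record Parsed(boolean transitive, List<String> exclusions) {}

    static Parsed parseQueryParams(URI uri) {
        String query = uri.getQuery();
        boolean transitive = false;          // default when absent or invalid
        List<String> exclusions = new ArrayList<>();
        if (query != null) {
            for (String token : query.split("&")) {
                String[] kv = token.split("=", 2);
                if (kv.length != 2) continue; // malformed pair: ignored here
                switch (kv[0]) {
                    // Last occurrence wins; only the exact string "true" enables it.
                    case "transitive" -> transitive = kv[1].equals("true");
                    case "exclude" -> {
                        for (String ex : kv[1].split(",")) exclusions.add(ex);
                    }
                    default -> { } // unknown parameter: warn and ignore in the real code
                }
            }
        }
        return new Parsed(transitive, exclusions);
    }

    public static void main(String[] args) {
        Parsed p = parseQueryParams(URI.create(
            "ivy://org.apache.hive:hive-storage-api:2.7.0"
            + "?exclude=commons-lang:commons-lang&transitive=true"));
        System.out.println(p.transitive() + " " + p.exclusions());
    }
}
```

Note that `java.net.URI` parses `ivy://org:module:version?query` fine: the `org:module:version` part becomes a registry-based authority, and `getQuery()` still returns the query string, which is why the reviewed code can lean on `URI` directly.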
[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737042534

**[Test build #132029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132029/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins removed a comment on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737042348
[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins commented on pull request #30560:
URL: https://github.com/apache/spark/pull/30560#issuecomment-737042348
[GitHub] [spark] viirya removed a comment on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
viirya removed a comment on pull request #30565: URL: https://github.com/apache/spark/pull/30565#issuecomment-737029373 The codegen change is ready for review. I need to make some benchmark code too.
[GitHub] [spark] SparkQA commented on pull request #30471: [SPARK-33520][ML] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model
SparkQA commented on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-737042079 **[Test build #132028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132028/testReport)** for PR 30471 at commit [`e4f8acb`](https://github.com/apache/spark/commit/e4f8acbdb82d762d9323bc0c00d2e1b3993f097d).
[GitHub] [spark] SparkQA removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
SparkQA removed a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737000928 **[Test build #132022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132022/testReport)** for PR 30560 at commit [`16940ce`](https://github.com/apache/spark/commit/16940ce89477f1c3839fed13e67b39792cb55fa6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins removed a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737041582
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-737041584
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
AmplabJenkins removed a comment on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583
[GitHub] [spark] SparkQA commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8
SparkQA commented on pull request #30517: URL: https://github.com/apache/spark/pull/30517#issuecomment-737041925 **[Test build #132027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132027/testReport)** for PR 30517 at commit [`26badc4`](https://github.com/apache/spark/commit/26badc4cc4ea6d68c8c5d50cf2c83e4904aacc0d).
[GitHub] [spark] SparkQA commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
SparkQA commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737041907 **[Test build #132022 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132022/testReport)** for PR 30560 at commit [`16940ce`](https://github.com/apache/spark/commit/16940ce89477f1c3839fed13e67b39792cb55fa6).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
SparkQA commented on pull request #30547: URL: https://github.com/apache/spark/pull/30547#issuecomment-737041808 **[Test build #132026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132026/testReport)** for PR 30547 at commit [`9088635`](https://github.com/apache/spark/commit/908863543372655091b8f6114b8ec5ec0290763e).
[GitHub] [spark] SparkQA commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
SparkQA commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737041791 **[Test build #132025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132025/testReport)** for PR 30560 at commit [`6570d70`](https://github.com/apache/spark/commit/6570d70fd5416e5d39bbcf9b68e552ac66ebc8ff).
[GitHub] [spark] SparkQA commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
SparkQA commented on pull request #30565: URL: https://github.com/apache/spark/pull/30565#issuecomment-737041757 **[Test build #132024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132024/testReport)** for PR 30565 at commit [`368236f`](https://github.com/apache/spark/commit/368236fd73a21dfdc52c2819e7db26427eea523d).
[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
AmplabJenkins commented on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583
[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-737041584
[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737041582
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30558: [SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer
dongjoon-hyun commented on a change in pull request #30558: URL: https://github.com/apache/spark/pull/30558#discussion_r533940558

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

```scala
@@ -185,6 +185,9 @@ abstract class Optimizer(catalogManager: CatalogManager)
       RemoveLiteralFromGroupExpressions,
       RemoveRepetitionFromGroupExpressions) :: Nil ++ operatorOptimizationBatch) :+
+    // This batch rewrites data source plans and should be run after the operator
+    // optimization batch and before any batches that depend on stats.
+    Batch("Data Source Rewrite Rules", Once, dataSourceRewriteRules: _*) :+
```

Review comment: Could you propose a name then, @gatorsmile?
[GitHub] [spark] dongjoon-hyun commented on pull request #30556: [WIP][SPARK-33212][BUILD] Provide hadoop-aws-shaded jar in hadoop-cloud module
dongjoon-hyun commented on pull request #30556: URL: https://github.com/apache/spark/pull/30556#issuecomment-737040013 Ya. I'll proceed with https://github.com/apache/spark/pull/30508 first since the Apache Spark 3.1 branch cut is this Friday. We can revisit this later during the QA period.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
dongjoon-hyun edited a comment on pull request #30508: URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195 Hi, @HyukjinKwon. Could you review this PR, please? I will reopen SPARK-33212 after merging this PR.
[GitHub] [spark] dongjoon-hyun commented on pull request #30508: [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
dongjoon-hyun commented on pull request #30508: URL: https://github.com/apache/spark/pull/30508#issuecomment-737038195 Hi, @HyukjinKwon. Could you review this PR?
[GitHub] [spark] cloud-fan commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
cloud-fan commented on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-737038157

`DataFrameWriterV2` is very powerful for describing table writing behavior (CREATE, CREATE IF NOT EXISTS, CREATE OR REPLACE, REPLACE, append, overwrite where, etc.), and I don't think the current streaming framework can support all of these at this stage. Ideally we need to handle these cases:

1. The table exists and users want to write to it.
2. The table does not exist and users want to fail.
3. The table does not exist and users want to create it.

The current PR can't cover case 2, but I don't know how common it is for streaming users. Adding a `DataStreamWriterV2` to cover case 2 looks like overkill to me. One possible solution is to add two methods, `insertTable` and `createAndInsertTable`. If we think case 2 is rare, adding only `toTable`, which works as `createAndInsertTable`, is also fine with me.
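The three cases above can be modeled with a toy in-memory catalog. This is purely illustrative: the method names `insertTable` and `createAndInsertTable` follow the proposal in the comment and are hypothetical, not actual Spark API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory "catalog" illustrating the three cases:
// 1) table exists -> write to it; 2) table missing -> fail;
// 3) table missing -> create, then write.
class ToyCatalog {
    private final Map<String, StringBuilder> tables = new HashMap<>();

    // Case 2 semantics: fail when the table does not exist.
    void insertTable(String name, String row) {
        StringBuilder t = tables.get(name);
        if (t == null) {
            throw new IllegalStateException("table not found: " + name);
        }
        t.append(row).append('\n');
    }

    // Cases 1 and 3 semantics: create the table on first use, then insert.
    void createAndInsertTable(String name, String row) {
        tables.computeIfAbsent(name, k -> new StringBuilder())
              .append(row).append('\n');
    }

    boolean exists(String name) {
        return tables.containsKey(name);
    }
}

public class ToyCatalogDemo {
    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.createAndInsertTable("events", "row1"); // creates and writes
        catalog.insertTable("events", "row2");          // table exists, ok
        System.out.println(catalog.exists("events"));
    }
}
```

A single `toTable` with create-if-missing semantics corresponds to exposing only `createAndInsertTable`, which silently covers cases 1 and 3 but never fails for case 2.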
[GitHub] [spark] cloud-fan commented on a change in pull request #30562: [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete
cloud-fan commented on a change in pull request #30562: URL: https://github.com/apache/spark/pull/30562#discussion_r533936837

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java

```java
@@ -28,6 +28,25 @@
  */
 @Evolving
 public interface SupportsDelete {
+
+  /**
+   * Checks whether it is possible to delete data from a data source table that matches filter
+   * expressions.
+   *
+   * Rows should be deleted from the data source iff all of the filter expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are ANDed together.
+   *
+   * Spark will call this method to check if the delete is possible without significant effort.
+   * Otherwise, Spark will try to rewrite the delete operation and produce row-level changes
+   * if the data source table supports deleting individual records.
+   *
+   * @param filters filter expressions, used to select rows to delete when all expressions match
+   * @return true if the delete operation can be performed
+   */
+  default boolean canDeleteWhere(Filter[] filters) {
+    return true;
+  }
+
   /**
    * Delete data from a data source table that matches filter expressions.
```

Review comment: Yea I have the same question.
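The intended interplay between `canDeleteWhere` and `deleteWhere` can be sketched with a toy connector. Everything below (`Filter`, `PartitionedTable`, the column names) is a simplified, hypothetical stand-in for Spark's connector classes, not the real API; it only shows why a source might return `false` for filters it cannot handle metadata-only.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical, minimal stand-in for Spark's pushed-down filter.
class Filter {
    final String attribute;
    Filter(String attribute) { this.attribute = attribute; }
}

// Simplified mirror of the interface in the diff above.
interface SupportsDelete {
    // Default mirrors the PR: returning true preserves the old behavior.
    default boolean canDeleteWhere(Filter[] filters) { return true; }
    void deleteWhere(Filter[] filters);
}

// A table that can only drop whole partitions, so it accepts a delete
// iff every ANDed filter references a partition column.
class PartitionedTable implements SupportsDelete {
    private final Set<String> partitionColumns = Set.of("date", "region");

    @Override
    public boolean canDeleteWhere(Filter[] filters) {
        return Arrays.stream(filters)
            .allMatch(f -> partitionColumns.contains(f.attribute));
    }

    @Override
    public void deleteWhere(Filter[] filters) {
        if (!canDeleteWhere(filters)) {
            throw new IllegalArgumentException("cannot delete with these filters");
        }
        // ... drop the matching partitions ...
    }
}

public class CanDeleteWhereDemo {
    public static void main(String[] args) {
        PartitionedTable t = new PartitionedTable();
        System.out.println(t.canDeleteWhere(new Filter[]{new Filter("date")}));   // true
        System.out.println(t.canDeleteWhere(new Filter[]{new Filter("userId")})); // false
    }
}
```

When `canDeleteWhere` returns `false`, the discussion above suggests Spark would fall back to rewriting the delete as row-level changes rather than failing outright.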
[GitHub] [spark] SparkQA removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-736995917 **[Test build #132020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132020/testReport)** for PR 30243 at commit [`563622f`](https://github.com/apache/spark/commit/563622f2f38d75e88c53611bf662fcc793afc860).
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-737036131 **[Test build #132020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132020/testReport)** for PR 30243 at commit [`563622f`](https://github.com/apache/spark/commit/563622f2f38d75e88c53611bf662fcc793afc860).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
uncleGen commented on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-737033196 retest this please.
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533931068

## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala

```scala
@@ -15,22 +15,158 @@
  * limitations under the License.
  */

-package org.apache.spark.deploy
+package org.apache.spark.util

 import java.io.File
-import java.net.URI
+import java.net.{URI, URISyntaxException}

 import org.apache.commons.lang3.StringUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}

 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
+import org.apache.spark.deploy.SparkSubmitUtils
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.{MutableURLClassLoader, Utils}

-private[deploy] object DependencyUtils extends Logging {
+case class IvyProperties(
+    packagesExclusions: String,
+    packages: String,
+    repositories: String,
+    ivyRepoPath: String,
+    ivySettingsPath: String)
+
+private[spark] object DependencyUtils extends Logging {
+
+  def getIvyProperties(): IvyProperties = {
+    val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq(
+      "spark.jars.excludes",
+      "spark.jars.packages",
+      "spark.jars.repositories",
+      "spark.jars.ivy",
+      "spark.jars.ivySettings"
+    ).map(sys.props.get(_).orNull)
+    IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath)
+  }
+
+  /**
+   * Parse the URI query string's parameter values of `transitive` and `exclude`.
+   * Other invalid parameters will be ignored.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Tuple of the `transitive` and `exclude` parameter values.
+   *
+   * 1. transitive: whether to download the dependency jars of the ivy URI; the default value
+   *    is false, and the value is case-sensitive. Invalid values are treated as false.
+   *    Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true
+   *    Output: true
+   *
+   * 2. exclude: comma-separated exclusions to apply when resolving transitive dependencies,
+   *    consisting of `group:module` pairs separated by commas.
+   *    Example: Input: exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   *    Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http]
+   */
+  private def parseQueryParams(uri: URI): (Boolean, String) = {
+    val uriQuery = uri.getQuery
+    if (uriQuery == null) {
+      (false, "")
+    } else {
+      val mapTokens = uriQuery.split("&").map(_.split("="))
+      if (mapTokens.exists(token =>
+          token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+        throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery")
+      }
+      val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
+      // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false
+      var transitive: Boolean = false
+      groupedParams.get("transitive").foreach { params =>
+        if (params.length > 1) {
+          logWarning("It's best to specify `transitive` parameter in ivy URL query only once." +
+            " If there are multiple `transitive` parameters, we will select the last one")
+        }
+        params.map(_._2).foreach {
+          case "true" => transitive = true
+          case _ => transitive = false
+        }
+      }
+      // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
+      // in an ivy URL. When downloading the ivy URL jar, Spark won't download the transitive jars
+      // in the excluded list.
+      val exclusionList = groupedParams.get("exclude").map { params =>
+        params.map(_._2).flatMap { excludeString =>
+          val excludes = excludeString.split(",")
+          if (excludes.map(_.split(":")).exists(token =>
+              token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+            throw new URISyntaxException(uri.toString, "Invalid exclude string: " +
+              "expected 'org:module,org:module,..', found " + excludeString)
+          }
+          excludes
+        }.mkString(",")
+      }.getOrElse("")
+
+      val invalidParams = groupedParams
+        .filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+        .keys.toArray.sorted
+      if (invalidParams.nonEmpty) {
+        logWarning(
+          s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
+      }
+
+      groupedParams.foreach { case (key: String, values: Array[(String, String)]) =>
+        if (key != "transitive" || key != "exclude") {
+          logWarning("Invalid parameter")
+        }
+      }
+
+      (transitive, exclusionList)
+    }
+  }
+
+  /**
+   * Download Ivy URIs dependency jars.
+   *
```
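The `transitive`/`exclude` handling in the reviewed Scala code can be sketched as a standalone routine. The following is an illustrative re-implementation, not the PR's code; it keeps the same observable semantics (the last `transitive` value wins, and multiple `exclude` values are joined by commas), while omitting the validation and warnings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative parser for the "transitive" and "exclude" parameters of an
// Ivy URI query string, e.g. "exclude=org.mortbay.jetty:jetty&transitive=true".
class IvyQueryParams {
    final boolean transitive;
    final String exclusionList;

    IvyQueryParams(String query) {
        if (query == null || query.isEmpty()) {
            transitive = false;
            exclusionList = "";
            return;
        }
        // Group parameter values by key, preserving order within each key.
        Map<String, List<String>> grouped = Arrays.stream(query.split("&"))
            .map(kv -> kv.split("=", 2))
            .collect(Collectors.groupingBy(kv -> kv[0],
                     Collectors.mapping(kv -> kv.length > 1 ? kv[1] : "",
                                        Collectors.toList())));
        // Last occurrence of "transitive" wins; the value is case-sensitive
        // and anything other than "true" is treated as false.
        List<String> t = grouped.getOrDefault("transitive", List.of());
        transitive = !t.isEmpty() && "true".equals(t.get(t.size() - 1));
        // All "exclude" values are concatenated into one comma-separated list.
        exclusionList = String.join(",", grouped.getOrDefault("exclude", List.of()));
    }
}

public class IvyQueryParamsDemo {
    public static void main(String[] args) {
        IvyQueryParams p =
            new IvyQueryParams("exclude=org.mortbay.jetty:jetty&transitive=true");
        System.out.println(p.transitive);     // true
        System.out.println(p.exclusionList);  // org.mortbay.jetty:jetty
    }
}
```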
[GitHub] [spark] LuciferYang commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
LuciferYang commented on pull request #30547: URL: https://github.com/apache/spark/pull/30547#issuecomment-737030101

4 test cases of the yarn module failed in the GitHub Action:

```
YarnClusterSuite.run Spark in yarn-client mode with different configurations, ensuring redaction
YarnClusterSuite.run Spark in yarn-cluster mode with different configurations, ensuring redaction
YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
YarnClusterSuite.run Spark in yarn-client mode with additional jar
```

but the local tests succeed:

```
Run completed in 8 minutes, 56 seconds.
Total number of tests run: 137
Suites: completed 18, aborted 0
Tests: succeeded 137, failed 0, canceled 1, ignored 0, pending 0
All tests passed.
```

Let me check the kafka-sql test failures in Jenkins.
[GitHub] [spark] sunchao commented on pull request #30556: [WIP][SPARK-33212][BUILD] Provide hadoop-aws-shaded jar in hadoop-cloud module
sunchao commented on pull request #30556: URL: https://github.com/apache/spark/pull/30556#issuecomment-737029894 Thanks @dongjoon-hyun. This is bad news, and it means we'd have to abandon the approach in this PR; the only solution seems to be on the Hadoop side. I've opened a [Hadoop PR](https://github.com/apache/hadoop/pull/2510) and tested it successfully with the code snippet you pasted above. @steveloughran could you take a look there? Thanks.
[GitHub] [spark] viirya commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
viirya commented on pull request #30565: URL: https://github.com/apache/spark/pull/30565#issuecomment-737029373 The codegen change is ready for review. I need to make some benchmark code too.
[GitHub] [spark] viirya opened a new pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
viirya opened a new pull request #30565: URL: https://github.com/apache/spark/pull/30565

### What changes were proposed in this pull request?
This patch proposes to enable whole-stage subexpression elimination for Filter.

### Why are the changes needed?
We made subexpression elimination available for whole-stage codegen in ProjectExec. Another operator that frequently runs into subexpressions is Filter. We should make whole-stage codegen subexpression elimination available in FilterExec too.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533928892 ## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala ## @@ -15,22 +15,158 @@ * limitations under the License. */ -package org.apache.spark.deploy +package org.apache.spark.util import java.io.File -import java.net.URI +import java.net.{URI, URISyntaxException} import org.apache.commons.lang3.StringUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} import org.apache.spark.{SecurityManager, SparkConf, SparkException} +import org.apache.spark.deploy.SparkSubmitUtils import org.apache.spark.internal.Logging -import org.apache.spark.util.{MutableURLClassLoader, Utils} -private[deploy] object DependencyUtils extends Logging { +case class IvyProperties( +packagesExclusions: String, +packages: String, +repositories: String, +ivyRepoPath: String, +ivySettingsPath: String) + +private[spark] object DependencyUtils extends Logging { + + def getIvyProperties(): IvyProperties = { +val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq( + "spark.jars.excludes", + "spark.jars.packages", + "spark.jars.repositories", + "spark.jars.ivy", + "spark.jars.ivySettings" +).map(sys.props.get(_).orNull) +IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) + } + + /** + * Parse URI query string's parameter value of `transitive` and `exclude`. + * Other invalid parameters will be ignored. + * + * @param uri Ivy uri need to be downloaded. + * @return Tuple value of parameter `transitive` and `exclude` value. + * + * 1. transitive: whether to download dependency jar of ivy URI, default value is false + *and this parameter value is case-sensitive. Invalid value will be treat as false. + *Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true + *Output: true + * + * 2. 
exclude: comma separated exclusions to apply when resolving transitive dependencies, + *consists of `group:module` pairs separated by commas. + *Example: Input: excludeorg.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http + *Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http] + */ + private def parseQueryParams(uri: URI): (Boolean, String) = { +val uriQuery = uri.getQuery +if (uriQuery == null) { + (false, "") +} else { + val mapTokens = uriQuery.split("&").map(_.split("=")) + if (mapTokens.exists(token => +token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1 { +throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery") + } + val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1) + // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false + var transitive: Boolean = false + groupedParams.get("transitive").foreach { params => +if (params.length > 1) { + logWarning("It's best to specify `transitive` parameter in ivy URL query only once." + +" If there are multiple `transitive` parameter, we will select the last one") +} +params.map(_._2).foreach { + case "true" => transitive = true + case _ => transitive = false +} + } + // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http) + // in an ivy URL. When download ivy URL jar, Spark won't download transitive jar + // in a excluded list. 
+    val exclusionList = groupedParams.get("exclude").map { params =>
+      params.map(_._2).flatMap { excludeString =>
+        val excludes = excludeString.split(",")
+        if (excludes.map(_.split(":")).exists(token =>
+            token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+          throw new URISyntaxException(uri.toString, "Invalid exclude string: " +
+            "expected 'org:module,org:module,..', found " + excludeString)
+        }
+        excludes
+      }.mkString(",")
+    }.getOrElse("")
+
+    val invalidParams = groupedParams
+      .filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+      .keys.toArray.sorted
+    if (invalidParams.nonEmpty) {
+      logWarning(
+        s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
+    }

Review comment: nit format:
```
val validParams = Set("transitive", "exclude")
val invalidParams = groupedParams.keys.filterNot(validParams.contains).toSeq.sorted
if (invalidParams.nonEmpty) {
  logWarning(s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
}
```
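For readers following the thread, the parsing semantics under review can be sketched outside Spark. This is a hypothetical Python re-implementation of `parseQueryParams` (names and error strings are illustrative; the canonical code is the Scala above):

```python
def parse_query_params(query):
    """Parse `transitive` and `exclude` from an ivy URI query string.

    Mirrors the semantics being reviewed: `transitive` defaults to False
    and the last occurrence wins; `exclude` values must be a
    comma-separated list of `group:module` pairs; other keys are ignored
    (the real code logs a warning for them).
    """
    if not query:
        return False, ""
    pairs = [token.split("=") for token in query.split("&")]
    if any(len(p) != 2 or not p[0].strip() or not p[1].strip() for p in pairs):
        raise ValueError(f"Invalid query string: {query}")
    transitive = False
    exclusions = []
    for key, value in pairs:
        if key == "transitive":
            # Case-sensitive: anything other than "true" is treated as False.
            transitive = value == "true"
        elif key == "exclude":
            for item in value.split(","):
                parts = item.split(":")
                if len(parts) != 2 or any(not p.strip() for p in parts):
                    raise ValueError(f"Invalid exclude string: {value}")
                exclusions.append(item)
    return transitive, ",".join(exclusions)
```

For example, `parse_query_params("transitive=true&exclude=org.mortbay.jetty:jetty")` yields `(True, "org.mortbay.jetty:jetty")`, while `transitive=false&transitive=true` yields `True` because the last occurrence wins.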
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533928805 ## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala ## @@ -15,22 +15,158 @@ * limitations under the License. */ -package org.apache.spark.deploy +package org.apache.spark.util import java.io.File -import java.net.URI +import java.net.{URI, URISyntaxException} import org.apache.commons.lang3.StringUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} import org.apache.spark.{SecurityManager, SparkConf, SparkException} +import org.apache.spark.deploy.SparkSubmitUtils import org.apache.spark.internal.Logging -import org.apache.spark.util.{MutableURLClassLoader, Utils} -private[deploy] object DependencyUtils extends Logging { +case class IvyProperties( +packagesExclusions: String, +packages: String, +repositories: String, +ivyRepoPath: String, +ivySettingsPath: String) + +private[spark] object DependencyUtils extends Logging { + + def getIvyProperties(): IvyProperties = { +val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq( + "spark.jars.excludes", + "spark.jars.packages", + "spark.jars.repositories", + "spark.jars.ivy", + "spark.jars.ivySettings" +).map(sys.props.get(_).orNull) +IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) + } + + /** + * Parse URI query string's parameter value of `transitive` and `exclude`. + * Other invalid parameters will be ignored. + * + * @param uri Ivy uri need to be downloaded. + * @return Tuple value of parameter `transitive` and `exclude` value. + * + * 1. transitive: whether to download dependency jar of ivy URI, default value is false + *and this parameter value is case-sensitive. Invalid value will be treat as false. + *Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true + *Output: true + * + * 2. 
exclude: comma separated exclusions to apply when resolving transitive dependencies, + *consists of `group:module` pairs separated by commas. + *Example: Input: excludeorg.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http + *Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http] + */ + private def parseQueryParams(uri: URI): (Boolean, String) = { +val uriQuery = uri.getQuery +if (uriQuery == null) { + (false, "") +} else { + val mapTokens = uriQuery.split("&").map(_.split("=")) + if (mapTokens.exists(token => +token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1 { +throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery") + } + val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1) + // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false + var transitive: Boolean = false + groupedParams.get("transitive").foreach { params => +if (params.length > 1) { + logWarning("It's best to specify `transitive` parameter in ivy URL query only once." + +" If there are multiple `transitive` parameter, we will select the last one") +} +params.map(_._2).foreach { + case "true" => transitive = true + case _ => transitive = false +} + } + // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http) + // in an ivy URL. When download ivy URL jar, Spark won't download transitive jar + // in a excluded list. 
+    val exclusionList = groupedParams.get("exclude").map { params =>
+      params.map(_._2).flatMap { excludeString =>
+        val excludes = excludeString.split(",")
+        if (excludes.map(_.split(":")).exists(token =>
+            token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1)))) {
+          throw new URISyntaxException(uri.toString, "Invalid exclude string: " +
+            "expected 'org:module,org:module,..', found " + excludeString)
+        }
+        excludes
+      }.mkString(",")
+    }.getOrElse("")
+
+    val invalidParams = groupedParams
+      .filter(entry => !Seq("transitive", "exclude").contains(entry._1))
+      .keys.toArray.sorted
+    if (invalidParams.nonEmpty) {
+      logWarning(
+        s"Invalid parameters `${invalidParams.mkString(",")}` found in URI query `$uriQuery`.")
+    }
+
+    groupedParams.foreach { case (key: String, values: Array[(String, String)]) =>
+      if (key != "transitive" || key != "exclude") {
+        logWarning("Invalid parameter")
+      }
+    }

Review comment: What's this?
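The block maropu is questioning also has a boolean logic bug: `key != "transitive" || key != "exclude"` is true for every key, because any key differs from at least one of the two strings, so the warning would fire even for valid parameters. A small Python demonstration (hypothetical functions mirroring the Scala condition):

```python
def is_invalid_buggy(key):
    # Mirrors `key != "transitive" || key != "exclude"`:
    # always True, since no key can equal both strings at once.
    return key != "transitive" or key != "exclude"

def is_invalid_fixed(key):
    # The intended check: the key is neither of the two valid parameters.
    return key not in ("transitive", "exclude")

assert is_invalid_buggy("transitive") is True   # buggy: flags a valid key
assert is_invalid_fixed("transitive") is False
assert is_invalid_fixed("foo") is True
```

The earlier `invalidParams` block in the same diff already performs this check correctly, which is presumably why the reviewer asks what the extra `foreach` is for.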
[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
SparkQA removed a comment on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-736978616 **[Test build #132018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
SparkQA commented on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-737025797 **[Test build #132018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
HyukjinKwon commented on pull request #30547: URL: https://github.com/apache/spark/pull/30547#issuecomment-737025493
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins removed a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737024489
[GitHub] [spark] HyukjinKwon commented on pull request #30547: [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT
HyukjinKwon commented on pull request #30547: URL: https://github.com/apache/spark/pull/30547#issuecomment-737025282 Hm, the test failures look consistent?
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533927237 ## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala ## @@ -15,22 +15,158 @@ * limitations under the License. */ -package org.apache.spark.deploy +package org.apache.spark.util import java.io.File -import java.net.URI +import java.net.{URI, URISyntaxException} import org.apache.commons.lang3.StringUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} import org.apache.spark.{SecurityManager, SparkConf, SparkException} +import org.apache.spark.deploy.SparkSubmitUtils import org.apache.spark.internal.Logging -import org.apache.spark.util.{MutableURLClassLoader, Utils} -private[deploy] object DependencyUtils extends Logging { +case class IvyProperties( +packagesExclusions: String, +packages: String, +repositories: String, +ivyRepoPath: String, +ivySettingsPath: String) + +private[spark] object DependencyUtils extends Logging { + + def getIvyProperties(): IvyProperties = { +val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq( + "spark.jars.excludes", + "spark.jars.packages", + "spark.jars.repositories", + "spark.jars.ivy", + "spark.jars.ivySettings" +).map(sys.props.get(_).orNull) +IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) + } + + /** + * Parse URI query string's parameter value of `transitive` and `exclude`. + * Other invalid parameters will be ignored. + * + * @param uri Ivy uri need to be downloaded. + * @return Tuple value of parameter `transitive` and `exclude` value. + * + * 1. transitive: whether to download dependency jar of ivy URI, default value is false + *and this parameter value is case-sensitive. Invalid value will be treat as false. + *Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true + *Output: true + * + * 2. 
exclude: comma separated exclusions to apply when resolving transitive dependencies, + *consists of `group:module` pairs separated by commas. + *Example: Input: excludeorg.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http + *Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http] + */ + private def parseQueryParams(uri: URI): (Boolean, String) = { +val uriQuery = uri.getQuery +if (uriQuery == null) { + (false, "") +} else { + val mapTokens = uriQuery.split("&").map(_.split("=")) + if (mapTokens.exists(token => +token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1 { +throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery") + } + val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1) + // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false + var transitive: Boolean = false + groupedParams.get("transitive").foreach { params => +if (params.length > 1) { + logWarning("It's best to specify `transitive` parameter in ivy URL query only once." + +" If there are multiple `transitive` parameter, we will select the last one") +} +params.map(_._2).foreach { + case "true" => transitive = true + case _ => transitive = false +} + } + // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http) + // in an ivy URL. When download ivy URL jar, Spark won't download transitive jar + // in a excluded list. + val exclusionList = groupedParams.get("exclude").map { params => +params.map(_._2).flatMap { excludeString => + val excludes = excludeString.split(",") + if (excludes.map(_.split(":")).exists(token => +token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1 { Review comment: How about making a helper function for this check? https://github.com/apache/spark/pull/29966/files#diff-3e9f71e7d80c1dc7d02b0edef611de280f219789f0d2b282887f07e999020024R74-R75 This is an automated message from the Apache Git Service. 
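maropu's suggestion above — factoring the repeated `group:module` validation into one helper — might look like this sketch (Python, with a hypothetical name; the real helper would be Scala using `StringUtils.isBlank`):

```python
def is_invalid_module_spec(spec):
    """Return True when `spec` is not a well-formed `group:module` pair.

    Blank components (empty or whitespace-only) are invalid, matching
    the `StringUtils.isBlank` checks duplicated in the diff.
    """
    parts = spec.split(":")
    return len(parts) != 2 or any(not p.strip() for p in parts)

assert is_invalid_module_spec("org.mortbay.jetty:jetty") is False
assert is_invalid_module_spec("jetty") is True        # missing group
assert is_invalid_module_spec("org.mortbay.jetty: ") is True  # blank module
```

A single helper like this would replace the two copies of the `token.length != 2 || StringUtils.isBlank(...)` condition in the diff.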
[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737024489
[GitHub] [spark] HeartSaVioR closed pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
HeartSaVioR closed pull request #30563: URL: https://github.com/apache/spark/pull/30563
[GitHub] [spark] HeartSaVioR commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
HeartSaVioR commented on pull request #30563: URL: https://github.com/apache/spark/pull/30563#issuecomment-737022204 Thanks! Merging to master.
[GitHub] [spark] AmplabJenkins commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
AmplabJenkins commented on pull request #30563: URL: https://github.com/apache/spark/pull/30563#issuecomment-737021958
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
AmplabJenkins removed a comment on pull request #30563: URL: https://github.com/apache/spark/pull/30563#issuecomment-737021958
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533924963 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging { throw new IllegalArgumentException( s"Directory ${path} is not allowed for addJar") } - path + Seq(path) } catch { case NonFatal(e) => logError(s"Failed to add $path to Spark environment", e) -null +Nil } } else { -path +Seq(path) } } if (path == null || path.isEmpty) { logWarning("null or empty path specified as parameter to addJar") } else { - val key = if (path.contains("\\") && Utils.isWindows) { + var schema = "" + val keys = if (path.contains("\\") && Utils.isWindows) { // For local paths with backslashes on Windows, URI throws an exception addLocalJarFile(new File(path)) } else { val uri = new Path(path).toUri // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies Utils.validateURL(uri) -uri.getScheme match { +schema = uri.getScheme +schema match { // A JAR file which exists only on the driver node case null => // SPARK-22585 path without schema is not url encoded addLocalJarFile(new File(uri.getPath)) // A JAR file which exists only on the driver node case "file" => addLocalJarFile(new File(uri.getPath)) // A JAR file which exists locally on every worker node - case "local" => "file:" + uri.getPath + case "local" => Seq("file:" + uri.getPath) + case "ivy" => +// Since `new Path(path).toUri` will lose query information, +// so here we use `URI.create(path)` +DependencyUtils.resolveMavenDependencies(URI.create(path)) case _ => checkRemoteJarFile(path) } } - if (key != null) { + if (keys.nonEmpty) { val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis -if (addedJars.putIfAbsent(key, timestamp).isEmpty) { - logInfo(s"Added JAR $path at $key with timestamp $timestamp") +val (added, existed) = keys.partition(addedJars.putIfAbsent(_, 
timestamp).isEmpty) +if (added.nonEmpty) { + if (schema != "ivy") { +logInfo(s"Added JAR $path at ${added.mkString(",")} with timestamp $timestamp") + } else { +logInfo(s"Added dependency jars of ivy uri $path at ${added.mkString(",")}" + + s" with timestamp $timestamp") + } postEnvironmentUpdate() -} else { - logWarning(s"The jar $path has been added already. Overwriting of added jars " + -"is not supported in the current version.") +} +if (existed.nonEmpty) { + if (schema != "ivy") { +logWarning(s"The jar $path has been added already. Overwriting of added jars " + + "is not supported in the current version.") + } else { +logWarning(s"The dependency jars of ivy URI with $path at" + + s" ${existed.mkString(",")} has been added already." + + s" Overwriting of added jars is not supported in the current version.") Review comment: nit: remove `s`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins removed a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-737021222
[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-737021222
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533924173 ## File path: core/src/main/scala/org/apache/spark/util/DependencyUtils.scala ## @@ -15,22 +15,158 @@ * limitations under the License. */ -package org.apache.spark.deploy +package org.apache.spark.util import java.io.File -import java.net.URI +import java.net.{URI, URISyntaxException} import org.apache.commons.lang3.StringUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} import org.apache.spark.{SecurityManager, SparkConf, SparkException} +import org.apache.spark.deploy.SparkSubmitUtils import org.apache.spark.internal.Logging -import org.apache.spark.util.{MutableURLClassLoader, Utils} -private[deploy] object DependencyUtils extends Logging { +case class IvyProperties( +packagesExclusions: String, +packages: String, +repositories: String, +ivyRepoPath: String, +ivySettingsPath: String) + +private[spark] object DependencyUtils extends Logging { + + def getIvyProperties(): IvyProperties = { +val Seq(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) = Seq( + "spark.jars.excludes", + "spark.jars.packages", + "spark.jars.repositories", + "spark.jars.ivy", + "spark.jars.ivySettings" +).map(sys.props.get(_).orNull) +IvyProperties(packagesExclusions, packages, repositories, ivyRepoPath, ivySettingsPath) + } + + /** + * Parse URI query string's parameter value of `transitive` and `exclude`. + * Other invalid parameters will be ignored. + * + * @param uri Ivy uri need to be downloaded. + * @return Tuple value of parameter `transitive` and `exclude` value. + * + * 1. transitive: whether to download dependency jar of ivy URI, default value is false + *and this parameter value is case-sensitive. Invalid value will be treat as false. + *Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true + *Output: true + * + * 2. 
exclude: comma separated exclusions to apply when resolving transitive dependencies, + *consists of `group:module` pairs separated by commas. + *Example: Input: excludeorg.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http + *Output: [org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http] + */ + private def parseQueryParams(uri: URI): (Boolean, String) = { +val uriQuery = uri.getQuery +if (uriQuery == null) { + (false, "") +} else { + val mapTokens = uriQuery.split("&").map(_.split("=")) + if (mapTokens.exists(token => +token.length != 2 || StringUtils.isBlank(token(0)) || StringUtils.isBlank(token(1 { +throw new URISyntaxException(uri.toString, s"Invalid query string: $uriQuery") + } + val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1) + // Parse transitive parameters (e.g., transitive=true) in an ivy URL, default value is false + var transitive: Boolean = false Review comment: nit: we don't need the type: `var transitive = false` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533923652 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging { throw new IllegalArgumentException( s"Directory ${path} is not allowed for addJar") } - path + Seq(path) } catch { case NonFatal(e) => logError(s"Failed to add $path to Spark environment", e) -null +Nil } } else { -path +Seq(path) } } if (path == null || path.isEmpty) { logWarning("null or empty path specified as parameter to addJar") } else { - val key = if (path.contains("\\") && Utils.isWindows) { + var schema = "" + val keys = if (path.contains("\\") && Utils.isWindows) { // For local paths with backslashes on Windows, URI throws an exception addLocalJarFile(new File(path)) } else { val uri = new Path(path).toUri // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies Utils.validateURL(uri) -uri.getScheme match { +schema = uri.getScheme +schema match { // A JAR file which exists only on the driver node case null => // SPARK-22585 path without schema is not url encoded addLocalJarFile(new File(uri.getPath)) // A JAR file which exists only on the driver node case "file" => addLocalJarFile(new File(uri.getPath)) // A JAR file which exists locally on every worker node - case "local" => "file:" + uri.getPath + case "local" => Seq("file:" + uri.getPath) + case "ivy" => +// Since `new Path(path).toUri` will lose query information, +// so here we use `URI.create(path)` +DependencyUtils.resolveMavenDependencies(URI.create(path)) case _ => checkRemoteJarFile(path) } } - if (key != null) { + if (keys.nonEmpty) { val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis -if (addedJars.putIfAbsent(key, timestamp).isEmpty) { - logInfo(s"Added JAR $path at $key with timestamp $timestamp") +val (added, existed) = keys.partition(addedJars.putIfAbsent(_, 
timestamp).isEmpty) +if (added.nonEmpty) { + if (schema != "ivy") { +logInfo(s"Added JAR $path at ${added.mkString(",")} with timestamp $timestamp") + } else { +logInfo(s"Added dependency jars of ivy uri $path at ${added.mkString(",")}" + + s" with timestamp $timestamp") Review comment: nit: ``` val jarMessage = if (schema != "ivy") "JAR" else "dependency jars of ivy uri" logInfo(s"Added $jarMessage $path at ${added.mkString(",")} with timestamp $timestamp") ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-737020328
[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-737020328
[GitHub] [spark] SparkQA removed a comment on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
SparkQA removed a comment on pull request #30563: URL: https://github.com/apache/spark/pull/30563#issuecomment-736941564 **[Test build #132011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132011/testReport)** for PR 30563 at commit [`27b43f4`](https://github.com/apache/spark/commit/27b43f4e51d5223ab0ac2b52a969eb71e39fcf94).
[GitHub] [spark] SparkQA commented on pull request #30563: [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
SparkQA commented on pull request #30563: URL: https://github.com/apache/spark/pull/30563#issuecomment-737019934 **[Test build #132011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132011/testReport)** for PR 30563 at commit [`27b43f4`](https://github.com/apache/spark/commit/27b43f4e51d5223ab0ac2b52a969eb71e39fcf94).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30564: [SPARK-32670][SQL][FOLLOWUP] Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #30564: URL: https://github.com/apache/spark/pull/30564#issuecomment-737019310 **[Test build #132023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132023/testReport)** for PR 30564 at commit [`086bba3`](https://github.com/apache/spark/commit/086bba330ffeba1856e90962872b87a710326631).
[GitHub] [spark] leanken edited a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
leanken edited a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737018589

> ```
> org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions  16 ms  1
> org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution  49 ms  1
> org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution  0.64 sec  1
> org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract  0.25 sec  1
> org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
> ```
>
> It seems there are five errors? If so, IMO it's okay to fix all of them at once. Anyway, nice catch!

There should be zero errors after the final patch 16940ce. Please ignore the former test result ^_^
[GitHub] [spark] beliefer opened a new pull request #30564: [SPARK-32670][SQL][FOLLOWUP] Group exception messages in Catalyst Analyzer in one file
beliefer opened a new pull request #30564: URL: https://github.com/apache/spark/pull/30564 ### What changes were proposed in this pull request? This PR follows up https://github.com/apache/spark/pull/29497. Because https://github.com/apache/spark/pull/29497 only gave an example of grouping the `AnalysisException`s thrown by the Analyzer into `QueryCompilationErrors`, this PR groups the remaining `AnalysisException`s into `QueryCompilationErrors`. ### Why are the changes needed? It will largely help with standardization of error messages and their maintenance. ### Does this PR introduce _any_ user-facing change? No. Error messages remain unchanged. ### How was this patch tested? No new tests - pass all original tests to make sure it doesn't break any existing behavior.
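The grouping pattern this PR extends can be sketched as follows. This is an illustrative sketch only: `QueryCompilationErrorsSketch` and `unresolvedColumnError` are hypothetical names, not Spark's actual API; the point is that callers throw a named factory method instead of constructing exception messages inline.

```scala
// Centralizing user-facing error construction in one object keeps messages
// consistent and easy to audit; analysis code throws the factory's result.
object QueryCompilationErrorsSketch {
  // Hypothetical factory method; Spark's real QueryCompilationErrors differs.
  def unresolvedColumnError(name: String): IllegalArgumentException =
    new IllegalArgumentException(s"Column '$name' does not exist")

  def main(args: Array[String]): Unit = {
    // An analyzer rule would do: throw QueryCompilationErrorsSketch.unresolvedColumnError("id")
    val e = unresolvedColumnError("id")
    assert(e.getMessage == "Column 'id' does not exist")
    println("ok")
  }
}
```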
[GitHub] [spark] leanken commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
leanken commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737018589

> ```
> org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions  16 ms  1
> org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution  49 ms  1
> org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution  0.64 sec  1
> org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract  0.25 sec  1
> org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
> ```
>
> It seems there are five errors? If so, IMO it's okay to fix all of them at once. Anyway, nice catch!

There should be zero errors after the final patch. Please ignore the former test result ^_^
[GitHub] [spark] maropu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
maropu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r533920784

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala

```diff
@@ -1890,47 +1890,66 @@ class SparkContext(config: SparkConf) extends Logging {
         throw new IllegalArgumentException(
           s"Directory ${path} is not allowed for addJar")
       }
-      path
+      Seq(path)
     } catch {
       case NonFatal(e) =>
         logError(s"Failed to add $path to Spark environment", e)
-        null
+        Nil
     }
   } else {
-    path
+    Seq(path)
   }
 }

 if (path == null || path.isEmpty) {
   logWarning("null or empty path specified as parameter to addJar")
 } else {
-  val key = if (path.contains("\\") && Utils.isWindows) {
+  var schema = ""
```

Review comment: `val schema = uri.getScheme`? Please avoid using `var` where possible.
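The `var`-free shape the reviewer is asking for can be sketched as below. This is a simplified sketch, not the actual SparkContext code: only the scheme extraction is shown, with `schemeOf` as a hypothetical helper.

```scala
import java.net.URI

object SchemeSketch {
  // Compute the scheme once as an immutable val instead of threading a
  // mutable var through the branches.
  def schemeOf(path: String): String = {
    val uri = new URI(path)
    Option(uri.getScheme).getOrElse("") // bare local paths have a null scheme
  }

  def main(args: Array[String]): Unit = {
    assert(schemeOf("local:/opt/spark/a.jar") == "local")
    assert(schemeOf("/tmp/a.jar") == "")
    println("ok")
  }
}
```

Deriving the value once at binding time avoids the "assign inside one branch, read later" pattern that the `var schema = ""` in the diff introduces.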
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins removed a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-737016158
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins removed a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737016161
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8
AmplabJenkins removed a comment on pull request #30517: URL: https://github.com/apache/spark/pull/30517#issuecomment-737016159
[GitHub] [spark] AmplabJenkins commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8
AmplabJenkins commented on pull request #30517: URL: https://github.com/apache/spark/pull/30517#issuecomment-737016159
[GitHub] [spark] AmplabJenkins commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
AmplabJenkins commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737016161
[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-737016158
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28036: [SPARK-26341][CORE]Expose executor memory metrics at the stage level, in the Stages tab
AngersZh commented on a change in pull request #28036: URL: https://github.com/apache/spark/pull/28036#discussion_r533919517

## File path: core/src/main/resources/org/apache/spark/ui/static/stagepage.js

```diff
@@ -288,14 +298,18 @@ $(document).ready(function () {
 "" +
 "" +
 " Select All" +
-" Scheduler Delay" +
-" Task Deserialization Time" +
-" Shuffle Read Blocked Time" +
-" Shuffle Remote Reads" +
-" Shuffle Write Time" +
-" Result Serialization Time" +
-" Getting Result Time" +
-" Peak Execution Memory" +
+" Scheduler Delay" +
+" Task Deserialization Time" +
+" Shuffle Read Blocked Time" +
+" Shuffle Remote Reads" +
+" Shuffle Write Time" +
+" Result Serialization Time" +
+" Getting Result Time" +
+" Peak Execution Memory" +
+" Executor JVMOnHeapMemory / JVMOffHeapMemory" +
```

> @ron8hu @AngersZh If you see the data-column and the id attributes, the values in those attributes correspond to the column placement in the tables present in the stages page. Currently, the values for the four new columns overlap with the other columns, thus creating errors while loading the page. Try fixing the values and testing the same. If it is still not working, feel free to ping me and I can look into it with you more.

They are two different table selectors, so the current `data-column` values do not conflict.
[GitHub] [spark] maropu commented on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
maropu commented on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-737014904

```
org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite.Consistent error handling for datetime formatting and parsing functions  16 ms  1
org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23594 GetExternalRowField should support interpreted execution  49 ms  1
org.apache.spark.sql.catalyst.expressions.ObjectExpressionsSuite.SPARK-23595 ValidateExternalType should support interpreted execution  0.64 sec  1
org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtract  0.25 sec  1
org.apache.spark.sql.catalyst.expressions.RegexpExpressionsSuite.RegexExtractAll
```

It seems there are five errors? If so, IMO it's okay to fix all of them at once. Anyway, nice catch!
[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA removed a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-736966702 **[Test build #132017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132017/testReport)** for PR 29966 at commit [`bdc5035`](https://github.com/apache/spark/commit/bdc50356077c9c9db1cbfab249083053dbe1a7dd).
[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-737014366 **[Test build #132017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132017/testReport)** for PR 29966 at commit [`bdc5035`](https://github.com/apache/spark/commit/bdc50356077c9c9db1cbfab249083053dbe1a7dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class IvyProperties(`
[GitHub] [spark] maropu commented on a change in pull request #30559: [SPARK-33617][SQL] spark.sql.files.minPartitionNum effective for LocalTableScan
maropu commented on a change in pull request #30559: URL: https://github.com/apache/spark/pull/30559#discussion_r533917581

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala

```diff
@@ -49,7 +49,9 @@ case class LocalTableScanExec(
     if (rows.isEmpty) {
       sqlContext.sparkContext.emptyRDD
     } else {
-      val numSlices = math.min(unsafeRows.length, sqlContext.sparkContext.defaultParallelism)
+      val numSlices = math.min(
+        unsafeRows.length,
+        conf.filesMinPartitionNum.getOrElse(sqlContext.sparkContext.defaultParallelism))
```

Review comment: hm, how about adding a new config to control each partition size? I'm not sure why `numSlices` depends on the length of `unsafeRows`.
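The slicing rule under discussion can be sketched in isolation. A minimal sketch, under the assumption that `minPartitionNum` stands in for the proposed `spark.sql.files.minPartitionNum` fallback and `defaultParallelism` for the cluster default:

```scala
object NumSlicesSketch {
  // Cap the number of slices at the row count (so no slice is empty),
  // preferring the configured minimum partition number when it is set.
  def numSlices(rowCount: Int, minPartitionNum: Option[Int], defaultParallelism: Int): Int =
    math.min(rowCount, minPartitionNum.getOrElse(defaultParallelism))

  def main(args: Array[String]): Unit = {
    assert(numSlices(1000, Some(4), defaultParallelism = 8) == 4)  // config wins when set
    assert(numSlices(2, None, defaultParallelism = 8) == 2)        // capped by row count
    println("ok")
  }
}
```

The `math.min(unsafeRows.length, ...)` term the reviewer questions is exactly the row-count cap above: parallelizing a 2-row local relation into 8 slices would only create empty partitions.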
[GitHub] [spark] SparkQA removed a comment on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8
SparkQA removed a comment on pull request #30517: URL: https://github.com/apache/spark/pull/30517#issuecomment-736928681 **[Test build #132010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132010/testReport)** for PR 30517 at commit [`c617757`](https://github.com/apache/spark/commit/c617757ec165163b85e284304340ee0c37ae12e5).
[GitHub] [spark] SparkQA commented on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8
SparkQA commented on pull request #30517: URL: https://github.com/apache/spark/pull/30517#issuecomment-737007775 **[Test build #132010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132010/testReport)** for PR 30517 at commit [`c617757`](https://github.com/apache/spark/commit/c617757ec165163b85e284304340ee0c37ae12e5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30560: [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
SparkQA removed a comment on pull request #30560: URL: https://github.com/apache/spark/pull/30560#issuecomment-736960640 **[Test build #132015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132015/testReport)** for PR 30560 at commit [`6d4fb1d`](https://github.com/apache/spark/commit/6d4fb1d287965b1123d204f9a98b31dd249c1617).
[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog
cloud-fan commented on a change in pull request #30473: URL: https://github.com/apache/spark/pull/30473#discussion_r533911399

## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCNamespaceTest.scala

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.jdbc.v2

import scala.collection.JavaConverters._

import org.apache.log4j.Level

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.connector.catalog.NamespaceChange
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.tags.DockerTest

@DockerTest
private[v2] trait V2JDBCNamespaceTest extends SharedSparkSession {
  val catalog = new JDBCTableCatalog()

  test("listNamespaces: basic behavior") {
    catalog.createNamespace(Array("foo"), Map("comment" -> "test comment").asJava)
    assert(catalog.listNamespaces() ===
      Array(Array("foo"), Array("information_schema"), Array("pg_catalog"), Array("public")))
```

Review comment: this seems pgsql specific.
[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog
cloud-fan commented on a change in pull request #30473: URL: https://github.com/apache/spark/pull/30473#discussion_r533910906

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala

```scala
@@ -171,6 +175,130 @@ class JDBCTableCatalog extends TableCatalog with Logging {

  override def namespaceExists(namespace: Array[String]): Boolean = namespace match {
    case Array(db) =>
      withConnection { conn =>
        val rs = conn.getMetaData.getSchemas(null, db)
        while (rs.next()) {
          if (rs.getString(1) == db) return true;
        }
        false
      }
    case _ => false
  }

  override def listNamespaces(): Array[Array[String]] = {
    withConnection { conn =>
      val schemaBuilder = ArrayBuilder.make[Array[String]]
      val rs = conn.getMetaData.getSchemas()
      while (rs.next()) {
        schemaBuilder += Array(rs.getString(1))
      }
      schemaBuilder.result
    }
  }

  override def listNamespaces(namespace: Array[String]): Array[Array[String]] = {
    namespace match {
      case Array() =>
        listNamespaces()
      case Array(db) if namespaceExists(namespace) =>
        Array()
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] = {
    namespace match {
      case Array(db) =>
        if (!namespaceExists(namespace)) throw new NoSuchNamespaceException(db)
        mutable.HashMap[String, String]().asJava
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def createNamespace(
      namespace: Array[String],
      metadata: util.Map[String, String]): Unit = namespace match {
    case Array(db) if !namespaceExists(namespace) =>
      var comment = ""
      if (!metadata.isEmpty) {
        metadata.asScala.map {
          case (k, v) => k match {
            case SupportsNamespaces.PROP_COMMENT => comment = v
            case SupportsNamespaces.PROP_OWNER => // ignore
            case SupportsNamespaces.PROP_LOCATION =>
              throw new AnalysisException("CREATE NAMESPACE ... LOCATION ... is not supported in" +
                " JDBC catalog.")
            case _ => // ignore all the other properties for now
          }
        }
      }
      withConnection { conn =>
        classifyException(s"Failed create name space: $db") {
          JdbcUtils.createNamespace(conn, options, db, comment)
        }
      }
    case Array(_) =>
      throw new NamespaceAlreadyExistsException(namespace)
    case _ =>
      throw new IllegalArgumentException(s"Invalid namespace name: ${namespace.quoted}")
  }

  override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit = {
    namespace match {
      case Array(db) =>
        changes.foreach {
          case set: NamespaceChange.SetProperty =>
            // ignore changes other than comments
            if (set.property() == SupportsNamespaces.PROP_COMMENT) {
              withConnection { conn =>
                JdbcUtils.createNamespaceComment(conn, options, db, set.value)
              }
            }
          case unset: NamespaceChange.RemoveProperty =>
            // ignore changes other than comments
            if (unset.property() == SupportsNamespaces.PROP_COMMENT) {
              withConnection { conn =>
                JdbcUtils.removeNamespaceComment(conn, options, db)
              }
            }
          case _ =>
            throw new SQLFeatureNotSupportedException(s"Unsupported NamespaceChange $changes")
        }
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def dropNamespace(namespace: Array[String]): Boolean = namespace match {
    case Array(db) if namespaceExists(namespace) =>
      if (listTables(Array(db)).nonEmpty) {
        throw new IllegalStateException(s"Namespace ${namespace.quoted} is not empty")
      }
      withConnection { conn =>
        classifyException(s"Failed drop name space: $db") {
          JdbcUtils.dropNamespace(conn, options, db)
          true
        }
      }
    case Array(_) =>
      // exists returned false
```

Review comment: and shall we fail?
[GitHub] [spark] cloud-fan commented on a change in pull request #30473: [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog
cloud-fan commented on a change in pull request #30473: URL: https://github.com/apache/spark/pull/30473#discussion_r533910609

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala

```scala
@@ -171,6 +175,130 @@ class JDBCTableCatalog extends TableCatalog with Logging {

  override def namespaceExists(namespace: Array[String]): Boolean = namespace match {
    case Array(db) =>
      withConnection { conn =>
        val rs = conn.getMetaData.getSchemas(null, db)
        while (rs.next()) {
          if (rs.getString(1) == db) return true;
        }
        false
      }
    case _ => false
  }

  override def listNamespaces(): Array[Array[String]] = {
    withConnection { conn =>
      val schemaBuilder = ArrayBuilder.make[Array[String]]
      val rs = conn.getMetaData.getSchemas()
      while (rs.next()) {
        schemaBuilder += Array(rs.getString(1))
      }
      schemaBuilder.result
    }
  }

  override def listNamespaces(namespace: Array[String]): Array[Array[String]] = {
    namespace match {
      case Array() =>
        listNamespaces()
      case Array(db) if namespaceExists(namespace) =>
        Array()
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] = {
    namespace match {
      case Array(db) =>
        if (!namespaceExists(namespace)) throw new NoSuchNamespaceException(db)
        mutable.HashMap[String, String]().asJava
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def createNamespace(
      namespace: Array[String],
      metadata: util.Map[String, String]): Unit = namespace match {
    case Array(db) if !namespaceExists(namespace) =>
      var comment = ""
      if (!metadata.isEmpty) {
        metadata.asScala.map {
          case (k, v) => k match {
            case SupportsNamespaces.PROP_COMMENT => comment = v
            case SupportsNamespaces.PROP_OWNER => // ignore
            case SupportsNamespaces.PROP_LOCATION =>
              throw new AnalysisException("CREATE NAMESPACE ... LOCATION ... is not supported in" +
                " JDBC catalog.")
            case _ => // ignore all the other properties for now
          }
        }
      }
      withConnection { conn =>
        classifyException(s"Failed create name space: $db") {
          JdbcUtils.createNamespace(conn, options, db, comment)
        }
      }
    case Array(_) =>
      throw new NamespaceAlreadyExistsException(namespace)
    case _ =>
      throw new IllegalArgumentException(s"Invalid namespace name: ${namespace.quoted}")
  }

  override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit = {
    namespace match {
      case Array(db) =>
        changes.foreach {
          case set: NamespaceChange.SetProperty =>
            // ignore changes other than comments
            if (set.property() == SupportsNamespaces.PROP_COMMENT) {
              withConnection { conn =>
                JdbcUtils.createNamespaceComment(conn, options, db, set.value)
              }
            }
          case unset: NamespaceChange.RemoveProperty =>
            // ignore changes other than comments
            if (unset.property() == SupportsNamespaces.PROP_COMMENT) {
              withConnection { conn =>
                JdbcUtils.removeNamespaceComment(conn, options, db)
              }
            }
          case _ =>
            throw new SQLFeatureNotSupportedException(s"Unsupported NamespaceChange $changes")
        }
      case _ =>
        throw new NoSuchNamespaceException(namespace)
    }
  }

  override def dropNamespace(namespace: Array[String]): Boolean = namespace match {
    case Array(db) if namespaceExists(namespace) =>
      if (listTables(Array(db)).nonEmpty) {
        throw new IllegalStateException(s"Namespace ${namespace.quoted} is not empty")
      }
      withConnection { conn =>
        classifyException(s"Failed drop name space: $db") {
          JdbcUtils.dropNamespace(conn, options, db)
          true
        }
      }
    case Array(_) =>
      // exists returned false
```

Review comment: not exists