[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page
AmplabJenkins commented on pull request #30427: URL: https://github.com/apache/spark/pull/30427#issuecomment-733037213 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AmplabJenkins removed a comment on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733035708
[GitHub] [spark] SparkQA removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
SparkQA removed a comment on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733031728 **[Test build #131672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).
[GitHub] [spark] AmplabJenkins commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AmplabJenkins commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733035708
[GitHub] [spark] SparkQA commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
SparkQA commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733035666 **[Test build #131672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-733035516
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins removed a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-733035516
[GitHub] [spark] cloud-fan commented on a change in pull request #30412: [SPARK-33480][SQL] Support char/varchar type
cloud-fan commented on a change in pull request #30412: URL: https://github.com/apache/spark/pull/30412#discussion_r529618309

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ##
@@ -94,6 +94,10 @@ trait CheckAnalysis extends PredicateHelper {
       case p if p.analyzed => // Skip already analyzed sub-plans

+      case leaf: LeafNode if leaf.output.map(_.dataType).exists(CharVarcharUtils.hasCharVarchar) =>

Review comment:
@maropu I changed it back as it's pretty risky to get output from arbitrary logical plans. An example of the error:
```
Caused by: sbt.ForkMain$ForkError: java.lang.AssertionError: assertion failed: Scalar subquery should have only one column
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:229)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:181)
    at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:61)
```
[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-733034804 **[Test build #131674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131674/testReport)** for PR 30412 at commit [`38999b5`](https://github.com/apache/spark/commit/38999b535e78817d2647d186605618438f438220).
[GitHub] [spark] SparkQA removed a comment on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model
SparkQA removed a comment on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-733031641 **[Test build #131671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model
AmplabJenkins removed a comment on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-733032271
[GitHub] [spark] SparkQA commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model
SparkQA commented on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-733032253 **[Test build #131671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model
AmplabJenkins commented on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-733032271
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-733032201 **[Test build #131673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131673/testReport)** for PR 29893 at commit [`c74d4d7`](https://github.com/apache/spark/commit/c74d4d7a0e0768fa5d9970a9e7ddd109306d6caa).
[GitHub] [spark] SparkQA commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
SparkQA commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733031728 **[Test build #131672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).
[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
SparkQA commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733031627 **[Test build #131670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131670/testReport)** for PR 30486 at commit [`0559c95`](https://github.com/apache/spark/commit/0559c955efe3e62ac7590b3cccefea73fd5e0fc8).
[GitHub] [spark] SparkQA commented on pull request #30487: [SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script
SparkQA commented on pull request #30487: URL: https://github.com/apache/spark/pull/30487#issuecomment-733031613 **[Test build #131669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131669/testReport)** for PR 30487 at commit [`5c90411`](https://github.com/apache/spark/commit/5c9041103fff89089ca136e97d8181c359e76b7a).
[GitHub] [spark] SparkQA commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model
SparkQA commented on pull request #30471: URL: https://github.com/apache/spark/pull/30471#issuecomment-733031641 **[Test build #131671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).
[GitHub] [spark] SparkQA commented on pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
SparkQA commented on pull request #30488: URL: https://github.com/apache/spark/pull/30488#issuecomment-733031612 **[Test build #131668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131668/testReport)** for PR 30488 at commit [`a617f94`](https://github.com/apache/spark/commit/a617f9430064f897a85e6373702b5c45bb7250b6).
[GitHub] [spark] Ngone51 commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal
Ngone51 commented on a change in pull request #30433: URL: https://github.com/apache/spark/pull/30433#discussion_r529611998

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
       try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
         idxStartPos = indexFile.getFilePointer();
         logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}", appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset, chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.
+        writeChunkTracker(mapIndex);

Review comment:
Tracking IO exceptions sounds a little bit complex to me... I'd prefer to stop merging in case of an IO exception during the seek. Besides, I'm thinking it would be good if we could dynamically change the merger location when such an IO exception happens.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
AmplabJenkins removed a comment on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733029357
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins removed a comment on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-733029359
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
AmplabJenkins removed a comment on pull request #30442: URL: https://github.com/apache/spark/pull/30442#issuecomment-733029355
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments
AmplabJenkins removed a comment on pull request #29122: URL: https://github.com/apache/spark/pull/29122#issuecomment-733029356
[GitHub] [spark] AmplabJenkins commented on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
AmplabJenkins commented on pull request #30442: URL: https://github.com/apache/spark/pull/30442#issuecomment-733029355
[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-733029359
[GitHub] [spark] AmplabJenkins commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
AmplabJenkins commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733029357
[GitHub] [spark] AmplabJenkins commented on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments
AmplabJenkins commented on pull request #29122: URL: https://github.com/apache/spark/pull/29122#issuecomment-733029356
[GitHub] [spark] Ngone51 commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal
Ngone51 commented on a change in pull request #30433: URL: https://github.com/apache/spark/pull/30433#discussion_r529604208

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
       try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
         idxStartPos = indexFile.getFilePointer();
         logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}", appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset, chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.

Review comment:
Yeah..I mean the current implementation seems not correct. It still updates the `lastChunkOffset` when exception happens, no?
https://github.com/apache/spark/blob/8113c88542ee282b510c7e046d64df1761a85d14/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java#L846
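
The ordering concern raised in this thread can be illustrated with a minimal Java sketch. It is not the code from `RemoteBlockPushResolver`: the class `ChunkIndexSketch`, the `MetaWriter` interface, and the accessor `lastChunkOffset()` are hypothetical names introduced here, and the rollback simply seeks back to the saved position, as the diff comment describes. The point it tries to show is that the in-memory offset is only advanced after both the index write and the meta write have succeeded.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical stand-in for the chunk-tracking state; not Spark's actual class. */
final class ChunkIndexSketch {
  /** Hypothetical hook standing in for the meta-file write (writeChunkTracker in the diff). */
  interface MetaWriter {
    void write(int mapIndex) throws IOException;
  }

  private final RandomAccessFile indexFile;
  private final MetaWriter metaWriter;
  private long lastChunkOffset = 0L;

  ChunkIndexSketch(RandomAccessFile indexFile, MetaWriter metaWriter) {
    this.indexFile = indexFile;
    this.metaWriter = metaWriter;
  }

  long lastChunkOffset() {
    return lastChunkOffset;
  }

  void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
    // Remember where the index file stood before this update.
    long idxStartPos = indexFile.getFilePointer();
    try {
      // Index file first, meta file second, as in the proposed change.
      indexFile.writeLong(chunkOffset);
      metaWriter.write(mapIndex);
    } catch (IOException e) {
      // Undo the index write by resetting the position, then propagate the failure.
      // Note that this seek can itself throw, which is the complexity the reviewer mentions.
      indexFile.seek(idxStartPos);
      throw e;
    }
    // Only reached when both writes succeeded, so a failed update never moves the offset.
    lastChunkOffset = chunkOffset;
  }
}
```

With this shape, a failing meta write leaves both the index file position and `lastChunkOffset` where they were before the call, which is the property the review comment asks about.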
[GitHub] [spark] SparkQA removed a comment on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
SparkQA removed a comment on pull request #30442: URL: https://github.com/apache/spark/pull/30442#issuecomment-732919060 **[Test build #131656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131656/testReport)** for PR 30442 at commit [`9b76d6a`](https://github.com/apache/spark/commit/9b76d6ac046f3bd8ea67b7628ed5b48659b73f46).
[GitHub] [spark] SparkQA commented on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
SparkQA commented on pull request #30442: URL: https://github.com/apache/spark/pull/30442#issuecomment-733021920 **[Test build #131656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131656/testReport)** for PR 30442 at commit [`9b76d6a`](https://github.com/apache/spark/commit/9b76d6ac046f3bd8ea67b7628ed5b48659b73f46).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
cloud-fan commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-733019331 @rdblue there are several mistakes in my previous PR that cause test failures. I'm fixing them in https://github.com/rdblue/spark/pull/9
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AngersZhuuuu commented on a change in pull request #30421: URL: https://github.com/apache/spark/pull/30421#discussion_r529597136

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -503,13 +503,32 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     }
   }

+  private def convertTypeConstructedLiteralToString(literal: Literal): String = literal match {
+    case Literal(data: Int, dataType: DateType) =>
+      UTF8String.fromString(
+        DateFormatter(getZoneId(SQLConf.get.sessionLocalTimeZone))
+          .format(data)).toString
+    case Literal(data: Long, dataType: TimestampType) =>
+      UTF8String.fromString(
+        TimestampFormatter.getFractionFormatter(getZoneId(SQLConf.get.sessionLocalTimeZone))
+          .format(data)).toString
+    case Literal(data: CalendarInterval, dataType: CalendarIntervalType) =>
+      UTF8String.fromString(data.toString).toString
+    case Literal(data: Array[Byte], dataType: BinaryType) =>
+      UTF8String.fromBytes(data).toString
+    case Literal(data, dataType) =>
+      UTF8String.fromString(data.toString).toString

Review comment:
> We need this entry? What's an example to match this case?

Removed this line.
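
For readers unfamiliar with the internal values being matched in the diff above, here is a rough Java sketch of the date and timestamp cases. It assumes Spark's internal representation of a date as days since the epoch and a timestamp as microseconds since the epoch. `PartitionLiteralSketch` and its methods are hypothetical names, the sketch uses plain `java.time` rather than Spark's `DateFormatter` and `TimestampFormatter`, and it drops fractional seconds for brevity, so it is an approximation of the idea rather than the PR's behaviour.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; illustrates the conversion idea only. */
final class PartitionLiteralSketch {
  private static final DateTimeFormatter TS_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  /** Renders a date stored as days since 1970-01-01 in ISO form (yyyy-MM-dd). */
  static String dateToString(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch).toString();
  }

  /** Renders a timestamp stored as microseconds since the epoch in the given zone. */
  static String timestampToString(long microsSinceEpoch, ZoneId zone) {
    // floorDiv/floorMod keep the split correct for timestamps before the epoch.
    long seconds = Math.floorDiv(microsSinceEpoch, 1_000_000L);
    long nanos = Math.floorMod(microsSinceEpoch, 1_000_000L) * 1_000L;
    Instant instant = Instant.ofEpochSecond(seconds, nanos);
    return LocalDateTime.ofInstant(instant, zone).format(TS_FORMAT);
  }
}
```

The zone parameter plays the role that the session-local time zone plays in the diff: the same stored timestamp renders as a different partition string under a different zone.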
[GitHub] [spark] AngersZhuuuu commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value
AngersZhuuuu commented on pull request #30421: URL: https://github.com/apache/spark/pull/30421#issuecomment-733017916

> Could you add some descriptions about this syntax and add examples in the `INSERT` document?: https://spark.apache.org/docs/3.0.1/sql-ref-syntax-dml-insert-into.html#parameters

Updated
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529590417

## File path: core/src/main/scala/org/apache/spark/util/Utils.scala
## @@ -535,13 +538,23 @@ private[spark] object Utils extends Logging {
       doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf)
     }
-    // Decompress the file if it's a .tar or .tar.gz
-    if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
-      logInfo("Untarring " + fileName)
-      executeAndGetOutput(Seq("tar", "-xzf", fileName), targetDir)
-    } else if (fileName.endsWith(".tar")) {
-      logInfo("Untarring " + fileName)
-      executeAndGetOutput(Seq("tar", "-xf", fileName), targetDir)
+    if (shouldUntar) {
+      // Decompress the file if it's a .tar or .tar.gz
+      if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+        logWarning(
+          "Untarring behavior is deprecated at spark.files and " +
+            "SparkContext.addFile. Use spark.archives or SparkContext.addArchive " +
+            "instead.")
+        logInfo("Untarring " + fileName)
+        executeAndGetOutput(Seq("tar", "-xzf", fileName), targetDir)

Review comment:
Our `spark.files` and `SparkContext.addFile` have a sort of undocumented and hidden behaviour: only on the executor side, they untar files ending in `.tar.gz` or `.tgz`. I think it makes sense to deprecate this behaviour and encourage users to use explicit archive handling. Also, I believe it's good practice to avoid relying on external programs anyway.
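To illustrate the last point (unpacking without shelling out to `tar`), here is a minimal Python sketch using the standard `tarfile` module. It is an analogy only and not how the PR implements unpacking in Scala; the helper names `unpack_if_tar` and `demo` are hypothetical.

```python
import os
import tarfile
import tempfile


def unpack_if_tar(file_path, target_dir):
    """Unpack .tar, .tar.gz and .tgz files in-process; return True if unpacked."""
    if not file_path.endswith((".tar", ".tar.gz", ".tgz")):
        return False
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(file_path, "r:*") as archive:
        # The "data" extraction filter exists only on newer Python releases.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(target_dir, **extra)
    return True


def demo():
    """Build a tiny .tar.gz in a temp dir, unpack it, and report the result."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "hello.txt")
        with open(src, "w") as f:
            f.write("hi")
        packed = os.path.join(work, "env.tar.gz")
        with tarfile.open(packed, "w:gz") as archive:
            archive.add(src, arcname="hello.txt")
        out = os.path.join(work, "out")
        os.mkdir(out)
        unpacked = unpack_if_tar(packed, out)
        return unpacked, os.path.exists(os.path.join(out, "hello.txt"))
```

Doing the extraction in-process removes the dependency on a `tar` binary being present on every executor host.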
[GitHub] [spark] Ngone51 commented on pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
Ngone51 commented on pull request #30488: URL: https://github.com/apache/spark/pull/30488#issuecomment-733012047 cc @cloud-fan @xuanyuanking Could you take a look? Thanks!
[GitHub] [spark] Ngone51 opened a new pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
Ngone51 opened a new pull request #30488:
URL: https://github.com/apache/spark/pull/30488

### What changes were proposed in this pull request?

Currently, `join()` uses `withPlan(logicalPlan)` for convenience when calling some Dataset functions. But this makes the `dataset_id` inconsistent between the `logicalPlan` and the original `Dataset` (because `withPlan(logicalPlan)` creates a new Dataset with a new id and resets the `dataset_id` of the `logicalPlan` to that new id). As a result, it breaks the rule `DetectAmbiguousSelfJoin`.

In this PR, we propose to stop using `withPlan` and use the `logicalPlan` directly, so that its `dataset_id` doesn't change.

### Why are the changes needed?

The query below returns a wrong result, whereas it should throw an ambiguous self-join exception instead:

```scala
val emp1 = Seq[TestData](
  TestData(1, "sales"),
  TestData(2, "personnel"),
  TestData(3, "develop"),
  TestData(4, "IT")).toDS()
val emp2 = Seq[TestData](
  TestData(1, "sales"),
  TestData(2, "personnel"),
  TestData(3, "develop")).toDS()
val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
emp1.join(emp3, emp1.col("key") === emp3.col("key"), "left_outer")
  .select(emp1.col("*"), emp3.col("key").as("e2")).show()

// wrong result
+---+---------+---+
|key|    value| e2|
+---+---------+---+
|  1|    sales|  1|
|  2|personnel|  2|
|  3|  develop|  3|
|  4|       IT|  4|
+---+---------+---+
```

This PR fixes the wrong behaviour.

### Does this PR introduce _any_ user-facing change?

Yes, after this PR users hit the exception instead of getting the wrong result.

### How was this patch tested?

Added a new unit test.
[GitHub] [spark] SparkQA removed a comment on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
SparkQA removed a comment on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733004565 **[Test build #131667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)** for PR 30486 at commit [`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).
[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
SparkQA commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733011314

**[Test build #131667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)** for PR 30486 at commit [`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `logInfo(s\"Adding $url to class loader\")`
[GitHub] [spark] LuciferYang commented on pull request #30487: [SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script
LuciferYang commented on pull request #30487: URL: https://github.com/apache/spark/pull/30487#issuecomment-733011172 Wait for Jenkins test to verify the results
[GitHub] [spark] LuciferYang commented on pull request #30487: [WIP][SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script
LuciferYang commented on pull request #30487: URL: https://github.com/apache/spark/pull/30487#issuecomment-733010164 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131634/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/ https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/ https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131619/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/ ![image](https://user-images.githubusercontent.com/1475305/100107069-3a25c480-2ea4-11eb-9a2f-4e3bb4bf7f46.png)
[GitHub] [spark] HyukjinKwon commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733009501 cc @zero323 and @fhoering too FYI. This is related to the docs and shipping 3rd party Python packages in PySpark apps.
[GitHub] [spark] SparkQA removed a comment on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments
SparkQA removed a comment on pull request #29122: URL: https://github.com/apache/spark/pull/29122#issuecomment-732936032 **[Test build #131658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131658/testReport)** for PR 29122 at commit [`0e372ca`](https://github.com/apache/spark/commit/0e372caa1d3ea33ebdde98de3b4a1afbb4a5fc38).
[GitHub] [spark] SparkQA commented on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments
SparkQA commented on pull request #29122:
URL: https://github.com/apache/spark/pull/29122#issuecomment-733008784

**[Test build #131658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131658/testReport)** for PR 29122 at commit [`0e372ca`](https://github.com/apache/spark/commit/0e372caa1d3ea33ebdde98de3b4a1afbb4a5fc38).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733008523 @tgravescs, @mridulm, @Ngone51, can you take a look when you guys find some time?
[GitHub] [spark] SparkQA removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA removed a comment on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732993082 **[Test build #131665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)** for PR 30412 at commit [`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).
[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733007577

**[Test build #131665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)** for PR 30412 at commit [`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] LuciferYang opened a new pull request #30487: [WIP][SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script
LuciferYang opened a new pull request #30487:
URL: https://github.com/apache/spark/pull/30487

### What changes were proposed in this pull request?

Jenkins test jobs for many PRs appear to have failing tests. The failed cases include:

- org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type
- org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V2 get binary type
- org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V3 get binary type
- org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V4 get binary type
- org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V5 get binary type

The error message is as follows:

```
Error Message
org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
Stacktrace
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
	at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.$anonfun$new$26(SparkThriftServerProtocolVersionsSuite.scala:302)
```

But these tests pass in GitHub Actions, so the failure may be related to the `LANG` of the Jenkins build machine. This PR adds `export LANG="en_US.UTF-8"` to the `run-tests-jenkins` script.

### Why are the changes needed?

To ensure that `LANG` in the Jenkins test process is `en_US.UTF-8`, so that the `HIVE_CLI_SERVICE_PROTOCOL_VX` related tests pass.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Jenkins tests pass
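One plausible reading of the `"?"` versus `"�"` mismatch, which the PR does not confirm, is that invalid bytes decode to the Unicode replacement character, and that character then degrades to `?` when text passes through a charset that cannot represent it. The Python sketch below only demonstrates that general mechanism; the function names and the sample byte `0xff` are assumptions for illustration and are not taken from the test suite.

```python
REPLACEMENT = "\ufffd"  # the Unicode replacement character


def decode_binary(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


def render(text, charset):
    """Round-trip text through a charset, as a locale-bound process might."""
    return text.encode(charset, errors="replace").decode(charset)


decoded = decode_binary(b"\xff(")
utf8_view = render(decoded, "utf-8")   # replacement character survives
ascii_view = render(decoded, "ascii")  # replacement character becomes "?"
```

Under this assumption, forcing a UTF-8 locale keeps the replacement character intact on both sides of the comparison.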
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529583595

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1568,21 +1612,44 @@ class SparkContext(config: SparkConf) extends Logging {
     val key = if (!isLocal && scheme == "file") {
       env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+    } else if (uri.getScheme == null) {
+      schemeCorrectedURI.toString
+    } else if (isArchive) {
+      uri.toString

Review comment:
For the same reason (keeping the fragment), it uses the URI when it's an archive.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529583152

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1550,7 +1594,7 @@ class SparkContext(config: SparkConf) extends Logging {
     val hadoopPath = new Path(schemeCorrectedURI)
     val scheme = schemeCorrectedURI.getScheme
-    if (!Array("http", "https", "ftp").contains(scheme)) {
+    if (!Array("http", "https", "ftp").contains(scheme) && !isArchive) {

Review comment:
An archive is not supposed to be a directory.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529582202

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1537,8 +1575,14 @@ class SparkContext(config: SparkConf) extends Logging {
     addFile(path, recursive, false)
   }

-  private def addFile(path: String, recursive: Boolean, addedOnSubmit: Boolean): Unit = {
-    val uri = new Path(path).toUri
+  private def addFile(
+      path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: Boolean = false
+    ): Unit = {
+    val uri = if (!isArchive) {
+      new Path(path).toUri
+    } else {
+      Utils.resolveURI(path)

Review comment:
Here we cannot rely on `new Path(path).toUri`. It makes the fragment (`#`) in the URI part of the path. `Utils.resolveURI` is used for `spark.yarn.dist.archives` as well.
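The difference between treating `#` as part of the path and parsing it as a URI fragment can be sketched in Python with `urllib.parse`. This is an analogy only, not Spark's or Hadoop's code; `split_archive_spec` is a hypothetical helper, and falling back to the file name when no fragment is given is an assumption about the intended behaviour.

```python
import posixpath
from urllib.parse import urlsplit


def split_archive_spec(spec):
    """Split 'archive#name' into (archive location, unpack directory name)."""
    parts = urlsplit(spec)
    # Location without the fragment, e.g. "pyspark_env.tar.gz".
    archive_path = parts._replace(fragment="").geturl()
    # Assumed fallback: use the archive's file name when no fragment is given.
    unpack_name = parts.fragment or posixpath.basename(parts.path)
    return archive_path, unpack_name


spec = "pyspark_env.tar.gz#environment"
# Path-only handling keeps "#environment" glued to the file name.
naive_name = posixpath.basename(spec)
# URI-aware handling separates the archive from its unpack directory.
archive_path, unpack_name = split_archive_spec(spec)
```

With the example from the PR description, `pyspark_env.tar.gz#environment`, URI-aware parsing yields the archive `pyspark_env.tar.gz` and the directory name `environment`, which is what `PYSPARK_PYTHON=./environment/bin/python` relies on.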
[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
SparkQA commented on pull request #30486: URL: https://github.com/apache/spark/pull/30486#issuecomment-733004565 **[Test build #131667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)** for PR 30486 at commit [`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).
[GitHub] [spark] HyukjinKwon opened a new pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively
HyukjinKwon opened a new pull request #30486:
URL: https://github.com/apache/spark/pull/30486

### What changes were proposed in this pull request?

TL;DR:
- This PR completes the support of archives in Spark itself instead of Yarn-only
- After this PR, PySpark users can use Conda to ship Python packages together as below:

```bash
conda create -y -n pyspark_env -c conda-forge pyarrow==2.0.0 pandas==1.1.4 conda-pack==0.5.0
conda activate pyspark_env
conda pack -f -o pyspark_env.tar.gz
PYSPARK_DRIVER_PYTHON=python PYSPARK_PYTHON=./environment/bin/python pyspark --archives pyspark_env.tar.gz#environment
```

This PR proposes to add Spark's native `--archives` option to Spark submit, and a `spark.archives` configuration. Currently, both are supported only in Yarn mode:

```bash
./bin/spark-submit --help
```

```
Options:
...
 Spark on YARN only:
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
```

This `archives` feature is often useful when you have to ship a directory and unpack it on executors. One example is shipping native libraries used via e.g. JNI. Another is shipping Python packages together as a Conda environment.

Especially for Conda, PySpark currently does not have a nice, generally working way to ship a package; please see also https://hyukjin-spark.readthedocs.io/en/stable/user_guide/python_packaging.html#using-zipped-virtual-environment (PySpark new documentation demo for 3.1.0). The neatest way is arguably to ship a zipped Conda environment, but this currently depends on this archive feature. NOTE that we are able to use `spark.files` by relying on its undocumented behaviour that untars `tar.gz`, but I don't think we should document such ways and encourage people to rely on them more.

Also, note that this PR does not aim for feature parity with `spark.files.overwrite`, `spark.files.useFetchCache`, etc. yet. I documented that this is an experimental feature as well.

### Why are the changes needed?

To complete the feature parity, and to provide better support for shipping Python libraries together with a Conda env.

### Does this PR introduce _any_ user-facing change?

Yes, this makes `--archives` work in Spark instead of Yarn-only, and adds a new configuration `spark.archives`.

### How was this patch tested?

I added unit tests. Also, manually tested in standalone cluster, local-cluster, and local modes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.
AmplabJenkins removed a comment on pull request #30472: URL: https://github.com/apache/spark/pull/30472#issuecomment-732967004
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
AmplabJenkins removed a comment on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-732999100
[GitHub] [spark] AmplabJenkins commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
AmplabJenkins commented on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-732999100
[GitHub] [spark] SparkQA commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
SparkQA commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-732998422

**[Test build #131644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131644/testReport)** for PR 28647 at commit [`c45489a`](https://github.com/apache/spark/commit/c45489ad5b8ddd53d5e81fbba4cd08c0b4fd9850).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ExecutorSource(`
   * ` case class GetShufflePushMergerLocations(numMergersNeeded: Int, hostsToFilter: Set[String])`
   * ` case class RemoveShufflePushMergerLocation(host: String) extends ToBlockManagerMaster`
   * `case class UnresolvedTable(`
   * `class SubExprEvaluationRuntime(cacheMaxEntries: Int) `
   * `case class ExpressionProxy(`
   * `case class ResultProxy(result: Any)`
   * `case class CurrentTimeZone() extends LeafExpression with Unevaluable `
   * `abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant `
   * `case class LikeAll(child: Expression, patterns: Seq[UTF8String]) extends LikeAllBase `
   * `case class NotLikeAll(child: Expression, patterns: Seq[UTF8String]) extends LikeAllBase `
   * `case class ParseUrl(children: Seq[Expression], failOnError: Boolean = SQLConf.get.ansiEnabled)`
   * ` implicit class MetadataColumnsHelper(metadata: Array[MetadataColumn]) `
   * `trait PathFilterStrategy extends Serializable `
   * `trait StrategyBuilder `
   * `class PathGlobFilter(filePatten: String) extends PathFilterStrategy `
   * `abstract class ModifiedDateFilter extends PathFilterStrategy `
   * `class ModifiedBeforeFilter(thresholdTime: Long, val timeZoneId: String)`
   * `class ModifiedAfterFilter(thresholdTime: Long, val timeZoneId: String)`
[GitHub] [spark] SparkQA removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
SparkQA removed a comment on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-732791895 **[Test build #131644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131644/testReport)** for PR 28647 at commit [`c45489a`](https://github.com/apache/spark/commit/c45489ad5b8ddd53d5e81fbba4cd08c0b4fd9850).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV
AmplabJenkins removed a comment on pull request #30468: URL: https://github.com/apache/spark/pull/30468#issuecomment-732998296
[GitHub] [spark] AmplabJenkins commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV
AmplabJenkins commented on pull request #30468: URL: https://github.com/apache/spark/pull/30468#issuecomment-732998296
[GitHub] [spark] SparkQA commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast
SparkQA commented on pull request #30440: URL: https://github.com/apache/spark/pull/30440#issuecomment-732993024 **[Test build #131664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131664/testReport)** for PR 30440 at commit [`e762162`](https://github.com/apache/spark/commit/e762162311e04c20bb06f9a4735514547050b832).
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-732993379 **[Test build #131666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131666/testReport)** for PR 30212 at commit [`74a22f8`](https://github.com/apache/spark/commit/74a22f8efacc39fb3b10fa78a76fc887e44b5362).
[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732993082 **[Test build #131665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)** for PR 30412 at commit [`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).
[GitHub] [spark] SparkQA commented on pull request #30485: [SPARK-33533][SQL] BasicConnectionProvider should consider case-sensitivity for properties.
SparkQA commented on pull request #30485: URL: https://github.com/apache/spark/pull/30485#issuecomment-732992899 **[Test build #131663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131663/testReport)** for PR 30485 at commit [`247c7ba`](https://github.com/apache/spark/commit/247c7baf0abd5b77e58d234327d15d18e5d2e96d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-732991019
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
AmplabJenkins removed a comment on pull request #30484: URL: https://github.com/apache/spark/pull/30484#issuecomment-732991017
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #30465: URL: https://github.com/apache/spark/pull/30465#issuecomment-732991024
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type
AmplabJenkins removed a comment on pull request #30408: URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
AmplabJenkins removed a comment on pull request #30413: URL: https://github.com/apache/spark/pull/30413#issuecomment-732991015
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins removed a comment on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732991018
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases
AmplabJenkins removed a comment on pull request #30479: URL: https://github.com/apache/spark/pull/30479#issuecomment-732991020
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30432: [SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition
AmplabJenkins removed a comment on pull request #30432: URL: https://github.com/apache/spark/pull/30432#issuecomment-732991025
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
AmplabJenkins removed a comment on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-732991022
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV
AmplabJenkins removed a comment on pull request #30468: URL: https://github.com/apache/spark/pull/30468#issuecomment-732991012
[GitHub] [spark] AmplabJenkins commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV
AmplabJenkins commented on pull request #30468: URL: https://github.com/apache/spark/pull/30468#issuecomment-732991012
[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
AmplabJenkins commented on pull request #30413: URL: https://github.com/apache/spark/pull/30413#issuecomment-732991037
[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732991018
[GitHub] [spark] AmplabJenkins commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
AmplabJenkins commented on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-732991022
[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type
AmplabJenkins commented on pull request #30408: URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026
[GitHub] [spark] AmplabJenkins commented on pull request #30432: [SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition
AmplabJenkins commented on pull request #30432: URL: https://github.com/apache/spark/pull/30432#issuecomment-732991025
[GitHub] [spark] AmplabJenkins commented on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases
AmplabJenkins commented on pull request #30479: URL: https://github.com/apache/spark/pull/30479#issuecomment-732991020
[GitHub] [spark] AmplabJenkins commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.
AmplabJenkins commented on pull request #30465: URL: https://github.com/apache/spark/pull/30465#issuecomment-732991024
[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-732991019
[GitHub] [spark] AmplabJenkins commented on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
AmplabJenkins commented on pull request #30484: URL: https://github.com/apache/spark/pull/30484#issuecomment-732991017
[GitHub] [spark] SparkQA removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA removed a comment on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732846865 **[Test build #131653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131653/testReport)** for PR 30412 at commit [`f46e32f`](https://github.com/apache/spark/commit/f46e32fb1649023eed0ddab4cb23ca4a97b14a0f).
[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
SparkQA commented on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-732988019 **[Test build #131653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131653/testReport)** for PR 30412 at commit [`f46e32f`](https://github.com/apache/spark/commit/f46e32fb1649023eed0ddab4cb23ca4a97b14a0f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on a change in pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs
MaxGekk commented on a change in pull request #30482: URL: https://github.com/apache/spark/pull/30482#discussion_r529558445 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala ## @@ -243,4 +243,22 @@ class AlterTablePartitionV2SQLSuite extends DatasourceV2SQLBase { assert(!partTable.partitionExists(expectedPartition)) } } + + test("SPARK-33529: handle __HIVE_DEFAULT_PARTITION__") { +val t = "testpart.ns1.ns2.tbl" +withTable(t) { + sql(s"CREATE TABLE $t (part0 string) USING foo PARTITIONED BY (part0)") + val partTable = catalog("testpart") +.asTableCatalog +.loadTable(Identifier.of(Array("ns1", "ns2"), "tbl")) +.asPartitionable + val expectedPartition = InternalRow.fromSeq(Seq[Any](null)) + assert(!partTable.partitionExists(expectedPartition)) + val partSpec = "PARTITION (part0 = '__HIVE_DEFAULT_PARTITION__')" Review comment: > It's more like a hive specific thing and we should let v2 implementation to decide ... It is already a Spark-specific thing too. Implementations don't see `'__HIVE_DEFAULT_PARTITION__'` at all because it is replaced by `null` in the analysis phase.
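A minimal, Spark-free sketch of the replacement described in the review comment above: the sentinel string is turned into `null` before any implementation would see the partition values. The names `DefaultPartitionSketch` and `normalize` are hypothetical and are not Spark APIs; only the sentinel string itself comes from the test in the diff.

```scala
// Hypothetical illustration only; this is not Spark's analyzer code.
object DefaultPartitionSketch {
  // Sentinel string taken from the test in the diff above.
  val HiveDefaultPartition: String = "__HIVE_DEFAULT_PARTITION__"

  // Replace the sentinel with null so that later stages never see the string.
  def normalize(spec: Map[String, String]): Map[String, String] =
    spec.map { case (column, value) =>
      column -> (if (value == HiveDefaultPartition) null else value)
    }
}
```

With this sketch, `DefaultPartitionSketch.normalize(Map("part0" -> "__HIVE_DEFAULT_PARTITION__"))` yields a map whose `part0` value is `null`, which mirrors the `InternalRow.fromSeq(Seq[Any](null))` expectation in the test.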
[GitHub] [spark] sarutak opened a new pull request #30485: [SPARK-33533][SQL] BasicConnectionProvider should consider case-sensitivity for properties.
sarutak opened a new pull request #30485: URL: https://github.com/apache/spark/pull/30485 ### What changes were proposed in this pull request? This PR fixes an issue where `BasicConnectionProvider` doesn't take case-sensitivity into account for properties. For example, the property `oracle.jdbc.mapDateToTimestamp` is case-sensitive, but it is not handled as such. ### Why are the changes needed? This is a bug introduced by #29024. Because of this issue, `OracleIntegrationSuite` doesn't pass. ``` [info] - SPARK-16625: General data types to be mapped to Oracle *** FAILED *** (32 seconds, 129 milliseconds) [info] types.apply(9).equals(org.apache.spark.sql.types.DateType) was false (OracleIntegrationSuite.scala:238) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.jdbc.OracleIntegrationSuite.$anonfun$new$4(OracleIntegrationSuite.scala:238) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at 
org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182) [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61) [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) [info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233) [info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) [info] at scala.collection.immutable.List.foreach(List.scala:392) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232) [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563) [info] at org.scalatest.Suite.run(Suite.scala:1112) [info] at org.scalatest.Suite.run$(Suite.scala:1094) [info] at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) [info] at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237) [info] at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236) [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61) [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) [info] at 
org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513) [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.T
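The effect of losing key casing, as described in the PR text above, can be sketched without Spark. This is an illustration under the assumption that property keys get lower-cased somewhere before they reach the JDBC driver; it is not the code of `BasicConnectionProvider`, and `CaseSensitivePropsSketch` and `toProperties` are hypothetical names. It relies only on the fact that `java.util.Properties` lookups are case-sensitive.

```scala
import java.util.{Locale, Properties}

// Hypothetical illustration only; not the actual BasicConnectionProvider code.
object CaseSensitivePropsSketch {
  // Property name taken from the PR description above.
  val OracleKey: String = "oracle.jdbc.mapDateToTimestamp"

  // Copy options into java.util.Properties, optionally lower-casing the keys
  // to mimic a lossy, case-insensitive handling of option names (an assumption).
  def toProperties(options: Map[String, String], lowerCaseKeys: Boolean): Properties = {
    val props = new Properties()
    options.foreach { case (key, value) =>
      val effectiveKey = if (lowerCaseKeys) key.toLowerCase(Locale.ROOT) else key
      props.setProperty(effectiveKey, value)
    }
    props
  }
}
```

When the keys are lower-cased, a lookup of `oracle.jdbc.mapDateToTimestamp` with its original casing returns `null`, so a driver that reads the property by its exact name would not see the setting.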
[GitHub] [spark] SparkQA removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-732960610 **[Test build #131661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131661/testReport)** for PR 29893 at commit [`c43b964`](https://github.com/apache/spark/commit/c43b96404cbdfeb859784304f2174fc28f66b357).
[GitHub] [spark] SparkQA removed a comment on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
SparkQA removed a comment on pull request #30484: URL: https://github.com/apache/spark/pull/30484#issuecomment-732960198 **[Test build #131659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport)** for PR 30484 at commit [`83b85d4`](https://github.com/apache/spark/commit/83b85d4c1101a39e0b85b41d54862badf4247ad5).
[GitHub] [spark] SparkQA commented on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
SparkQA commented on pull request #30484: URL: https://github.com/apache/spark/pull/30484#issuecomment-732982229 **[Test build #131659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport)** for PR 30484 at commit [`83b85d4`](https://github.com/apache/spark/commit/83b85d4c1101a39e0b85b41d54862badf4247ad5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-732981931 **[Test build #131661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131661/testReport)** for PR 29893 at commit [`c43b964`](https://github.com/apache/spark/commit/c43b96404cbdfeb859784304f2174fc28f66b357). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
SparkQA removed a comment on pull request #30413: URL: https://github.com/apache/spark/pull/30413#issuecomment-732964747 **[Test build #131662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).
[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
SparkQA commented on pull request #30413: URL: https://github.com/apache/spark/pull/30413#issuecomment-732981002 **[Test build #131662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics
AngersZh commented on a change in pull request #30212: URL: https://github.com/apache/spark/pull/30212#discussion_r529553455 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -39,45 +39,22 @@ trait GroupingSet extends Expression with CodegenFallback { override def eval(input: InternalRow): Any = throw new UnsupportedOperationException } -// scalastyle:off line.size.limit line.contains.tab -@ExpressionDescription( - usage = """ -_FUNC_([col1[, col2 ..]]) - create a multi-dimensional cube using the specified columns - so that we can run aggregation on them. - """, - examples = """ -Examples: - > SELECT name, age, count(*) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY _FUNC_(name, age); -Bob5 1 -Alice 2 1 -Alice NULL1 -NULL 2 1 -NULL NULL2 -BobNULL1 -NULL 5 1 - """, - since = "2.0.0") -// scalastyle:on line.size.limit line.contains.tab -case class Cube(groupByExprs: Seq[Expression]) extends GroupingSet {} +case class Cube(groupingSets: Seq[Seq[Expression]]) extends GroupingSet { + override def groupByExprs: Seq[Expression] = +groupingSets.flatMap(_.distinct).distinct +} -// scalastyle:off line.size.limit line.contains.tab -@ExpressionDescription( - usage = """ -_FUNC_([col1[, col2 ..]]) - create a multi-dimensional rollup using the specified columns - so that we can run aggregation on them. 
- """, - examples = """ -Examples: - > SELECT name, age, count(*) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY _FUNC_(name, age); -Bob5 1 -Alice 2 1 -Alice NULL1 -NULL NULL2 -BobNULL1 - """, - since = "2.0.0") -// scalastyle:on line.size.limit line.contains.tab -case class Rollup(groupByExprs: Seq[Expression]) extends GroupingSet {} +case class Rollup(groupingSets: Seq[Seq[Expression]]) extends GroupingSet { + override def groupByExprs: Seq[Expression] = +groupingSets.flatMap(_.distinct).distinct +} + +case class GroupingSetsV2( Review comment: > Why does `GroupingSets` and `GroupingSetsV2` need to co-exist? Updated, removed now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
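The `groupByExprs` override in the diff above can be tried in isolation. This is a minimal sketch that uses plain strings as a stand-in for Catalyst's `Expression` (an assumption made to keep it self-contained); `GroupingSketch` and `Expr` are hypothetical names, and the class here is not Spark's actual `Cube`.

```scala
// Hypothetical illustration only; strings stand in for Catalyst expressions.
object GroupingSketch {
  type Expr = String

  // Mirrors the shape of the case class in the diff: a list of grouping sets,
  // from which the flat, de-duplicated list of group-by expressions is derived.
  final case class Cube(groupingSets: Seq[Seq[Expr]]) {
    def groupByExprs: Seq[Expr] =
      groupingSets.flatMap(_.distinct).distinct
  }
}
```

For example, `GroupingSketch.Cube(Seq(Seq("a", "b"), Seq("b", "c", "c"))).groupByExprs` keeps each expression once, in first-seen order.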
[GitHub] [spark] gengliangwang commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast
gengliangwang commented on pull request #30440: URL: https://github.com/apache/spark/pull/30440#issuecomment-732974090 retest this please.
[GitHub] [spark] SparkQA removed a comment on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.
SparkQA removed a comment on pull request #30465: URL: https://github.com/apache/spark/pull/30465#issuecomment-732846567 **[Test build #131651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131651/testReport)** for PR 30465 at commit [`985352e`](https://github.com/apache/spark/commit/985352ef5d8bc878c5ff07a1a24576a1ac77dfed).
[GitHub] [spark] SparkQA commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.
SparkQA commented on pull request #30465: URL: https://github.com/apache/spark/pull/30465#issuecomment-732973727 **[Test build #131651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131651/testReport)** for PR 30465 at commit [`985352e`](https://github.com/apache/spark/commit/985352ef5d8bc878c5ff07a1a24576a1ac77dfed). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
SparkQA removed a comment on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-732827379 **[Test build #131648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131648/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806).
[GitHub] [spark] SparkQA commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency
SparkQA commented on pull request #30470: URL: https://github.com/apache/spark/pull/30470#issuecomment-732972419 **[Test build #131648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131648/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.