[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733037213







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733035708










[GitHub] [spark] SparkQA removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733031728


   **[Test build #131672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).






[GitHub] [spark] AmplabJenkins commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733035708










[GitHub] [spark] SparkQA commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


SparkQA commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733035666


   **[Test build #131672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-733035516










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-733035516










[GitHub] [spark] cloud-fan commented on a change in pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


cloud-fan commented on a change in pull request #30412:
URL: https://github.com/apache/spark/pull/30412#discussion_r529618309



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -94,6 +94,10 @@ trait CheckAnalysis extends PredicateHelper {
 
   case p if p.analyzed => // Skip already analyzed sub-plans
 
+  case leaf: LeafNode if leaf.output.map(_.dataType).exists(CharVarcharUtils.hasCharVarchar) =>

Review comment:
   @maropu I changed it back as it's pretty risky to get output from arbitrary logical plans. An example of the error:
   ```
   Caused by: sbt.ForkMain$ForkError: java.lang.AssertionError: assertion failed: Scalar subquery should have only one column
     at scala.Predef$.assert(Predef.scala:223)
     at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:229)
     at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:181)
     at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:61)
   ```
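
The reasoning in the comment above (inspect only leaf nodes, because asking an arbitrary plan node for its output can trip an assertion) can be illustrated with a small toy model. This is only a sketch in Java, not Spark code: the `Plan` interface, `leaf`, `unresolvedProject`, and the string-based type check are all hypothetical stand-ins for `LogicalPlan`, `LeafNode`, and `CharVarcharUtils.hasCharVarchar`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical names); it only mirrors the idea of the leaf-only check. */
final class LeafOnlyCheckSketch {
  interface Plan {
    List<Plan> children();
    List<String> outputTypes();
  }

  /** A leaf node whose output types are always safe to read. */
  static Plan leaf(String... types) {
    final List<String> out = Arrays.asList(types);
    return new Plan() {
      public List<Plan> children() { return Collections.emptyList(); }
      public List<String> outputTypes() { return out; }
    };
  }

  /** A non-leaf node whose output cannot be computed yet, as in the stack trace. */
  static Plan unresolvedProject(final Plan child) {
    return new Plan() {
      public List<Plan> children() { return Collections.singletonList(child); }
      public List<String> outputTypes() {
        throw new AssertionError("Scalar subquery should have only one column");
      }
    };
  }

  /** Assumed stand-in for a char/varchar type test. */
  static boolean isCharVarchar(String type) {
    return type.startsWith("char") || type.startsWith("varchar");
  }

  /** Reads output only from leaves, so outputTypes() of inner nodes is never called. */
  static boolean hasCharVarcharInLeaves(Plan plan) {
    if (plan.children().isEmpty()) {
      return plan.outputTypes().stream().anyMatch(LeafOnlyCheckSketch::isCharVarchar);
    }
    return plan.children().stream().anyMatch(LeafOnlyCheckSketch::hasCharVarcharInLeaves);
  }

  /** Returns true when the leaf-only check behaves as intended on two small plans. */
  static boolean demo() {
    Plan withVarchar = unresolvedProject(leaf("int", "varchar(10)"));
    Plan withoutVarchar = unresolvedProject(leaf("int", "string"));
    return hasCharVarcharInLeaves(withVarchar) && !hasCharVarcharInLeaves(withoutVarchar);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this model, calling `outputTypes()` on the project node would throw, while the leaf-only traversal completes without touching it.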








[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733034804


   **[Test build #131674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131674/testReport)** for PR 30412 at commit [`38999b5`](https://github.com/apache/spark/commit/38999b535e78817d2647d186605618438f438220).






[GitHub] [spark] SparkQA removed a comment on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-733031641


   **[Test build #131671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-733032271










[GitHub] [spark] SparkQA commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model

2020-11-24 Thread GitBox


SparkQA commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-733032253


   **[Test build #131671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-733032271










[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-24 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-733032201


   **[Test build #131673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131673/testReport)** for PR 29893 at commit [`c74d4d7`](https://github.com/apache/spark/commit/c74d4d7a0e0768fa5d9970a9e7ddd109306d6caa).






[GitHub] [spark] SparkQA commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


SparkQA commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733031728


   **[Test build #131672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131672/testReport)** for PR 30421 at commit [`e312697`](https://github.com/apache/spark/commit/e312697c7e6e9feebed0be15cd2bec0f829bbf49).






[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


SparkQA commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733031627


   **[Test build #131670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131670/testReport)** for PR 30486 at commit [`0559c95`](https://github.com/apache/spark/commit/0559c955efe3e62ac7590b3cccefea73fd5e0fc8).






[GitHub] [spark] SparkQA commented on pull request #30487: [SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


SparkQA commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733031613


   **[Test build #131669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131669/testReport)** for PR 30487 at commit [`5c90411`](https://github.com/apache/spark/commit/5c9041103fff89089ca136e97d8181c359e76b7a).






[GitHub] [spark] SparkQA commented on pull request #30471: [WIP][SPARK-33520][ML] make CrossValidator/TrainValidateSplit support Python backend estimator/model

2020-11-24 Thread GitBox


SparkQA commented on pull request #30471:
URL: https://github.com/apache/spark/pull/30471#issuecomment-733031641


   **[Test build #131671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131671/testReport)** for PR 30471 at commit [`5a39258`](https://github.com/apache/spark/commit/5a3925800b8ecff9911b779eec97bee6f1e6d5da).






[GitHub] [spark] SparkQA commented on pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin

2020-11-24 Thread GitBox


SparkQA commented on pull request #30488:
URL: https://github.com/apache/spark/pull/30488#issuecomment-733031612


   **[Test build #131668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131668/testReport)** for PR 30488 at commit [`a617f94`](https://github.com/apache/spark/commit/a617f9430064f897a85e6373702b5c45bb7250b6).






[GitHub] [spark] Ngone51 commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal

2020-11-24 Thread GitBox


Ngone51 commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r529611998



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
 void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
   long idxStartPos = -1;
   try {
-    // update the chunk tracker to meta file before index file
-    writeChunkTracker(mapIndex);
     idxStartPos = indexFile.getFilePointer();
     logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
       appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
       chunkOffset);
-    indexFile.writeLong(chunkOffset);
+    indexFile.write(Longs.toByteArray(chunkOffset));
+    // Chunk bitmap should be written to the meta file after the index file because if there are
+    // any exceptions during writing the offset to the index file, meta file should not be
+    // updated. If the update to the index file is successful but the update to meta file isn't
+    // then the index file position is reset in the catch clause.
+    writeChunkTracker(mapIndex);

Review comment:
   Tracking IO exceptions sounds a little complex to me. I'd prefer to stop merging in case of an IO exception during the seek.
   
   Besides, I think it would be good if we could dynamically change the merger location when such an IO exception happens.
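
The ordering described in the diff comment (append the offset to the index file first, update the meta file second, and undo the index write if the meta update fails) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `RemoteBlockPushResolver` code: `ChunkIndexSketch`, `MetaWriter`, and the temp file are hypothetical, `ByteBuffer` stands in for Guava's `Longs.toByteArray`, and truncating with `setLength` goes beyond the position reset that the source comment mentions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical sketch of the index-then-meta update order with rollback. */
final class ChunkIndexSketch {
  /** Stand-in for the meta-file update, so that a failure can be simulated. */
  interface MetaWriter {
    void write(int mapIndex) throws IOException;
  }

  /**
   * Appends chunkOffset to the index file, then updates the meta file.
   * Returns true if both writes succeeded; otherwise restores the index file
   * to its previous position and length and returns false.
   * An IOException thrown by the rollback itself is propagated to the caller.
   */
  static boolean updateChunkInfo(RandomAccessFile indexFile, MetaWriter meta,
      long chunkOffset, int mapIndex) throws IOException {
    long idxStartPos = indexFile.getFilePointer();
    try {
      // Write all 8 big-endian bytes with a single write call.
      indexFile.write(ByteBuffer.allocate(Long.BYTES).putLong(chunkOffset).array());
      meta.write(mapIndex);
      return true;
    } catch (IOException e) {
      // Assumption of this sketch: drop the partially recorded chunk entirely.
      indexFile.seek(idxStartPos);
      indexFile.setLength(idxStartPos);
      return false;
    }
  }

  /** Runs one successful and one failing update; returns the final index length in bytes. */
  static long demo() {
    try {
      File f = File.createTempFile("index", ".bin");
      f.deleteOnExit();
      try (RandomAccessFile indexFile = new RandomAccessFile(f, "rw")) {
        updateChunkInfo(indexFile, mapIndex -> { }, 100L, 0);
        updateChunkInfo(indexFile, mapIndex -> {
          throw new IOException("simulated meta failure");
        }, 200L, 1);
        return indexFile.length();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 8: only the first offset remains in the index file
  }
}
```

Because the rollback can itself fail on `seek`, the sketch lets that exception escape, which is the point at which the reviewer suggests stopping the merge.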








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733029357










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733029359










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30442:
URL: https://github.com/apache/spark/pull/30442#issuecomment-733029355










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #29122:
URL: https://github.com/apache/spark/pull/29122#issuecomment-733029356










[GitHub] [spark] AmplabJenkins commented on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30442:
URL: https://github.com/apache/spark/pull/30442#issuecomment-733029355










[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733029359










[GitHub] [spark] AmplabJenkins commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733029357










[GitHub] [spark] AmplabJenkins commented on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #29122:
URL: https://github.com/apache/spark/pull/29122#issuecomment-733029356










[GitHub] [spark] Ngone51 commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal

2020-11-24 Thread GitBox


Ngone51 commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r529604208



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
 void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
   long idxStartPos = -1;
   try {
-    // update the chunk tracker to meta file before index file
-    writeChunkTracker(mapIndex);
     idxStartPos = indexFile.getFilePointer();
     logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
       appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
       chunkOffset);
-    indexFile.writeLong(chunkOffset);
+    indexFile.write(Longs.toByteArray(chunkOffset));
+    // Chunk bitmap should be written to the meta file after the index file because if there are
+    // any exceptions during writing the offset to the index file, meta file should not be
+    // updated. If the update to the index file is successful but the update to meta file isn't
+    // then the index file position is reset in the catch clause.

Review comment:
   Yeah, I mean the current implementation does not seem correct. It still updates `lastChunkOffset` when an exception happens, no?
   
https://github.com/apache/spark/blob/8113c88542ee282b510c7e046d64df1761a85d14/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java#L846
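
The concern raised here, that in-memory state such as `lastChunkOffset` should not advance when one of the file updates throws, can be sketched like this. It is a hedged illustration only: `ChunkTrackerSketch` and the `Step` interface are hypothetical, and the two file writes are reduced to callbacks so that a failure can be simulated.

```java
import java.io.IOException;

/** Hypothetical sketch: advance lastChunkOffset only after both updates succeed. */
final class ChunkTrackerSketch {
  /** Stand-in for a file update that may fail. */
  interface Step {
    void run() throws IOException;
  }

  private long lastChunkOffset = 0L;

  long getLastChunkOffset() {
    return lastChunkOffset;
  }

  /** Runs both updates and records chunkOffset only if neither of them throws. */
  boolean updateChunkInfo(long chunkOffset, Step writeIndex, Step writeMeta) {
    try {
      writeIndex.run();
      writeMeta.run();
    } catch (IOException e) {
      // Leave lastChunkOffset untouched so memory stays consistent with the files.
      return false;
    }
    lastChunkOffset = chunkOffset;
    return true;
  }

  /** Returns lastChunkOffset after a successful update and after a failed one. */
  static long[] demo() {
    ChunkTrackerSketch tracker = new ChunkTrackerSketch();
    tracker.updateChunkInfo(100L, () -> { }, () -> { });
    long afterSuccess = tracker.getLastChunkOffset();
    tracker.updateChunkInfo(200L, () -> { }, () -> {
      throw new IOException("simulated meta failure");
    });
    long afterFailure = tracker.getLastChunkOffset();
    return new long[] {afterSuccess, afterFailure};
  }

  public static void main(String[] args) {
    long[] result = demo();
    System.out.println(result[0] + " " + result[1]); // 100 100
  }
}
```

Placing the assignment after the `try` block, rather than in a `finally` or after a swallowed exception, is what keeps the failed offset from being recorded.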






[GitHub] [spark] SparkQA removed a comment on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30442:
URL: https://github.com/apache/spark/pull/30442#issuecomment-732919060


   **[Test build #131656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131656/testReport)** for PR 30442 at commit [`9b76d6a`](https://github.com/apache/spark/commit/9b76d6ac046f3bd8ea67b7628ed5b48659b73f46).






[GitHub] [spark] SparkQA commented on pull request #30442: [SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-24 Thread GitBox


SparkQA commented on pull request #30442:
URL: https://github.com/apache/spark/pull/30442#issuecomment-733021920


   **[Test build #131656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131656/testReport)** for PR 30442 at commit [`9b76d6a`](https://github.com/apache/spark/commit/9b76d6ac046f3bd8ea67b7628ed5b48659b73f46).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-11-24 Thread GitBox


cloud-fan commented on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-733019331


   @rdblue there are several mistakes in my previous PR that cause test failures; I'm fixing them in https://github.com/rdblue/spark/pull/9






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AngersZhuuuu commented on a change in pull request #30421:
URL: https://github.com/apache/spark/pull/30421#discussion_r529597136



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -503,13 +503,32 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     }
   }
 
+  private def convertTypeConstructedLiteralToString(literal: Literal): String = literal match {
+    case Literal(data: Int, dataType: DateType) =>
+      UTF8String.fromString(
+        DateFormatter(getZoneId(SQLConf.get.sessionLocalTimeZone))
+          .format(data)).toString
+    case Literal(data: Long, dataType: TimestampType) =>
+      UTF8String.fromString(
+        TimestampFormatter.getFractionFormatter(getZoneId(SQLConf.get.sessionLocalTimeZone))
+          .format(data)).toString
+    case Literal(data: CalendarInterval, dataType: CalendarIntervalType) =>
+      UTF8String.fromString(data.toString).toString
+    case Literal(data: Array[Byte], dataType: BinaryType) =>
+      UTF8String.fromBytes(data).toString
+    case Literal(data, dataType) =>
+      UTF8String.fromString(data.toString).toString

Review comment:
   > We need this entry? What's an example to match this case?
   
   removed this line








[GitHub] [spark] AngersZhuuuu commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AngersZhuuuu commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733017916


   > Could you add some descriptions about this syntax and add examples in the 
`INSERT` document?: 
https://spark.apache.org/docs/3.0.1/sql-ref-syntax-dml-insert-into.html#parameters
   
   Updated






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529590417



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -535,13 +538,23 @@ private[spark] object Utils extends Logging {
       doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf)
     }
 
-    // Decompress the file if it's a .tar or .tar.gz
-    if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
-      logInfo("Untarring " + fileName)
-      executeAndGetOutput(Seq("tar", "-xzf", fileName), targetDir)
-    } else if (fileName.endsWith(".tar")) {
-      logInfo("Untarring " + fileName)
-      executeAndGetOutput(Seq("tar", "-xf", fileName), targetDir)
+    if (shouldUntar) {
+      // Decompress the file if it's a .tar or .tar.gz
+      if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+        logWarning(
+          "Untarring behavior is deprecated at spark.files and " +
+            "SparkContext.addFile. Use spark.archives or SparkContext.addArchive " +
+            "instead.")
+        logInfo("Untarring " + fileName)
+        executeAndGetOutput(Seq("tar", "-xzf", fileName), targetDir)

Review comment:
   `spark.files` and `SparkContext.addFile` have a somewhat undocumented, 
hidden behaviour: on the executor side only, they untar files whose names end in 
`.tar.gz` or `.tgz`. I think it makes sense to deprecate this behaviour and 
encourage users to use explicit archive handling.
   
   Also, I believe it's good practice to avoid relying on external programs 
anyway.
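
As a rough illustration of the implicit behaviour discussed here, the file-name checks in the removed lines of the diff can be restated as a pure function. The names `UntarSketch` and `tarFlagsFor` are hypothetical and not part of Spark; this is only a sketch of the dispatch logic shown in the diff, not the actual implementation.

```scala
object UntarSketch {
  // Hypothetical helper (not in Spark): mirrors the extension checks from the
  // removed lines of the diff. Returns the flags that were passed to the
  // external `tar` program, or None when the file was left untouched.
  def tarFlagsFor(fileName: String): Option[String] =
    if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) Some("-xzf")
    else if (fileName.endsWith(".tar")) Some("-xf")
    else None

  def main(args: Array[String]): Unit = {
    // Print the decision for a few sample file names.
    Seq("env.tar.gz", "env.tgz", "env.tar", "env.zip").foreach { name =>
      println(s"$name -> ${tarFlagsFor(name)}")
    }
  }
}
```

Since the decision depends only on the file name, nothing in the user's call site signals that extraction will happen, which is the implicitness the comment argues against.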








[GitHub] [spark] Ngone51 commented on pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin

2020-11-24 Thread GitBox


Ngone51 commented on pull request #30488:
URL: https://github.com/apache/spark/pull/30488#issuecomment-733012047


   cc @cloud-fan @xuanyuanking Could you take a look? Thanks!






[GitHub] [spark] Ngone51 opened a new pull request #30488: [SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin

2020-11-24 Thread GitBox


Ngone51 opened a new pull request #30488:
URL: https://github.com/apache/spark/pull/30488


   
   
   ### What changes were proposed in this pull request?
   
   
   Currently, `join()` uses `withPlan(logicalPlan)` as a convenient way to call 
some Dataset functions. But this makes the `dataset_id` inconsistent between the 
`logicalPlan` and the original `Dataset` (because `withPlan(logicalPlan)` 
creates a new Dataset with a new id and resets the `dataset_id` of the 
`logicalPlan` to that new id). As a result, it breaks the rule 
`DetectAmbiguousSelfJoin`.
   
   In this PR, we propose to drop the usage of `withPlan` and use the 
`logicalPlan` directly so that its `dataset_id` doesn't change.
   
   
   ### Why are the changes needed?
   
   
   The query below returns a wrong result when it should throw an 
ambiguous self-join exception instead:
   
   ```scala
   val emp1 = Seq[TestData](
     TestData(1, "sales"),
     TestData(2, "personnel"),
     TestData(3, "develop"),
     TestData(4, "IT")).toDS()
   val emp2 = Seq[TestData](
     TestData(1, "sales"),
     TestData(2, "personnel"),
     TestData(3, "develop")).toDS()
   val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
   emp1.join(emp3, emp1.col("key") === emp3.col("key"), "left_outer")
     .select(emp1.col("*"), emp3.col("key").as("e2")).show()
   
   // wrong result
   // +---+---------+---+
   // |key|    value| e2|
   // +---+---------+---+
   // |  1|    sales|  1|
   // |  2|personnel|  2|
   // |  3|  develop|  3|
   // |  4|       IT|  4|
   // +---+---------+---+
   ```
   This PR fixes the wrong behaviour.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, users hit the exception instead of the wrong result after this PR.
   
   
   ### How was this patch tested?
   
   
   Added a new unit test.
   






[GitHub] [spark] SparkQA removed a comment on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733004565


   **[Test build #131667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)**
 for PR 30486 at commit 
[`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).






[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


SparkQA commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733011314


   **[Test build #131667 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)**
 for PR 30486 at commit 
[`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `logInfo(s\"Adding $url to class loader\")`






[GitHub] [spark] LuciferYang commented on pull request #30487: [SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


LuciferYang commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733011172


   Waiting for the Jenkins tests to verify the results.






[GitHub] [spark] LuciferYang commented on pull request #30487: [WIP][SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


LuciferYang commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733010164


   
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131634/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/
   
   
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/
   
   
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131619/testReport/org.apache.spark.sql.hive.thriftserver/SparkThriftServerProtocolVersionsSuite/HIVE_CLI_SERVICE_PROTOCOL_V1_get_binary_type/
   
   
![image](https://user-images.githubusercontent.com/1475305/100107069-3a25c480-2ea4-11eb-9a2f-4e3bb4bf7f46.png)
   






[GitHub] [spark] HyukjinKwon commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733009501


   cc @zero323 and @fhoering too FYI. This is related to the docs and shipping 
3rd party Python packages in PySpark apps.






[GitHub] [spark] SparkQA removed a comment on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #29122:
URL: https://github.com/apache/spark/pull/29122#issuecomment-732936032


   **[Test build #131658 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131658/testReport)**
 for PR 29122 at commit 
[`0e372ca`](https://github.com/apache/spark/commit/0e372caa1d3ea33ebdde98de3b4a1afbb4a5fc38).






[GitHub] [spark] SparkQA commented on pull request #29122: [SPARK-32320][PYSPARK] Remove mutable default arguments

2020-11-24 Thread GitBox


SparkQA commented on pull request #29122:
URL: https://github.com/apache/spark/pull/29122#issuecomment-733008784


   **[Test build #131658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131658/testReport)**
 for PR 29122 at commit 
[`0e372ca`](https://github.com/apache/spark/commit/0e372caa1d3ea33ebdde98de3b4a1afbb4a5fc38).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733008523


   @tgravescs, @mridulm, @Ngone51, can you take a look when you guys find some 
time?






[GitHub] [spark] SparkQA removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732993082


   **[Test build #131665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)**
 for PR 30412 at commit 
[`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).






[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733007577


   **[Test build #131665 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)**
 for PR 30412 at commit 
[`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] LuciferYang opened a new pull request #30487: [WIP][SPARK-33535][BUILD] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


LuciferYang opened a new pull request #30487:
URL: https://github.com/apache/spark/pull/30487


   ### What changes were proposed in this pull request?
   Jenkins test runs for many PRs appear to be failing. The failing cases 
include:
   
   - org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type
   - org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V2 get binary type
   - org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V3 get binary type
   - org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V4 get binary type
   - org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V5 get binary type
   
   The error message is as follows:
   
   ```
   Error Message
   org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
   Stacktrace
   sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
       at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
       at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
       at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
       at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
       at org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.$anonfun$new$26(SparkThriftServerProtocolVersionsSuite.scala:302)
   ```
   
   However, these tests pass in GitHub Actions, so the failure may be related 
to the `LANG` setting of the Jenkins build machine. This PR adds 
`export LANG="en_US.UTF-8"` to the `run-tests-jenkins` script.
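
One way a locale setting could produce exactly this `?` versus `�` mismatch is sketched below. It assumes, without confirmation from the test itself, that the failing path encodes or decodes text with a non-UTF-8 default charset (on many JVMs the default charset is derived from the locale). The sketch passes charsets explicitly so that its result does not depend on the machine it runs on; `CharsetSketch` and `roundTrip` are hypothetical names.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetSketch {
  // Encode a string to bytes and decode it again with the same charset.
  // Characters the charset cannot represent are replaced during encoding.
  def roundTrip(s: String, cs: Charset): String = new String(s.getBytes(cs), cs)

  def main(args: Array[String]): Unit = {
    // U+FFFD REPLACEMENT CHARACTER followed by '(' (the value from the error message).
    val original = "\uFFFD("
    val viaAscii = roundTrip(original, StandardCharsets.US_ASCII)
    val viaUtf8 = roundTrip(original, StandardCharsets.UTF_8)
    println(s"US-ASCII round trip equals original: ${viaAscii == original}")
    println(s"UTF-8 round trip equals original: ${viaUtf8 == original}")
  }
}
```

With US-ASCII the unmappable character is encoded as `?`, while UTF-8 preserves it, which matches the shape of the assertion failure above.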
   
   ### Why are the changes needed?
   To ensure that `LANG` in the Jenkins test process is `en_US.UTF-8`, so that 
the `HIVE_CLI_SERVICE_PROTOCOL_VX`-related tests pass.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Jenkins tests pass
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529583595



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1568,21 +1612,44 @@ class SparkContext(config: SparkConf) extends Logging {
 
     val key = if (!isLocal && scheme == "file") {
       env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+    } else if (uri.getScheme == null) {
+      schemeCorrectedURI.toString
+    } else if (isArchive) {
+      uri.toString

Review comment:
   For the same reason (keeping the fragment), it uses the URI as-is when 
it's an archive.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529583152



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1550,7 +1594,7 @@ class SparkContext(config: SparkConf) extends Logging {
 
     val hadoopPath = new Path(schemeCorrectedURI)
     val scheme = schemeCorrectedURI.getScheme
-    if (!Array("http", "https", "ftp").contains(scheme)) {
+    if (!Array("http", "https", "ftp").contains(scheme) && !isArchive) {

Review comment:
   An archive is not supposed to be a directory.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30486:
URL: https://github.com/apache/spark/pull/30486#discussion_r529582202



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1537,8 +1575,14 @@ class SparkContext(config: SparkConf) extends Logging {
     addFile(path, recursive, false)
   }
 
-  private def addFile(path: String, recursive: Boolean, addedOnSubmit: Boolean): Unit = {
-    val uri = new Path(path).toUri
+  private def addFile(
+      path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: Boolean = false
+    ): Unit = {
+    val uri = if (!isArchive) {
+      new Path(path).toUri
+    } else {
+      Utils.resolveURI(path)

Review comment:
   Here we cannot rely on `new Path(path).toUri`: it treats the fragment 
(`#`) in the URI as part of the path. `Utils.resolveURI` is used for 
`spark.yarn.dist.archives` as well.
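
The difference can be sketched with JDK classes only. This is an analogy under stated assumptions, not Spark's code: `java.io.File` stands in for Hadoop's `Path`, plain `java.net.URI` parsing stands in for what `Utils.resolveURI` preserves, and `FragmentSketch`, `asUri` and `asFilePath` are hypothetical names.

```scala
import java.io.File
import java.net.URI

object FragmentSketch {
  // Parse the string as a URI: the text after '#' is kept as the fragment.
  def asUri(spec: String): URI = new URI(spec)

  // Treat the whole string as a file name: '#' becomes part of the path
  // and the resulting URI has no fragment.
  def asFilePath(spec: String): URI = new File(spec).toURI

  def main(args: Array[String]): Unit = {
    val spec = "pyspark_env.tar.gz#environment"
    val parsed = asUri(spec)
    println(s"URI parse:  path=${parsed.getPath}, fragment=${parsed.getFragment}")
    val asFile = asFilePath(spec)
    println(s"file parse: path=${asFile.getPath}, fragment=${asFile.getFragment}")
  }
}
```

Only the first form keeps `environment` available as a separate piece of information, which an archive spec of the form `archive#name` relies on.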








[GitHub] [spark] SparkQA commented on pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


SparkQA commented on pull request #30486:
URL: https://github.com/apache/spark/pull/30486#issuecomment-733004565


   **[Test build #131667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131667/testReport)**
 for PR 30486 at commit 
[`469eacf`](https://github.com/apache/spark/commit/469eacfb25a6aa21118b8d89728b70ab22fc4dcb).






[GitHub] [spark] HyukjinKwon opened a new pull request #30486: [SPARK-33530][CORE] Support --archives and spark.archives option natively

2020-11-24 Thread GitBox


HyukjinKwon opened a new pull request #30486:
URL: https://github.com/apache/spark/pull/30486


   ### What changes were proposed in this pull request?
   
   TL;DR:
   - This PR completes archive support in Spark itself, rather than only on 
Yarn.
   - After this PR, PySpark users can use Conda to ship Python packages 
together, as below:
   ```bash
   conda create -y -n pyspark_env -c conda-forge pyarrow==2.0.0 pandas==1.1.4 conda-pack==0.5.0
   conda activate pyspark_env
   conda pack -f -o pyspark_env.tar.gz
   PYSPARK_DRIVER_PYTHON=python PYSPARK_PYTHON=./environment/bin/python pyspark --archives pyspark_env.tar.gz#environment
   ```
   
   
   This PR proposes to add Spark's native `--archives` in Spark submit, and 
`spark.archives` configuration. Currently, both are supported only in Yarn mode:
   
   ```bash
   ./bin/spark-submit --help
   ```
   
   ```
   Options:
   ...
    Spark on YARN only:
     --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
     --archives ARCHIVES         Comma separated list of archives to be extracted into the
                                 working directory of each executor.
   ```
   
   The `archives` feature is often useful when you have to ship a directory 
and unpack it on executors. One example is shipping native libraries, e.g. for 
JNI. Another is shipping Python packages together as a Conda environment.
   
   For Conda in particular, PySpark currently has no good, generally working 
way to ship packages. Please see also 
https://hyukjin-spark.readthedocs.io/en/stable/user_guide/python_packaging.html#using-zipped-virtual-environment
 (a demo of the new PySpark documentation for 3.1.0).
   
   The neatest way is arguably to ship a zipped Conda environment, but this 
currently depends on the archive feature. NOTE that we could use `spark.files` 
by relying on its undocumented behaviour of untarring `tar.gz` files, but I 
don't think we should document such an approach or encourage people to rely on 
it further.
   
   Also, note that this PR does not yet aim to add counterparts of 
`spark.files.overwrite`, `spark.files.useFetchCache`, etc. I also documented 
that this is an experimental feature.
   
   ### Why are the changes needed?
   
   To complete feature parity, and to provide better support for shipping 
Python libraries together with a Conda environment.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this makes `--archives` work in Spark itself instead of Yarn only, and 
adds a new configuration, `spark.archives`.
   
   ### How was this patch tested?
   
   I added unittests. Also, manually tested in standalone cluster, 
local-cluster, and local modes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732967004










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-732999100










[GitHub] [spark] AmplabJenkins commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-732999100










[GitHub] [spark] SparkQA commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


SparkQA commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-732998422


   **[Test build #131644 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131644/testReport)**
 for PR 28647 at commit 
[`c45489a`](https://github.com/apache/spark/commit/c45489ad5b8ddd53d5e81fbba4cd08c0b4fd9850).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class ExecutorSource(`
 * `  case class GetShufflePushMergerLocations(numMergersNeeded: Int, 
hostsToFilter: Set[String])`
 * `  case class RemoveShufflePushMergerLocation(host: String) extends 
ToBlockManagerMaster`
 * `case class UnresolvedTable(`
 * `class SubExprEvaluationRuntime(cacheMaxEntries: Int) `
 * `case class ExpressionProxy(`
 * `case class ResultProxy(result: Any)`
 * `case class CurrentTimeZone() extends LeafExpression with Unevaluable `
 * `abstract class LikeAllBase extends UnaryExpression with 
ImplicitCastInputTypes with NullIntolerant `
 * `case class LikeAll(child: Expression, patterns: Seq[UTF8String]) 
extends LikeAllBase `
 * `case class NotLikeAll(child: Expression, patterns: Seq[UTF8String]) 
extends LikeAllBase `
 * `case class ParseUrl(children: Seq[Expression], failOnError: Boolean = 
SQLConf.get.ansiEnabled)`
 * `  implicit class MetadataColumnsHelper(metadata: Array[MetadataColumn]) 
`
 * `trait PathFilterStrategy extends Serializable `
 * `trait StrategyBuilder `
 * `class PathGlobFilter(filePatten: String) extends PathFilterStrategy `
 * `abstract class ModifiedDateFilter extends PathFilterStrategy `
 * `class ModifiedBeforeFilter(thresholdTime: Long, val timeZoneId: String)`
 * `class ModifiedAfterFilter(thresholdTime: Long, val timeZoneId: String)`






[GitHub] [spark] SparkQA removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-732791895


   **[Test build #131644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131644/testReport)** for PR 28647 at commit [`c45489a`](https://github.com/apache/spark/commit/c45489ad5b8ddd53d5e81fbba4cd08c0b4fd9850).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-732998296










[GitHub] [spark] AmplabJenkins commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-732998296










[GitHub] [spark] SparkQA commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


SparkQA commented on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-732993024


   **[Test build #131664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131664/testReport)** for PR 30440 at commit [`e762162`](https://github.com/apache/spark/commit/e762162311e04c20bb06f9a4735514547050b832).






[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics

2020-11-24 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-732993379


   **[Test build #131666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131666/testReport)** for PR 30212 at commit [`74a22f8`](https://github.com/apache/spark/commit/74a22f8efacc39fb3b10fa78a76fc887e44b5362).






[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732993082


   **[Test build #131665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131665/testReport)** for PR 30412 at commit [`b6d74c4`](https://github.com/apache/spark/commit/b6d74c481ecb651286685490b1beded99f0d50f9).






[GitHub] [spark] SparkQA commented on pull request #30485: [SPARK-33533][SQL] BasicConnectionProvider should consider case-sensitivity for properties.

2020-11-24 Thread GitBox


SparkQA commented on pull request #30485:
URL: https://github.com/apache/spark/pull/30485#issuecomment-732992899


   **[Test build #131663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131663/testReport)** for PR 30485 at commit [`247c7ba`](https://github.com/apache/spark/commit/247c7baf0abd5b77e58d234327d15d18e5d2e96d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-732991019










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30484:
URL: https://github.com/apache/spark/pull/30484#issuecomment-732991017










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-732991024










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732991015










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732991018










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30479:
URL: https://github.com/apache/spark/pull/30479#issuecomment-732991020










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30432: [SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30432:
URL: https://github.com/apache/spark/pull/30432#issuecomment-732991025










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-732991022










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-732991012










[GitHub] [spark] AmplabJenkins commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-732991012










[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732991037










[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732991018










[GitHub] [spark] AmplabJenkins commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-732991022










[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026










[GitHub] [spark] AmplabJenkins commented on pull request #30432: [SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30432:
URL: https://github.com/apache/spark/pull/30432#issuecomment-732991025










[GitHub] [spark] AmplabJenkins commented on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30479:
URL: https://github.com/apache/spark/pull/30479#issuecomment-732991020










[GitHub] [spark] AmplabJenkins commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-732991024










[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-732991019










[GitHub] [spark] AmplabJenkins commented on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30484:
URL: https://github.com/apache/spark/pull/30484#issuecomment-732991017










[GitHub] [spark] SparkQA removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732846865


   **[Test build #131653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131653/testReport)** for PR 30412 at commit [`f46e32f`](https://github.com/apache/spark/commit/f46e32fb1649023eed0ddab4cb23ca4a97b14a0f).






[GitHub] [spark] SparkQA commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


SparkQA commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-732988019


   **[Test build #131653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131653/testReport)** for PR 30412 at commit [`f46e32f`](https://github.com/apache/spark/commit/f46e32fb1649023eed0ddab4cb23ca4a97b14a0f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] MaxGekk commented on a change in pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


MaxGekk commented on a change in pull request #30482:
URL: https://github.com/apache/spark/pull/30482#discussion_r529558445



##
File path: sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala
##
@@ -243,4 +243,22 @@ class AlterTablePartitionV2SQLSuite extends DatasourceV2SQLBase {
       assert(!partTable.partitionExists(expectedPartition))
     }
   }
+
+  test("SPARK-33529: handle __HIVE_DEFAULT_PARTITION__") {
+    val t = "testpart.ns1.ns2.tbl"
+    withTable(t) {
+      sql(s"CREATE TABLE $t (part0 string) USING foo PARTITIONED BY (part0)")
+      val partTable = catalog("testpart")
+        .asTableCatalog
+        .loadTable(Identifier.of(Array("ns1", "ns2"), "tbl"))
+        .asPartitionable
+      val expectedPartition = InternalRow.fromSeq(Seq[Any](null))
+      assert(!partTable.partitionExists(expectedPartition))
+      val partSpec = "PARTITION (part0 = '__HIVE_DEFAULT_PARTITION__')"

Review comment:
   > It's more like a hive specific thing and we should let v2 implementation to decide ...
   
   It is already a Spark-specific thing too. Implementations don't see `'__HIVE_DEFAULT_PARTITION__'` at all because it is replaced by `null` during the analysis phase.
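
The replacement described in this comment can be sketched roughly as follows. This is only an illustration of the idea, not Spark's analyzer code: `DefaultPartitionName` and `normalizePartitionSpec` are hypothetical names, and the snippet is written in script/REPL style with no Spark dependency.

```scala
// Hypothetical constant; the sentinel string itself comes from the test above.
val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

// Hypothetical helper: maps raw partition-spec strings to values and turns
// the sentinel into null, so later stages never see the sentinel string.
def normalizePartitionSpec(spec: Map[String, String]): Map[String, Any] =
  spec.map { case (column, raw) =>
    column -> (if (raw == DefaultPartitionName) null else raw)
  }

val resolved = normalizePartitionSpec(
  Map("part0" -> "__HIVE_DEFAULT_PARTITION__", "part1" -> "2020-11-24"))
```

Under this sketch, a partition implementation that receives `resolved` only has to handle `null`, which matches the `InternalRow.fromSeq(Seq[Any](null))` expectation in the test.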








[GitHub] [spark] sarutak opened a new pull request #30485: [SPARK-33533][SQL] BasicConnectionProvider should consider case-sensitivity for properties.

2020-11-24 Thread GitBox


sarutak opened a new pull request #30485:
URL: https://github.com/apache/spark/pull/30485


   ### What changes were proposed in this pull request?
   
   This PR fixes an issue where `BasicConnectionProvider` doesn't take the case sensitivity of properties into account.
   For example, the property `oracle.jdbc.mapDateToTimestamp` should be treated as case-sensitive, but it is not.
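
One way a bug of this kind can arise is sketched below. The mechanism is an assumption for illustration, since the description does not spell it out: options are held in a case-insensitive map, and the lower-cased keys rather than the original ones are copied into the `java.util.Properties` handed to the driver. `CaseInsensitiveOptions` and `toProperties` are hypothetical names, not Spark classes, and the snippet is written in script/REPL style.

```scala
import java.util.{Locale, Properties}

// Hypothetical stand-in for a case-insensitive options map: lookups ignore
// case, but the original keys are kept alongside the lower-cased ones.
final case class CaseInsensitiveOptions(original: Map[String, String]) {
  val lowerCased: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    lowerCased.get(key.toLowerCase(Locale.ROOT))
}

// Copies a map into java.util.Properties, whose lookups match keys exactly.
def toProperties(m: Map[String, String]): Properties = {
  val props = new Properties()
  m.foreach { case (k, v) => props.setProperty(k, v) }
  props
}

val options =
  CaseInsensitiveOptions(Map("oracle.jdbc.mapDateToTimestamp" -> "false"))

// Built from lower-cased keys: a lookup by the mixed-case name finds nothing.
val lossy = toProperties(options.lowerCased)

// Built from the original keys: the same lookup succeeds.
val preserved = toProperties(options.original)
```

Here `lossy.getProperty("oracle.jdbc.mapDateToTimestamp")` returns `null` while `preserved.getProperty("oracle.jdbc.mapDateToTimestamp")` returns `"false"`, which is why a driver that reads the property by its exact name would silently fall back to its default in the first case.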
   
   ### Why are the changes needed?
   
   This is a bug introduced by #29024.
   Because of this issue, `OracleIntegrationSuite` doesn't pass.
   
   ```
   [info] - SPARK-16625: General data types to be mapped to Oracle *** FAILED *** (32 seconds, 129 milliseconds)
   [info]   types.apply(9).equals(org.apache.spark.sql.types.DateType) was false (OracleIntegrationSuite.scala:238)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   [info]   at org.apache.spark.sql.jdbc.OracleIntegrationSuite.$anonfun$new$4(OracleIntegrationSuite.scala:238)
   [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
   [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
   [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
   [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   [info]   at scala.collection.immutable.List.foreach(List.scala:392)
   [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
   [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.Suite.run(Suite.scala:1112)
   [info]   at org.scalatest.Suite.run$(Suite.scala:1094)
   [info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
   [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
   [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
   [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
   [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
   [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
   [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [info]   at java.lang.T

[GitHub] [spark] SparkQA removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-732960610


   **[Test build #131661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131661/testReport)** for PR 29893 at commit [`c43b964`](https://github.com/apache/spark/commit/c43b96404cbdfeb859784304f2174fc28f66b357).






[GitHub] [spark] SparkQA removed a comment on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30484:
URL: https://github.com/apache/spark/pull/30484#issuecomment-732960198


   **[Test build #131659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport)** for PR 30484 at commit [`83b85d4`](https://github.com/apache/spark/commit/83b85d4c1101a39e0b85b41d54862badf4247ad5).






[GitHub] [spark] SparkQA commented on pull request #30484: [SPARK-33532][SQL] Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method

2020-11-24 Thread GitBox


SparkQA commented on pull request #30484:
URL: https://github.com/apache/spark/pull/30484#issuecomment-732982229


   **[Test build #131659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131659/testReport)** for PR 30484 at commit [`83b85d4`](https://github.com/apache/spark/commit/83b85d4c1101a39e0b85b41d54862badf4247ad5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-24 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-732981931


   **[Test build #131661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131661/testReport)** for PR 29893 at commit [`c43b964`](https://github.com/apache/spark/commit/c43b96404cbdfeb859784304f2174fc28f66b357).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732964747


   **[Test build #131662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).






[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732981002


   **[Test build #131662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30212: [SPARK-33308][SQL] Refract current grouping analytics

2020-11-24 Thread GitBox


AngersZhuuuu commented on a change in pull request #30212:
URL: https://github.com/apache/spark/pull/30212#discussion_r529553455



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -39,45 +39,22 @@ trait GroupingSet extends Expression with CodegenFallback {
   override def eval(input: InternalRow): Any = throw new UnsupportedOperationException
 }
 
-// scalastyle:off line.size.limit line.contains.tab
-@ExpressionDescription(
-  usage = """
-    _FUNC_([col1[, col2 ..]]) - create a multi-dimensional cube using the specified columns
-      so that we can run aggregation on them.
-  """,
-  examples = """
-    Examples:
-      > SELECT name, age, count(*) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY _FUNC_(name, age);
-        Bob	5	1
-        Alice	2	1
-        Alice	NULL	1
-        NULL	2	1
-        NULL	NULL	2
-        Bob	NULL	1
-        NULL	5	1
-  """,
-  since = "2.0.0")
-// scalastyle:on line.size.limit line.contains.tab
-case class Cube(groupByExprs: Seq[Expression]) extends GroupingSet {}
+case class Cube(groupingSets: Seq[Seq[Expression]]) extends GroupingSet {
+  override def groupByExprs: Seq[Expression] =
+    groupingSets.flatMap(_.distinct).distinct
+}
 
-// scalastyle:off line.size.limit line.contains.tab
-@ExpressionDescription(
-  usage = """
-    _FUNC_([col1[, col2 ..]]) - create a multi-dimensional rollup using the specified columns
-      so that we can run aggregation on them.
-  """,
-  examples = """
-    Examples:
-      > SELECT name, age, count(*) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY _FUNC_(name, age);
-        Bob	5	1
-        Alice	2	1
-        Alice	NULL	1
-        NULL	NULL	2
-        Bob	NULL	1
-  """,
-  since = "2.0.0")
-// scalastyle:on line.size.limit line.contains.tab
-case class Rollup(groupByExprs: Seq[Expression]) extends GroupingSet {}
+case class Rollup(groupingSets: Seq[Seq[Expression]]) extends GroupingSet {
+  override def groupByExprs: Seq[Expression] =
+    groupingSets.flatMap(_.distinct).distinct
+}
+
+case class GroupingSetsV2(

Review comment:
   > Why do `GroupingSets` and `GroupingSetsV2` need to co-exist?
   
   Updated; `GroupingSetsV2` has been removed.
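
For readers following the diff above, here is a minimal sketch of what the new `groupByExprs` override computes. It is not the PR's code: plain strings stand in for Catalyst `Expression` objects, and the object name `GroupByExprsSketch` is hypothetical. It only illustrates that `flatMap(_.distinct).distinct` flattens the grouping sets into one de-duplicated list, keeping first-seen order.

```scala
// Hypothetical stand-in: strings are used instead of Catalyst Expression
// objects so the sketch runs without a Spark dependency.
object GroupByExprsSketch {
  // Same shape as the override in the diff: drop duplicates inside each
  // grouping set, flatten, then drop duplicates across sets.
  def groupByExprs(groupingSets: Seq[Seq[String]]): Seq[String] =
    groupingSets.flatMap(_.distinct).distinct

  def main(args: Array[String]): Unit = {
    val sets = Seq(
      Seq("name", "age"),
      Seq("name"),
      Seq("age", "age"),
      Seq.empty[String] // an empty grouping set contributes nothing
    )
    println(groupByExprs(sets).mkString(", ")) // prints "name, age"
  }
}
```

The inner `distinct` is redundant for the final result, since the outer `distinct` already removes every duplicate; the sketch keeps it only to match the line shown in the diff.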








[GitHub] [spark] gengliangwang commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


gengliangwang commented on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-732974090


   retest this please.






[GitHub] [spark] SparkQA removed a comment on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-732846567


   **[Test build #131651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131651/testReport)** for PR 30465 at commit [`985352e`](https://github.com/apache/spark/commit/985352ef5d8bc878c5ff07a1a24576a1ac77dfed).






[GitHub] [spark] SparkQA commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


SparkQA commented on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-732973727


   **[Test build #131651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131651/testReport)** for PR 30465 at commit [`985352e`](https://github.com/apache/spark/commit/985352ef5d8bc878c5ff07a1a24576a1ac77dfed).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-732827379


   **[Test build #131648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131648/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806).






[GitHub] [spark] SparkQA commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


SparkQA commented on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-732972419


   **[Test build #131648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131648/testReport)** for PR 30470 at commit [`bc3cb8b`](https://github.com/apache/spark/commit/bc3cb8b419bb985cdf98aaf172b20c900d40e806).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





