[GitHub] [spark] c21 opened a new pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


c21 opened a new pull request #32671:
URL: https://github.com/apache/spark/pull/32671


   
   
   ### What changes were proposed in this pull request?
   
   Add a metric that records how many tasks fall back to sort-based aggregation 
during hash aggregation. This helps developers and users debug and optimize 
queries. Object hash aggregation already has a similar metric.
   
   ### Why are the changes needed?
   
   Help developers and users debug and optimize queries that use hash aggregation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the added metrics will show up in Spark web UI.
   
   ### How was this patch tested?
   
   Changed unit test in `SQLMetricsSuite.scala`.
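
   The idea behind such a metric can be illustrated with a small Python sketch. This is not Spark's implementation: `aggregate_task`, `run_stage`, and `max_hash_keys` are hypothetical names, and the fallback trigger used here (a limit on the number of distinct keys) is an assumption standing in for the real memory condition. The sketch only shows how a per-stage counter of fallback tasks could be accumulated.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_task(rows, max_hash_keys):
    """Sum values per key for one task; return (result, fell_back).

    Hypothetical model: hash aggregation is used until the table would
    exceed max_hash_keys distinct keys, then the task falls back to a
    sort-based aggregation over the buffered and remaining rows.
    """
    table = {}
    for i, (key, value) in enumerate(rows):
        if key not in table and len(table) >= max_hash_keys:
            # Fallback: sort everything seen so far plus the rest by key.
            pending = sorted(list(table.items()) + list(rows[i:]), key=itemgetter(0))
            result = {
                k: sum(v for _, v in group)
                for k, group in groupby(pending, key=itemgetter(0))
            }
            return result, True
        table[key] = table.get(key, 0) + value
    return table, False


def run_stage(partitions, max_hash_keys):
    """Run one task per partition and count how many tasks fell back."""
    num_fallback_tasks = 0  # plays the role of the proposed metric
    results = []
    for rows in partitions:
        result, fell_back = aggregate_task(rows, max_hash_keys)
        num_fallback_tasks += int(fell_back)
        results.append(result)
    return results, num_fallback_tasks


partitions = [
    [("a", 1), ("a", 2)],
    [("a", 1), ("b", 2), ("c", 3), ("a", 4)],
]
results, num_fallback_tasks = run_stage(partitions, max_hash_keys=2)
```

   With these inputs only the second task exceeds the key limit, so the counter ends at one while both tasks still produce complete sums.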


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848522133


   **[Test build #138964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138964/testReport)**
 for PR 32667 at commit 
[`b6241fa`](https://github.com/apache/spark/commit/b6241fab1ea38cab37d6f8503230d33aba0514a9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13

2021-05-26 Thread GitBox


SparkQA commented on pull request #32669:
URL: https://github.com/apache/spark/pull/32669#issuecomment-848524083


   **[Test build #138956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138956/testReport)**
 for PR 32669 at commit 
[`af7fb57`](https://github.com/apache/spark/commit/af7fb57a5b4b37a051d3a2b91408bf9149bd2997).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


SparkQA commented on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848526532


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43482/
   





[GitHub] [spark] SparkQA commented on pull request #32660: [SPARK-35509][DOCS] Move text data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


SparkQA commented on pull request #32660:
URL: https://github.com/apache/spark/pull/32660#issuecomment-848528274


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43484/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32669:
URL: https://github.com/apache/spark/pull/32669#issuecomment-84825


   **[Test build #138956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138956/testReport)**
 for PR 32669 at commit 
[`af7fb57`](https://github.com/apache/spark/commit/af7fb57a5b4b37a051d3a2b91408bf9149bd2997).





[GitHub] [spark] SparkQA removed a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848502590


   **[Test build #138964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138964/testReport)**
 for PR 32667 at commit 
[`b6241fa`](https://github.com/apache/spark/commit/b6241fab1ea38cab37d6f8503230d33aba0514a9).





[GitHub] [spark] c21 commented on pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


c21 commented on pull request #32671:
URL: https://github.com/apache/spark/pull/32671#issuecomment-848530200


   @cloud-fan could you help take a look when you have time? Thanks.





[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848529901


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43485/
   





[GitHub] [spark] gengliangwang opened a new pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


gengliangwang opened a new pull request #32672:
URL: https://github.com/apache/spark/pull/32672


   
   
   ### What changes were proposed in this pull request?
   
   Revise the comment of method `TaskMemoryManager.acquireExecutionMemory`.
   
   ### Why are the changes needed?
   
   After https://github.com/apache/spark/pull/32625, the returned value should 
be `>= N`. The comment needs to be updated.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   Just doc change.





[GitHub] [spark] gengliangwang commented on pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


gengliangwang commented on pull request #32672:
URL: https://github.com/apache/spark/pull/32672#issuecomment-848530123


   cc @Ngone51 @ankurdave 





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848531143


   **[Test build #138968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138968/testReport)**
 for PR 32513 at commit 
[`b60e04e`](https://github.com/apache/spark/commit/b60e04e02c2a343f07825f2b8817f308e8112eec).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848502832


   **[Test build #138968 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138968/testReport)**
 for PR 32513 at commit 
[`b60e04e`](https://github.com/apache/spark/commit/b60e04e02c2a343f07825f2b8817f308e8112eec).





[GitHub] [spark] cloud-fan commented on a change in pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


cloud-fan commented on a change in pull request #32671:
URL: https://github.com/apache/spark/pull/32671#discussion_r639463494



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##
@@ -67,7 +67,9 @@ case class HashAggregateExec(
 "spillSize" -> SQLMetrics.createSizeMetric(sparkContext, "spill size"),
 "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "time in aggregation build"),
 "avgHashProbe" ->
-  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"))
+  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"),
+"numTasksFallBacked" -> SQLMetrics.createMetric(sparkContext,
+  "number of tasks fall-backed to sort-based aggregation"))

Review comment:
   Is this name the same as the one in object hash agg? It is super long.
   
   Probably "number of sort fallback tasks" instead.







[GitHub] [spark] sarutak opened a new pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


sarutak opened a new pull request #32673:
URL: https://github.com/apache/spark/pull/32673


   ### What changes were proposed in this pull request?
   
   This PR fixes a test failure of `DifferentiableLossAggregatorSuite` with 
Java 11.
   
   ### Why are the changes needed?
   
   I'm checking whether all the tests pass with Java 11 on the 
current master, and I found that `DifferentiableLossAggregatorSuite` fails.
   
https://github.com/sarutak/spark/runs/2661859541?check_suite_focus=true#step:9:13895
   
   The reason seems to be that the implementation of `Blas.daxpy` differs between 
Java 8 and Java 11. For Java 11, `Math.fma` is used.
   
   
https://github.com/luhenry/netlib/blob/v2.2.0/blas/src/main/java/dev/ludovic/netlib/blas/Java8BLAS.java#L92
   
https://github.com/luhenry/netlib/blob/0053ea30b11686336cbdb8c7fceb41d59d268fa2/blas/src/main/java/dev/ludovic/netlib/blas/Java11BLAS.java#L40
   
   To remove the rounding error, this PR changes `TestAggregator.add` to use fma.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   I confirmed `DifferentiableLossAggregatorSuite` passes with both Java 8 and 
Java 11.
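
   The rounding difference described above can be reproduced in a few lines of Python. This is only an illustrative sketch, not the code from this PR or from netlib: `fused_multiply_add` is a hypothetical helper that emulates a fused multiply-add with exact rational arithmetic (one rounding at the end), while `unfused_multiply_add` rounds the product first and the sum second. The input values are chosen for the demonstration and are assumptions.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate fma: compute a*b + c exactly, then round once to a float."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_multiply_add(a, b, c):
    """Plain a*b + c: the product is rounded, then the sum is rounded again."""
    return a * b + c


# The exact product a*b is 1 + 2**-29 + 2**-60; the last term is lost
# when the product is rounded to a double before the addition.
a = 1.0 + 2.0 ** -30
b = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)
```

   Here the unfused result is exactly zero, while the fused result keeps the small `2**-60` term, which is the kind of discrepancy a test comparing the two code paths can trip over.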





[GitHub] [spark] SparkQA commented on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


SparkQA commented on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848531945


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43486/
   





[GitHub] [spark] c21 commented on a change in pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


c21 commented on a change in pull request #32671:
URL: https://github.com/apache/spark/pull/32671#discussion_r639465467



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##
@@ -67,7 +67,9 @@ case class HashAggregateExec(
 "spillSize" -> SQLMetrics.createSizeMetric(sparkContext, "spill size"),
 "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "time in aggregation build"),
 "avgHashProbe" ->
-  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"))
+  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"),
+"numTasksFallBacked" -> SQLMetrics.createMetric(sparkContext,
+  "number of tasks fall-backed to sort-based aggregation"))

Review comment:
   Yes, same as in https://github.com/apache/spark/pull/31340. Let me 
change them together.







[GitHub] [spark] viirya commented on a change in pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


viirya commented on a change in pull request #32667:
URL: https://github.com/apache/spark/pull/32667#discussion_r639465528



##
File path: python/pyspark/rdd.py
##
@@ -2067,7 +2067,7 @@ def add_shuffle_key(split, iterator):
 avg = int(size / n) >> 20
 # let 1M < avg < 10M
 if avg < 1:
-batch *= 1.5

Review comment:
   Hm, I thought `batch` is increased for the `c > batch` case. In other words, the 
batch size grows when the current batch size is reached but used 
memory is still under `limit` (and the average bucket size is small).
   
   If the memory limit is reached before the batch size, it does not seem to make 
sense to increase the batch size (even if the average bucket size is small).
   
   
   
   







[GitHub] [spark] viirya commented on a change in pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


viirya commented on a change in pull request #32667:
URL: https://github.com/apache/spark/pull/32667#discussion_r639465528



##
File path: python/pyspark/rdd.py
##
@@ -2067,7 +2067,7 @@ def add_shuffle_key(split, iterator):
 avg = int(size / n) >> 20
 # let 1M < avg < 10M
 if avg < 1:
-batch *= 1.5

Review comment:
   Hm, I thought `batch` is increased for the `c > batch` case. In other words, the 
batch size grows when the current batch size is reached but used 
memory is still under `limit` (and the average bucket size is small).
   
   If the memory limit is reached before the batch size (meaning the 
current batch size is more than the memory limit), it does not seem to make sense to 
increase the batch size (even if the average bucket size is small).
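
   The failure mode in the PR title and the condition discussed in this comment can be sketched in Python. This is a hedged illustration, not the code in `rdd.py` and not necessarily the fix this PR adopts: `steps_until_infinite` and `next_batch` are hypothetical helpers, the `reached_batch_size` flag models the `c > batch` case, the cap at `sys.maxsize` is an assumption, and the shrink branch is inferred only from the `let 1M < avg < 10M` comment in the diff.

```python
import sys


def steps_until_infinite(start=10.0, factor=1.5):
    """Count how many unbounded `batch *= factor` steps reach float infinity."""
    batch = start
    steps = 0
    while batch != float("inf"):
        batch *= factor  # float multiplication overflows to inf silently
        steps += 1
    return steps


def next_batch(batch, avg_mb, reached_batch_size, cap=sys.maxsize):
    """Hypothetical batch update that stays convertible to int.

    Grows only when the batch size itself was reached (not when the
    memory limit triggered the flush) and never grows beyond `cap`.
    """
    if avg_mb < 1 and reached_batch_size:
        return min(cap, batch * 1.5)
    if avg_mb > 10:
        return max(int(batch / 1.5), 1)
    return batch


# Unbounded growth eventually yields inf, and int(inf) raises OverflowError.
steps = steps_until_infinite()

# Capped growth stays finite no matter how many times it is applied.
batch = 10.0
for _ in range(steps + 10):
    batch = next_batch(batch, avg_mb=0, reached_batch_size=True)
```

   With the cap in place, `int(batch)` remains valid after any number of updates, and a flush caused by the memory limit alone leaves `batch` unchanged.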
   
   
   
   







[GitHub] [spark] AmplabJenkins commented on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848536560









[GitHub] [spark] AmplabJenkins commented on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32669:
URL: https://github.com/apache/spark/pull/32669#issuecomment-848536566


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138956/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848536559









[GitHub] [spark] AmplabJenkins removed a comment on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848536565


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43485/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848536564









[GitHub] [spark] AmplabJenkins commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848536565


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43485/
   





[GitHub] [spark] SparkQA commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848536630


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43483/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32587: [SPARK-35440][SQL] Add function source to `ExpressionInfo` for UDF

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32587:
URL: https://github.com/apache/spark/pull/32587#issuecomment-848536556


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138953/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848536561


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138968/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32669:
URL: https://github.com/apache/spark/pull/32669#issuecomment-848536566


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138956/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32587: [SPARK-35440][SQL] Add function source to `ExpressionInfo` for UDF

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32587:
URL: https://github.com/apache/spark/pull/32587#issuecomment-848536556


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138953/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848536557









[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848536561


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138968/
   





[GitHub] [spark] SparkQA commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848537981


   **[Test build #138970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138970/testReport)**
 for PR 32673 at commit 
[`89cd2b9`](https://github.com/apache/spark/commit/89cd2b9da60da77da218152c69e5c1f665dac217).





[GitHub] [spark] SparkQA commented on pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


SparkQA commented on pull request #32672:
URL: https://github.com/apache/spark/pull/32672#issuecomment-848538035


   **[Test build #138971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138971/testReport)**
 for PR 32672 at commit 
[`95d1b19`](https://github.com/apache/spark/commit/95d1b1968dafd2740e74817a3989e7cddaa8e03a).





[GitHub] [spark] c21 commented on a change in pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


c21 commented on a change in pull request #32671:
URL: https://github.com/apache/spark/pull/32671#discussion_r639470412



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##
@@ -67,7 +67,9 @@ case class HashAggregateExec(
 "spillSize" -> SQLMetrics.createSizeMetric(sparkContext, "spill size"),
 "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "time in aggregation build"),
 "avgHashProbe" ->
-  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"))
+  SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"),
+"numTasksFallBacked" -> SQLMetrics.createMetric(sparkContext,
+  "number of tasks fall-backed to sort-based aggregation"))

Review comment:
   Updated. Thanks.
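
The metric in this diff counts tasks whose hash aggregation gave up and switched to sort-based aggregation. As a rough illustration only, the Python sketch below mimics that idea with a toy in-memory aggregator. `aggregate_task`, `sort_based_sum`, `max_keys`, and the `metrics` dict are hypothetical names, not Spark APIs, and the sketch assumes a simple key-count limit where Spark reacts to memory pressure.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_task(rows, max_keys, metrics):
    """Sum values per key; fall back to sort-based aggregation if the hash map grows too large."""
    sums = {}
    for key, value in rows:
        if key not in sums and len(sums) >= max_keys:
            # Hash map is "full" (toy limit): count the fallback once for this task
            # and redo the whole aggregation with the sort-based strategy.
            metrics["numTasksFallBacked"] = metrics.get("numTasksFallBacked", 0) + 1
            return sort_based_sum(rows)
        sums[key] = sums.get(key, 0) + value
    return sums


def sort_based_sum(rows):
    """Sort rows by key, then sum each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {k: sum(v for _, v in grp) for k, grp in groupby(ordered, key=itemgetter(0))}


metrics = {}
small = aggregate_task([("a", 1), ("b", 2), ("a", 3)], max_keys=4, metrics=metrics)
large = aggregate_task([("a", 1), ("b", 2), ("c", 3), ("a", 4)], max_keys=2, metrics=metrics)
```

Only the second call exceeds the toy limit, so the counter ends at one while both calls return the same kind of result. That is the property such a metric is meant to expose: the fallback changes the execution path, not the answer.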







[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-848538161


   **[Test build #138974 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138974/testReport)**
 for PR 32658 at commit 
[`0e75d8d`](https://github.com/apache/spark/commit/0e75d8d263b179cf4f49ee873df0cb8e68e4110a).





[GitHub] [spark] SparkQA commented on pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


SparkQA commented on pull request #32671:
URL: https://github.com/apache/spark/pull/32671#issuecomment-848538004


   **[Test build #138972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138972/testReport)**
 for PR 32671 at commit 
[`638cbf5`](https://github.com/apache/spark/commit/638cbf5b415d7c7a796af70d29eaf2d664ee469a).





[GitHub] [spark] SparkQA commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848538090


   **[Test build #138973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138973/testReport)**
 for PR 32667 at commit 
[`67e6d71`](https://github.com/apache/spark/commit/67e6d719db88b480be527bcd15af7e021bcff141).





[GitHub] [spark] itholic opened a new pull request #32674: [SPARK-35453] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


itholic opened a new pull request #32674:
URL: https://github.com/apache/spark/pull/32674


   ### What changes were proposed in this pull request?
   
   This PR proposes renaming the existing "Koalas Accessor" to "Pandas API on 
Spark Accessor".
   
   ### Why are the changes needed?
   
   Because we no longer need to use the name "Koalas".
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the usage of the pandas API on Spark accessor changes from 
`df.koalas.[...]` to `df.pandas_on_spark.[...]`.
   
   ### How was this patch tested?
   
   Manually tested locally and checked one by one.





[GitHub] [spark] SparkQA commented on pull request #32674: [SPARK-35453] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


SparkQA commented on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-848540493


   **[Test build #138975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138975/testReport)**
 for PR 32674 at commit 
[`2bd4f89`](https://github.com/apache/spark/commit/2bd4f8926389e081956eb3b93e61c43b9132d546).





[GitHub] [spark] rluta commented on pull request #32640: [SPARK-35494][SQL][2.4] Timestamp cast performance issue

2021-05-26 Thread GitBox


rluta commented on pull request #32640:
URL: https://github.com/apache/spark/pull/32640#issuecomment-848540249


   I'm fine either way. I created this issue/PR mainly so that other current 
users and fork maintainers of the 2.4 branch are aware of the performance issue.
   
   I will create another ticket to fix the milder performance issue in the 
master branch and contribute the benchmark class.





[GitHub] [spark] SparkQA commented on pull request #32671: [SPARK-35529][SQL] Add fallback metrics for hash aggregate

2021-05-26 Thread GitBox


SparkQA commented on pull request #32671:
URL: https://github.com/apache/spark/pull/32671#issuecomment-848540598


   **[Test build #138976 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138976/testReport)**
 for PR 32671 at commit 
[`0456d2d`](https://github.com/apache/spark/commit/0456d2dcc6c406c9bbb47fe3e48e4aa9ad10).





[GitHub] [spark] Ngone51 commented on a change in pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


Ngone51 commented on a change in pull request #32672:
URL: https://github.com/apache/spark/pull/32672#discussion_r639475608



##
File path: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
##
@@ -133,7 +133,7 @@ public TaskMemoryManager(MemoryManager memoryManager, long taskAttemptId) {
* Acquire N bytes of memory for a consumer. If there is no enough memory, it will call
* spill() of consumers to release more memory.
*
-   * @return number of bytes successfully granted (<= N).
+   * @return number of bytes successfully granted (>= N).

Review comment:
   
https://github.com/apache/spark/blob/af1dba7ca501fd9372b158793119163e3fcd1f24/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L219-L220
   ^^^ It's not true if we stop the spill when there's nothing to spill, right?
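
The point about the return value can be illustrated with a small Python toy model. This is only a sketch under stated assumptions: `ToyTaskMemoryManager`, its fields, and the spill bookkeeping are hypothetical and far simpler than Spark's `TaskMemoryManager`. It shows a grant that never exceeds the requested N and that falls short once nothing is left to spill.

```python
class ToyTaskMemoryManager:
    """Toy model: grants up to `required` bytes, spilling consumers when the pool runs short."""

    def __init__(self, free_bytes, spillable_by_consumer):
        self.free_bytes = free_bytes
        # Bytes each consumer could release by spilling (hypothetical bookkeeping).
        self.spillable = dict(spillable_by_consumer)

    def acquire_execution_memory(self, required):
        granted = min(required, self.free_bytes)
        self.free_bytes -= granted
        for name in list(self.spillable):
            if granted >= required:
                break
            # Spill this consumer, then hand over only what is still needed.
            released = self.spillable.pop(name)
            self.free_bytes += released
            extra = min(required - granted, self.free_bytes)
            self.free_bytes -= extra
            granted += extra
        # Never more than `required`; less when nothing is left to spill.
        return granted


manager = ToyTaskMemoryManager(free_bytes=100, spillable_by_consumer={"sorter": 50})
full = manager.acquire_execution_memory(120)     # 100 free bytes plus 20 obtained by spilling
partial = manager.acquire_execution_memory(200)  # only 30 bytes remain and nothing can spill
```

In this model the first request is satisfied in full, while the second receives only the remaining bytes, so a `>= N` guarantee would not hold.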
   







[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848548041


   **[Test build #138977 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138977/testReport)**
 for PR 32513 at commit 
[`94ba930`](https://github.com/apache/spark/commit/94ba9307452c7fd36e27e1dd7c60a9d98ee8473b).





[GitHub] [spark] gengliangwang commented on a change in pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


gengliangwang commented on a change in pull request #32672:
URL: https://github.com/apache/spark/pull/32672#discussion_r639481159



##
File path: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
##
@@ -133,7 +133,7 @@ public TaskMemoryManager(MemoryManager memoryManager, long taskAttemptId) {
* Acquire N bytes of memory for a consumer. If there is no enough memory, it will call
* spill() of consumers to release more memory.
*
-   * @return number of bytes successfully granted (<= N).
+   * @return number of bytes successfully granted (>= N).

Review comment:
   You are right. Let's just remove the comparison with N here...







[GitHub] [spark] AmplabJenkins commented on pull request #32660: [SPARK-35509][DOCS] Move text data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32660:
URL: https://github.com/apache/spark/pull/32660#issuecomment-848548911


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43484/
   





[GitHub] [spark] SparkQA commented on pull request #32660: [SPARK-35509][DOCS] Move text data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


SparkQA commented on pull request #32660:
URL: https://github.com/apache/spark/pull/32660#issuecomment-848548880


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43484/
   





[GitHub] [spark] opensky142857 opened a new pull request #32675: [SPARK-35531]Can not insert into hive bucket table if create table wi…

2021-05-26 Thread GitBox


opensky142857 opened a new pull request #32675:
URL: https://github.com/apache/spark/pull/32675


   …th upper case schema
   
   
   
   ### What changes were proposed in this pull request?
   
   When converting to HiveTable, respect the case of the table schema.
   
   ### Why are the changes needed?
   
   When a user creates a Hive bucketed table with an upper-case schema, the table 
schema is stored in lower case, while the bucket column info stays the 
same as the user input.
   
   If we then try to insert into this table, a HiveException reports that the 
bucket column is not in the table schema.
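
The mismatch described above can be reproduced in miniature. The Python sketch below is illustrative only: `missing_bucket_columns` and the sample column names are hypothetical, and it does not reflect how Spark or Hive actually resolve columns. It shows why a case-sensitive lookup fails once the schema has been lower-cased but the bucket column has not.

```python
def missing_bucket_columns(schema_columns, bucket_columns, case_sensitive=True):
    """Return the bucket columns that cannot be found in the table schema."""
    if case_sensitive:
        known = set(schema_columns)
        return [c for c in bucket_columns if c not in known]
    known = {c.lower() for c in schema_columns}
    return [c for c in bucket_columns if c.lower() not in known]


# Schema stored in lower case, bucket column kept exactly as the user typed it.
stored_schema = ["id", "name"]
bucket_cols = ["ID"]

strict = missing_bucket_columns(stored_schema, bucket_cols)
relaxed = missing_bucket_columns(stored_schema, bucket_cols, case_sensitive=False)
```

With the strict lookup the bucket column `ID` is reported as missing; keeping the schema and the bucket columns in the same case (or comparing case-insensitively, as in the second call) avoids the error.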
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   UT
   





[GitHub] [spark] SparkQA commented on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


SparkQA commented on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848552877


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43486/
   





[GitHub] [spark] vinodkc commented on pull request #32636: [SPARK-35490][BUILD] Update json4s to 3.7.0-M11

2021-05-26 Thread GitBox


vinodkc commented on pull request #32636:
URL: https://github.com/apache/spark/pull/32636#issuecomment-848554606


   > > Multiple defect fixes and improvements
   > 
   > It would be nice to see some summary of the fixes/improvements. Maybe they 
are not relevant to Spark.
   
   I've updated the summary of the fixes/improvements, thanks for the review 
comment





[GitHub] [spark] gengliangwang closed pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


gengliangwang closed pull request #32672:
URL: https://github.com/apache/spark/pull/32672


   





[GitHub] [spark] gengliangwang commented on pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


gengliangwang commented on pull request #32672:
URL: https://github.com/apache/spark/pull/32672#issuecomment-848556141


   Oh, I misunderstood the code. Closing this one now.





[GitHub] [spark] SparkQA commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848556318


   **[Test build #138973 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138973/testReport)**
 for PR 32667 at commit 
[`67e6d71`](https://github.com/apache/spark/commit/67e6d719db88b480be527bcd15af7e021bcff141).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32674: [SPARK-35453] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


SparkQA commented on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-848560922


   **[Test build #138975 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138975/testReport)**
 for PR 32674 at commit 
[`2bd4f89`](https://github.com/apache/spark/commit/2bd4f8926389e081956eb3b93e61c43b9132d546).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848560799


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43483/
   





[GitHub] [spark] MaxGekk commented on pull request #32636: [SPARK-35490][BUILD] Update json4s to 3.7.0-M11

2021-05-26 Thread GitBox


MaxGekk commented on pull request #32636:
URL: https://github.com/apache/spark/pull/32636#issuecomment-848562375


   +1, LGTM. Merging to master.
   Thank you @vinodkc, and @srowen @dongjoon-hyun for your review.





[GitHub] [spark] LuciferYang opened a new pull request #32676: [SPARK-35532][TESTS] Ensure mllib and kafka-0-10 module can be maven test independently in Scala 2.13

2021-05-26 Thread GitBox


LuciferYang opened a new pull request #32676:
URL: https://github.com/apache/spark/pull/32676


   ### What changes were proposed in this pull request?
   Before this PR, when we ran the Maven test command to test the `mllib` and 
`kafka-0-10` modules independently, some Java UTs failed. The key 
error messages are as follows:
   
   ```
   java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
   ```
   
   and
   
   ```
   java.lang.NoClassDefFoundError: scala/collection/parallel/immutable/ParVector
   ```
   
   The UTs need `scala-parallel-collections_2.13`, but it is not on the classpath 
when we run `mvn test -pl mllib -Pscala-2.13` and 
`mvn test -pl external/kafka-0-10 -Pscala-2.13`.
   
   So the main change of this PR is to add a `scala-2.13` profile to `mllib/pom.xml` 
and `external/kafka-0-10/pom.xml`. The profile includes a dependency 
on `scala-parallel-collections_2.13`, so these two modules can be tested with 
Maven independently.
   
   
   ### Why are the changes needed?
   Ensure the mllib and kafka-0-10 modules can be tested with Maven independently 
in Scala 2.13.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Pass the GitHub Action Scala 2.13 job
   - Manual test:
   
   1. Execute
   ```
   dev/change-scala-version.sh 2.13
   mvn clean install -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
   ```
   
   2. Execute
   
   ```
   mvn test -pl mllib -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
   ```
   
   **Before**
   
   6 Java UTs failed:
   
   ```
   [ERROR] Errors: 
   [ERROR]   JavaStreamingLogisticRegressionSuite.javaAPI:78 » TestFailed 20005 was not les...
   [ERROR]   JavaStreamingKMeansSuite.javaAPI:78 » TestFailed 20040 was not less than 2...
   [ERROR]   JavaPrefixSpanSuite.runPrefixSpan:45 » NoClassDefFound scala/collection/parall...
   [ERROR]   JavaPrefixSpanSuite.runPrefixSpanSaveLoad:67 » NoClassDefFound scala/collectio...
   [ERROR]   JavaStreamingLinearRegressionSuite.javaAPI:77 » TestFailed 20014 was not less ...
   [ERROR]   JavaStatisticsSuite.streamingTest:112 » TestFailed 20043 was not less than 200...
   [INFO] 
   [ERROR] Tests run: 122, Failures: 0, Errors: 6, Skipped: 0
   ```
   
   **After**
   
   ```
   [INFO] Tests run: 122, Failures: 0, Errors: 0, Skipped: 0
   ```
   
   3. Execute
   
   ```
   mvn test -pl external/kafka-0-10 -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
   ```
   
   **Before**
   
   2 Java UTs failed:
   
   ```
   [ERROR] Errors: 
   [ERROR] org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream
   [ERROR]   Run 1: JavaDirectKafkaStreamSuite.testKafkaStream:170 expected:<[topic1-1, topic1-2, topic2-1, topic1-3, topic2-2, topic2-3]> but was:<[]>
   [ERROR]   Run 2: JavaDirectKafkaStreamSuite.tearDown:57 » NoClassDefFound scala/collection/para...
   [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0
   ```
   
   **After**
   
   ```
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
   ```
   





[GitHub] [spark] MaxGekk closed pull request #32636: [SPARK-35490][BUILD] Update json4s to 3.7.0-M11

2021-05-26 Thread GitBox


MaxGekk closed pull request #32636:
URL: https://github.com/apache/spark/pull/32636


   





[GitHub] [spark] HyukjinKwon commented on pull request #32660: [SPARK-35509][DOCS] Move text data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


HyukjinKwon commented on pull request #32660:
URL: https://github.com/apache/spark/pull/32660#issuecomment-848564451


   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #32660: [SPARK-35509][DOCS] Move text data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


HyukjinKwon closed pull request #32660:
URL: https://github.com/apache/spark/pull/32660


   





[GitHub] [spark] SparkQA removed a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848538090


   **[Test build #138973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138973/testReport)**
 for PR 32667 at commit 
[`67e6d71`](https://github.com/apache/spark/commit/67e6d719db88b480be527bcd15af7e021bcff141).





[GitHub] [spark] SparkQA removed a comment on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-848540493


   **[Test build #138975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138975/testReport)**
 for PR 32674 at commit 
[`2bd4f89`](https://github.com/apache/spark/commit/2bd4f8926389e081956eb3b93e61c43b9132d546).





[GitHub] [spark] SparkQA commented on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848570031


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43488/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


HyukjinKwon commented on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848570391


   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


HyukjinKwon closed pull request #32670:
URL: https://github.com/apache/spark/pull/32670


   





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848571597


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43487/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


HyukjinKwon commented on a change in pull request #32667:
URL: https://github.com/apache/spark/pull/32667#discussion_r639506917



##
File path: python/pyspark/rdd.py
##
@@ -2067,7 +2067,7 @@ def add_shuffle_key(split, iterator):
 avg = int(size / n) >> 20
 # let 1M < avg < 10M
 if avg < 1:
-    batch *= 1.5
+    batch = min(sys.maxsize, batch * 1.5)

Review comment:
   okay, `sys.maxsize` batch size already doesn't make much sense anyway. 
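
   To illustrate the change under review, here is a minimal sketch (not the actual `rdd.py` code) of why an unbounded `batch *= 1.5` can end in the `OverflowError` named in the PR title, and how the `min(sys.maxsize, ...)` cap avoids it. The function names `grow_unbounded` and `grow_capped`, the starting value 1000, and the step count 2000 are hypothetical. The hunk above does not show where `batch` is later converted to an integer, so the `int(...)` call below is an assumption.

```python
import sys


def grow_unbounded(batch, steps):
    """Mimic the old line: multiply by 1.5 with no upper bound."""
    for _ in range(steps):
        batch *= 1.5  # float multiplication silently overflows to inf
    return batch


def grow_capped(batch, steps):
    """Mimic the patched line: never let batch exceed sys.maxsize."""
    for _ in range(steps):
        batch = min(sys.maxsize, batch * 1.5)
    return batch


if __name__ == "__main__":
    steps = 2000  # hypothetical count, large enough to overflow a float

    unbounded = grow_unbounded(1000, steps)
    try:
        int(unbounded)  # assumed conversion; fails once batch is infinite
    except OverflowError as exc:
        print("unbounded growth:", exc)

    # With the cap, the value saturates at sys.maxsize and stays an int.
    print("capped growth saturated:", grow_capped(1000, steps) == sys.maxsize)
```

   The cap only keeps the value finite; as the review comment notes, a batch size anywhere near `sys.maxsize` is not meaningful in practice.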







[GitHub] [spark] HyukjinKwon closed pull request #32652: [SPARK-35501][SQL][TESTS] Add a feature for removing pulled container image for docker integration tests

2021-05-26 Thread GitBox


HyukjinKwon closed pull request #32652:
URL: https://github.com/apache/spark/pull/32652


   





[GitHub] [spark] HyukjinKwon commented on pull request #32652: [SPARK-35501][SQL][TESTS] Add a feature for removing pulled container image for docker integration tests

2021-05-26 Thread GitBox


HyukjinKwon commented on pull request #32652:
URL: https://github.com/apache/spark/pull/32652#issuecomment-848573149


   Merged to master.





[GitHub] [spark] SparkQA commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848573508


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43490/
   





[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848575069


   **[Test build #138966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138966/testReport)** for PR 32563 at commit [`44265e7`](https://github.com/apache/spark/commit/44265e766eefcee8906d38050114407f69fc3545).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848502701


   **[Test build #138966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138966/testReport)** for PR 32563 at commit [`44265e7`](https://github.com/apache/spark/commit/44265e766eefcee8906d38050114407f69fc3545).





[GitHub] [spark] AmplabJenkins commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848576193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138966/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848576196


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43486/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848576198









[GitHub] [spark] AmplabJenkins commented on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-848576195


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138975/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-848576195


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138975/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32550: [SPARK-35282][SQL] Support AQE side shuffled hash join formula using rule

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32550:
URL: https://github.com/apache/spark/pull/32550#issuecomment-848576196


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43486/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848576197









[GitHub] [spark] AmplabJenkins removed a comment on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-848576193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138966/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848576201


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43488/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848576201


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43488/
   





[GitHub] [spark] SparkQA commented on pull request #32676: [SPARK-35532][TESTS] Ensure mllib and kafka-0-10 module can be maven test independently in Scala 2.13

2021-05-26 Thread GitBox


SparkQA commented on pull request #32676:
URL: https://github.com/apache/spark/pull/32676#issuecomment-848577664


   **[Test build #138978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138978/testReport)** for PR 32676 at commit [`5823adf`](https://github.com/apache/spark/commit/5823adf386c52bfe5b16fddd8c9d2392d243b0b5).





[GitHub] [spark] AmplabJenkins commented on pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32675:
URL: https://github.com/apache/spark/pull/32675#issuecomment-848577388


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848577848


   **[Test build #138979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138979/testReport)** for PR 32513 at commit [`0ed9485`](https://github.com/apache/spark/commit/0ed94859e025635863805ef660d159b594b910cc).





[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-26 Thread GitBox


SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-848579368


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43489/
   





[GitHub] [spark] SparkQA commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848581512


   **[Test build #138970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138970/testReport)** for PR 32673 at commit [`89cd2b9`](https://github.com/apache/spark/commit/89cd2b9da60da77da218152c69e5c1f665dac217).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848582028


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138970/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848537981


   **[Test build #138970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138970/testReport)** for PR 32673 at commit [`89cd2b9`](https://github.com/apache/spark/commit/89cd2b9da60da77da218152c69e5c1f665dac217).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


AmplabJenkins removed a comment on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848582028


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138970/
   





[GitHub] [spark] SparkQA commented on pull request #32672: [SPARK-35486][FOLLOWUP] Advise the comment of method TaskMemoryManager.acquireExecutionMemory

2021-05-26 Thread GitBox


SparkQA commented on pull request #32672:
URL: https://github.com/apache/spark/pull/32672#issuecomment-848587045


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43491/
   





[GitHub] [spark] SparkQA commented on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848589208


   **[Test build #138969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138969/testReport)** for PR 32670 at commit [`a2465a1`](https://github.com/apache/spark/commit/a2465a17f782908eb944eed97d41e5e7892fe7bf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


SparkQA removed a comment on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848504910


   **[Test build #138969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138969/testReport)** for PR 32670 at commit [`a2465a1`](https://github.com/apache/spark/commit/a2465a17f782908eb944eed97d41e5e7892fe7bf).





[GitHub] [spark] AmplabJenkins commented on pull request #32670: [SPARK-35527][SQL][TESTS] Fix HiveExternalCatalogVersionsSuite to pass with Java 11

2021-05-26 Thread GitBox


AmplabJenkins commented on pull request #32670:
URL: https://github.com/apache/spark/pull/32670#issuecomment-848590552


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138969/
   





[GitHub] [spark] Ngone51 commented on a change in pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-26 Thread GitBox


Ngone51 commented on a change in pull request #32114:
URL: https://github.com/apache/spark/pull/32114#discussion_r639529982



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -422,7 +430,7 @@ class BlockManagerMasterEndpoint(
 val locations = blockLocations.get(blockId)
 if (locations != null) {
   locations.foreach { blockManagerId: BlockManagerId =>
-    val blockManager = blockManagerInfo.get(blockManagerId)
+    val blockManager = blockManagerInfo.get(blockManagerId).filter(_.isAlive)

Review comment:
   I think the helper methods don't solve the problem thoroughly as you 
still have to replace all the usages where `isActive` exists now.
   
   I'm personally OK with passing the inactive BlockManagerInfos to `BlockManagerMasterHeartbeatEndpoint`.
   
   @mridulm what's your opinion?







[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-05-26 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-848595505


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43487/
   





[GitHub] [spark] SparkQA commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11

2021-05-26 Thread GitBox


SparkQA commented on pull request #32673:
URL: https://github.com/apache/spark/pull/32673#issuecomment-848598821


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43490/
   





[GitHub] [spark] nolanliou edited a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


nolanliou edited a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848601975


   > @nolanliou did you face something like this: [#32400 (comment)](https://github.com/apache/spark/pull/32400#issuecomment-831051189)?
   
   All tests passed?





[GitHub] [spark] nolanliou commented on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


nolanliou commented on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848601975


   > @nolanliou did you face something like this: [#32400 (comment)](https://github.com/apache/spark/pull/32400#issuecomment-831051189)?
   
   All test passed 





[GitHub] [spark] nolanliou edited a comment on pull request #32667: [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function

2021-05-26 Thread GitBox


nolanliou edited a comment on pull request #32667:
URL: https://github.com/apache/spark/pull/32667#issuecomment-848601975


   > @nolanliou did you face something like this: [#32400 (comment)](https://github.com/apache/spark/pull/32400#issuecomment-831051189)?
   
   All test passed ?




