[GitHub] [spark] SparkQA removed a comment on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781823524


   **[Test build #135258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135258/testReport)** for PR 31348 at commit [`7e55532`](https://github.com/apache/spark/commit/7e55532799f98b55ec6bae47fd2bd830d5be4b78).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781894513


   **[Test build #135258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135258/testReport)** for PR 31348 at commit [`7e55532`](https://github.com/apache/spark/commit/7e55532799f98b55ec6bae47fd2bd830d5be4b78).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781893314


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39840/
   






[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-18 Thread GitBox


Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-781891397


   LGTM






[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-18 Thread GitBox


Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-781891645


   cc @tgravescs @mridulm 






[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-18 Thread GitBox


Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r578979477



##
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)

Review comment:
   SGTM








[GitHub] [spark] SparkQA commented on pull request #31548: [SPARK-34127][SQL] Support table valued command

2021-02-18 Thread GitBox


SparkQA commented on pull request #31548:
URL: https://github.com/apache/spark/pull/31548#issuecomment-781887505


   **[Test build #135262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135262/testReport)** for PR 31548 at commit [`280f39c`](https://github.com/apache/spark/commit/280f39c09d1091da88625ff827b4e828a7f72c01).






[GitHub] [spark] SparkQA commented on pull request #31588: [SPARK-34470][ML] VectorSlicer use ordering if possible

2021-02-18 Thread GitBox


SparkQA commented on pull request #31588:
URL: https://github.com/apache/spark/pull/31588#issuecomment-781887521


   **[Test build #135261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135261/testReport)** for PR 31588 at commit [`3a9d08f`](https://github.com/apache/spark/commit/3a9d08f0768f3989b840dc11288acfd28fbfeca6).






[GitHub] [spark] SparkQA commented on pull request #31549: [SPARK-34314][SQL] Fix partitions schema inference

2021-02-18 Thread GitBox


SparkQA commented on pull request #31549:
URL: https://github.com/apache/spark/pull/31549#issuecomment-781886866


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39839/
   






[GitHub] [spark] SparkQA commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


SparkQA commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781880633


   **[Test build #135263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135263/testReport)** for PR 31495 at commit [`03801f7`](https://github.com/apache/spark/commit/03801f726a1443736dd7e404ba6a6ac1f8740c10).






[GitHub] [spark] HeartSaVioR commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


HeartSaVioR commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781878903


   retest this, please
   
   






[GitHub] [spark] zhengruifeng commented on pull request #31588: [SPARK-34470][ML] VectorSlicer use ordering if possible

2021-02-18 Thread GitBox


zhengruifeng commented on pull request #31588:
URL: https://github.com/apache/spark/pull/31588#issuecomment-781877440


   test:
   ```
   test("performance") {
     val rng = new Random(123)
     val n = 10
     val dim = 1
     val nnz = 100
     // SparseVector requires indices.length == values.length, so dim must be at least nnz.
     val vectors = Array.tabulate(n) { _ =>
       val indices = rng.shuffle(Seq.range(0, dim)).take(nnz).sorted.toArray
       val values = Array.fill(nnz)(rng.nextGaussian)
       new SparseVector(dim, indices, values)
     }

     val slicingIndices = rng.shuffle(Seq.range(0, dim)).take(nnz).sorted.toArray

     // Time 100 passes of `slice` over all vectors.
     val tic0 = System.currentTimeMillis()
     Seq.range(0, 100).foreach { _ =>
       vectors.foreach { sv => sv.slice(slicingIndices) }
     }
     val toc0 = System.currentTimeMillis()
     println(s"slice: ${toc0 - tic0}")

     // Time 100 passes of `sliceSorted` over the same vectors and indices.
     val tic1 = System.currentTimeMillis()
     Seq.range(0, 100).foreach { _ =>
       vectors.foreach { sv => sv.sliceSorted(slicingIndices) }
     }
     val toc1 = System.currentTimeMillis()
     println(s"sliceSorted: ${toc1 - tic1}")
   }
   ```
   
   results:
   
   
![image](https://user-images.githubusercontent.com/7322292/108469701-b6714d80-72c3-11eb-9d6b-86f2c6130383.png)
   
   






[GitHub] [spark] SparkQA commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781877375


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39840/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781876771


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135260/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31570: [WIP][SPARK-10816][SS] SessionWindow support for Structure Streaming

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-781876774


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39836/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781876772


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135250/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781876773


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39838/
   






[GitHub] [spark] zhengruifeng opened a new pull request #31588: [SPARK-34470][ML] VectorSlicer use ordering if possible

2021-02-18 Thread GitBox


zhengruifeng opened a new pull request #31588:
URL: https://github.com/apache/spark/pull/31588


   ### What changes were proposed in this pull request?
   1, add a new method `sliceSorted` for `SparseVector`;
   2, in `VectorSlicer`, switch to `sliceSorted` if input indices are ordered.
   
   
   ### Why are the changes needed?
   The input indices of VectorSlicer are probably ordered.
   VectorSlicer should use this attribute if possible.
   
   I did a simple test and `sliceSorted` is about 70% faster than `slice`
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   added testsuite
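
To illustrate why ordered indices can help, here is a minimal, self-contained sketch. It does not use Spark: `Sparse`, `SliceSketch`, `slice`, and `sliceSorted` below are hypothetical stand-ins written for this illustration, and the sketch assumes both the vector's indices and the selected indices are strictly ascending. The general version looks up each selected index with a binary search, while the sorted version walks both arrays once with two pointers.

```scala
// Hypothetical stand-in for a sparse vector: parallel arrays of
// strictly ascending indices and their values. Not Spark's SparseVector.
final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])

object SliceSketch {
  // General case: `selected` may be in any order, so each selected index
  // is located with a binary search over the vector's sorted indices.
  def slice(v: Sparse, selected: Array[Int]): Sparse = {
    val idx = Array.newBuilder[Int]
    val vals = Array.newBuilder[Double]
    var i = 0
    while (i < selected.length) {
      val pos = java.util.Arrays.binarySearch(v.indices, selected(i))
      if (pos >= 0) {
        idx += i
        vals += v.values(pos)
      }
      i += 1
    }
    Sparse(selected.length, idx.result(), vals.result())
  }

  // Sorted case (assumption: `selected` is strictly ascending):
  // a single linear two-pointer pass over both arrays is enough.
  def sliceSorted(v: Sparse, selected: Array[Int]): Sparse = {
    val idx = Array.newBuilder[Int]
    val vals = Array.newBuilder[Double]
    var i = 0 // position in `selected`
    var j = 0 // position in `v.indices`
    while (i < selected.length && j < v.indices.length) {
      if (selected(i) == v.indices(j)) {
        idx += i
        vals += v.values(j)
        i += 1
        j += 1
      } else if (selected(i) < v.indices(j)) {
        i += 1
      } else {
        j += 1
      }
    }
    Sparse(selected.length, idx.result(), vals.result())
  }
}
```

Under these assumptions both functions return the same result; the sorted variant replaces one binary search per selected index with a single merge-style scan, which is the kind of saving the description refers to.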
   






[GitHub] [spark] AmplabJenkins commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781876772


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135250/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781876773


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39838/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781876771


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135260/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31570: [WIP][SPARK-10816][SS] SessionWindow support for Structure Streaming

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-781876774


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39836/
   






[GitHub] [spark] HyukjinKwon removed a comment on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


HyukjinKwon removed a comment on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781871010


   @tgravescs and @holdenk, I plan to cut the RC as soon as possible, but it seems this PR includes two small changes that might matter for compatibility:
   
   - https://github.com/apache/spark/pull/31496/files#diff-6f08c35f1a0ae172ac66c7a8148e2516f3f2ce4887b592a64fb8207055dbb4e3R28
   - https://github.com/apache/spark/pull/31496/files#diff-a6d96a65d9905b310451b125acac6610ffbd6b4548461bd1d5a18dc29282814aR95
   
   Would you mind taking a quick look please?






[GitHub] [spark] HyukjinKwon commented on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


HyukjinKwon commented on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781874079


   @tgravescs and @holdenk, I plan to cut the RC as soon as possible (after the blocker #31550 is merged), but it seems this PR includes two small changes that might matter for compatibility:
   
   - https://github.com/apache/spark/pull/31496/files#diff-6f08c35f1a0ae172ac66c7a8148e2516f3f2ce4887b592a64fb8207055dbb4e3R28
   - https://github.com/apache/spark/pull/31496/files#diff-a6d96a65d9905b310451b125acac6610ffbd6b4548461bd1d5a18dc29282814aR95
   
   The changes look fine to me, and I would like to merge this soon. I would appreciate it if you could take another look when you find some time :-).






[GitHub] [spark] SparkQA commented on pull request #31549: [SPARK-34314][SQL] Fix partitions schema inference

2021-02-18 Thread GitBox


SparkQA commented on pull request #31549:
URL: https://github.com/apache/spark/pull/31549#issuecomment-781871787


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39839/
   






[GitHub] [spark] HyukjinKwon commented on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


HyukjinKwon commented on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781871010


   @tgravescs and @holdenk, I plan to cut the RC as soon as possible, but it seems this PR includes two small changes that might matter for compatibility:
   
   - https://github.com/apache/spark/pull/31496/files#diff-6f08c35f1a0ae172ac66c7a8148e2516f3f2ce4887b592a64fb8207055dbb4e3R28
   - https://github.com/apache/spark/pull/31496/files#diff-a6d96a65d9905b310451b125acac6610ffbd6b4548461bd1d5a18dc29282814aR95
   
   Would you mind taking a quick look please?






[GitHub] [spark] SparkQA commented on pull request #31570: [WIP][SPARK-10816][SS] SessionWindow support for Structure Streaming

2021-02-18 Thread GitBox


SparkQA commented on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-781870068


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39836/
   






[GitHub] [spark] beliefer commented on pull request #31316: [SPARK-33599][SQL][FOLLOWUP] Group exception messages in catalyst/analysis

2021-02-18 Thread GitBox


beliefer commented on pull request #31316:
URL: https://github.com/apache/spark/pull/31316#issuecomment-781866313


   ping @cloud-fan 






[GitHub] [spark] zhengruifeng commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-18 Thread GitBox


zhengruifeng commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r578958152



##
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)

Review comment:
   I reviewed the related code, and it seems that `sorter.iterator` may spill during traversal: `isShuffleSort = false` in `def iterator` makes the internal iterator `destructiveIterator` a `SpillableIterator`.
   
   We can move the `taskMetrics` update into a task completion listener if necessary, but maybe in a new ticket.
   
   As for this PR, I prefer to keep it in line with the existing implementation.
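
The concern in this comment can be sketched without Spark. In the minimal example below, `LazySpillSorter`, `OnExhaustedIterator`, and `MetricsSketch` are hypothetical names invented for illustration, and a callback fired on iterator exhaustion stands in for the task completion listener mentioned above (it is not that API). The sketch only shows that a counter read before traversal misses work done lazily during traversal, while a deferred read sees the final value.

```scala
// Hypothetical sorter whose "spill" counter grows lazily, only while
// its iterator is being consumed. Not Spark's ExternalSorter.
final class LazySpillSorter(data: Seq[Int]) {
  private var spilled = 0L
  def bytesSpilled: Long = spilled
  def iterator: Iterator[Int] = data.sorted.iterator.map { x =>
    spilled += 4L // pretend each element read spills 4 bytes
    x
  }
}

// Hypothetical wrapper that runs `onDone` exactly once, the first time
// the underlying iterator reports that it has no more elements.
final class OnExhaustedIterator[A](underlying: Iterator[A], onDone: () => Unit)
    extends Iterator[A] {
  private var done = false
  def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !done) {
      done = true
      onDone()
    }
    more
  }
  def next(): A = underlying.next()
}

object MetricsSketch {
  // Returns (counter read before traversal, counter read after traversal).
  def run(): (Long, Long) = {
    val sorter = new LazySpillSorter(Seq(3, 1, 2))
    val eager = sorter.bytesSpilled // read before any element is consumed
    var recorded = 0L
    val it = new OnExhaustedIterator(
      sorter.iterator,
      () => { recorded = sorter.bytesSpilled } // deferred read
    )
    it.foreach(_ => ()) // drain the iterator
    (eager, recorded)
  }
}
```

In this sketch the eager read happens before any element is pulled, so it cannot include anything counted during iteration; the deferred read runs once the iterator is drained.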








[GitHub] [spark] SparkQA commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781860361


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39838/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781807866


   **[Test build #135250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135250/testReport)** for PR 31495 at commit [`03801f7`](https://github.com/apache/spark/commit/03801f726a1443736dd7e404ba6a6ac1f8740c10).






[GitHub] [spark] SparkQA commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


SparkQA commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781859857


   **[Test build #135250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135250/testReport)** for PR 31495 at commit [`03801f7`](https://github.com/apache/spark/commit/03801f726a1443736dd7e404ba6a6ac1f8740c10).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31549: [SPARK-34314][SQL] Fix partitions schema inference

2021-02-18 Thread GitBox


SparkQA commented on pull request #31549:
URL: https://github.com/apache/spark/pull/31549#issuecomment-781859366


   **[Test build #135259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135259/testReport)** for PR 31549 at commit [`14cb5ab`](https://github.com/apache/spark/commit/14cb5ab4429918a806c3c4dfcaa74b3d0bed5eaa).






[GitHub] [spark] SparkQA commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781857670


   **[Test build #135260 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135260/testReport)**
 for PR 31545 at commit 
[`24302b9`](https://github.com/apache/spark/commit/24302b9b7e1bb5568c7918369599a43123912f88).






[GitHub] [spark] viirya commented on a change in pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has 0/1 partition

2021-02-18 Thread GitBox


viirya commented on a change in pull request #31468:
URL: https://github.com/apache/spark/pull/31468#discussion_r578951440



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
##
@@ -52,16 +53,25 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
     SQLShuffleReadMetricsReporter.createShuffleReadMetrics(sparkContext)
   override lazy val metrics = readMetrics ++ writeMetrics
   protected override def doExecute(): RDD[InternalRow] = {
-    val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
-    val shuffled = new ShuffledRowRDD(
-      ShuffleExchangeExec.prepareShuffleDependency(
-        locallyLimited,
-        child.output,
-        SinglePartition,
-        serializer,
-        writeMetrics),
-      readMetrics)
-    shuffled.mapPartitionsInternal(_.take(limit))
+    val childRDD = child.execute()
+    if (childRDD.getNumPartitions == 0) {
+      new ParallelCollectionRDD(sparkContext, Seq.empty[InternalRow], 1, Map.empty)

Review comment:
   > or we can use `EmptyRDDWithPartitions` defined in `CoalesceExec`?
   > maybe we can make `EmptyRDD` support number of partition, so we can use it 
in both `CollectLimitExec` and `CoalesceExec`. But I am not sure whether we 
should do this.
   
   I think that sounds like over-engineering. At least for now, I don't think we need it.








[GitHub] [spark] SparkQA commented on pull request #31570: [WIP][SPARK-10816][SS] SessionWindow support for Structure Streaming

2021-02-18 Thread GitBox


SparkQA commented on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-781851450


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39836/
   






[GitHub] [spark] viirya commented on a change in pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has 0/1 partition

2021-02-18 Thread GitBox


viirya commented on a change in pull request #31468:
URL: https://github.com/apache/spark/pull/31468#discussion_r578951088



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
##
@@ -52,16 +53,25 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
     SQLShuffleReadMetricsReporter.createShuffleReadMetrics(sparkContext)
   override lazy val metrics = readMetrics ++ writeMetrics
   protected override def doExecute(): RDD[InternalRow] = {
-    val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
-    val shuffled = new ShuffledRowRDD(
-      ShuffleExchangeExec.prepareShuffleDependency(
-        locallyLimited,
-        child.output,
-        SinglePartition,
-        serializer,
-        writeMetrics),
-      readMetrics)
-    shuffled.mapPartitionsInternal(_.take(limit))
+    val childRDD = child.execute()
+    if (childRDD.getNumPartitions == 0) {
+      new ParallelCollectionRDD(sparkContext, Seq.empty[InternalRow], 1, Map.empty)

Review comment:
   > @viirya the `outputPartitioning` of `CollectLimitExec` is 
`SinglePartition`, so the output rdd should have single partition.
   
   Oh, my previous comment was confusing. I meant that I am not sure whether its `outputPartitioning` must be `SinglePartition`.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781848151










[GitHub] [spark] AmplabJenkins removed a comment on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781848152


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39831/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31273: [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31273:
URL: https://github.com/apache/spark/pull/31273#issuecomment-781848158


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39834/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781848148


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39830/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781848147


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39835/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781848149










[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781848150


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39832/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781848150


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39832/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781848148


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39830/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781848147


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39835/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31273: [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31273:
URL: https://github.com/apache/spark/pull/31273#issuecomment-781848158


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39834/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781848149










[GitHub] [spark] AmplabJenkins commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781848151










[GitHub] [spark] AmplabJenkins commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781848152


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39831/
   






[GitHub] [spark] SparkQA commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781845567


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39838/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781788234


   **[Test build #135246 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135246/testReport)**
 for PR 31496 at commit 
[`9e2bd4b`](https://github.com/apache/spark/commit/9e2bd4b2fb59eca549baa88dc772e25833751e5c).






[GitHub] [spark] SparkQA commented on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


SparkQA commented on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781841961


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39826/
   






[GitHub] [spark] zhengruifeng commented on a change in pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has 0/1 partition

2021-02-18 Thread GitBox


zhengruifeng commented on a change in pull request #31468:
URL: https://github.com/apache/spark/pull/31468#discussion_r578943394



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
##
@@ -52,16 +53,25 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
     SQLShuffleReadMetricsReporter.createShuffleReadMetrics(sparkContext)
   override lazy val metrics = readMetrics ++ writeMetrics
   protected override def doExecute(): RDD[InternalRow] = {
-    val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
-    val shuffled = new ShuffledRowRDD(
-      ShuffleExchangeExec.prepareShuffleDependency(
-        locallyLimited,
-        child.output,
-        SinglePartition,
-        serializer,
-        writeMetrics),
-      readMetrics)
-    shuffled.mapPartitionsInternal(_.take(limit))
+    val childRDD = child.execute()
+    if (childRDD.getNumPartitions == 0) {
+      new ParallelCollectionRDD(sparkContext, Seq.empty[InternalRow], 1, Map.empty)

Review comment:
   Or could we use the `EmptyRDDWithPartitions` defined in `CoalesceExec`?
   Maybe we could make `EmptyRDD` support a number of partitions, so that we can use it in both `CollectLimitExec` and `CoalesceExec`. But I am not sure whether we should do this.








[GitHub] [spark] SparkQA commented on pull request #31496: [SPARK-34384][CORE] Add missing docs for ResourceProfile APIs

2021-02-18 Thread GitBox


SparkQA commented on pull request #31496:
URL: https://github.com/apache/spark/pull/31496#issuecomment-781839349


   **[Test build #135246 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135246/testReport)**
 for PR 31496 at commit 
[`9e2bd4b`](https://github.com/apache/spark/commit/9e2bd4b2fb59eca549baa88dc772e25833751e5c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


SparkQA commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781836772


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39835/
   






[GitHub] [spark] zhengruifeng commented on a change in pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has 0/1 partition

2021-02-18 Thread GitBox


zhengruifeng commented on a change in pull request #31468:
URL: https://github.com/apache/spark/pull/31468#discussion_r578940920



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
##
@@ -52,16 +53,25 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
     SQLShuffleReadMetricsReporter.createShuffleReadMetrics(sparkContext)
   override lazy val metrics = readMetrics ++ writeMetrics
   protected override def doExecute(): RDD[InternalRow] = {
-    val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
-    val shuffled = new ShuffledRowRDD(
-      ShuffleExchangeExec.prepareShuffleDependency(
-        locallyLimited,
-        child.output,
-        SinglePartition,
-        serializer,
-        writeMetrics),
-      readMetrics)
-    shuffled.mapPartitionsInternal(_.take(limit))
+    val childRDD = child.execute()
+    if (childRDD.getNumPartitions == 0) {
+      new ParallelCollectionRDD(sparkContext, Seq.empty[InternalRow], 1, Map.empty)

Review comment:
   @viirya The `outputPartitioning` of `CollectLimitExec` is `SinglePartition`, so the output RDD should have a single partition.








[GitHub] [spark] SparkQA commented on pull request #31273: [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase

2021-02-18 Thread GitBox


SparkQA commented on pull request #31273:
URL: https://github.com/apache/spark/pull/31273#issuecomment-781835341


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39834/
   






[GitHub] [spark] SparkQA commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


SparkQA commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781833776


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39830/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781829338


   **[Test build #135257 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135257/testReport)**
 for PR 31545 at commit 
[`daf7c8a`](https://github.com/apache/spark/commit/daf7c8a97aa10721cdcb181059b6cf1c2134a295).






[GitHub] [spark] SparkQA commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781832030


   **[Test build #135257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135257/testReport)**
 for PR 31545 at commit 
[`daf7c8a`](https://github.com/apache/spark/commit/daf7c8a97aa10721cdcb181059b6cf1c2134a295).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


SparkQA commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781832039


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39831/
   






[GitHub] [spark] SparkQA commented on pull request #31570: [WIP][SPARK-10816][SS] SessionWindow support for Structure Streaming

2021-02-18 Thread GitBox


SparkQA commented on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-781829637


   **[Test build #135256 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135256/testReport)**
 for PR 31570 at commit 
[`fc3d122`](https://github.com/apache/spark/commit/fc3d1224bf2f66dd8c30bc58db1ade37bfdcad1e).






[GitHub] [spark] SparkQA commented on pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot

2021-02-18 Thread GitBox


SparkQA commented on pull request #31545:
URL: https://github.com/apache/spark/pull/31545#issuecomment-781829338


   **[Test build #135257 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135257/testReport)**
 for PR 31545 at commit 
[`daf7c8a`](https://github.com/apache/spark/commit/daf7c8a97aa10721cdcb181059b6cf1c2134a295).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31390:
URL: https://github.com/apache/spark/pull/31390#issuecomment-781826004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39833/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31390:
URL: https://github.com/apache/spark/pull/31390#issuecomment-781826004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39833/
   






[GitHub] [spark] SparkQA commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim

2021-02-18 Thread GitBox


SparkQA commented on pull request #31390:
URL: https://github.com/apache/spark/pull/31390#issuecomment-781825994


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39833/
   






[GitHub] [spark] SparkQA commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


SparkQA commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781824605


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39835/
   






[GitHub] [spark] SparkQA commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim

2021-02-18 Thread GitBox


SparkQA commented on pull request #31390:
URL: https://github.com/apache/spark/pull/31390#issuecomment-781824395


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39833/
   






[GitHub] [spark] SparkQA commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781823524


   **[Test build #135258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135258/testReport)** for PR 31348 at commit [`7e55532`](https://github.com/apache/spark/commit/7e55532799f98b55ec6bae47fd2bd830d5be4b78).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781822666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135255/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31404: [SPARK-34283][SQL] Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.disti

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31404:
URL: https://github.com/apache/spark/pull/31404#issuecomment-781822664


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39828/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781822661










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-781822668


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39829/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


AmplabJenkins removed a comment on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781822662


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135245/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781822665










[GitHub] [spark] SparkQA commented on pull request #31273: [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase

2021-02-18 Thread GitBox


SparkQA commented on pull request #31273:
URL: https://github.com/apache/spark/pull/31273#issuecomment-781822794


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39834/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31404: [SPARK-34283][SQL] Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31404:
URL: https://github.com/apache/spark/pull/31404#issuecomment-781822664


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39828/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781822666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135255/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781822662


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135245/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2021-02-18 Thread GitBox


AmplabJenkins commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-781822668


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39829/
   






[GitHub] [spark] baibaichen edited a comment on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-18 Thread GitBox


baibaichen edited a comment on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-781817208


   @LuciferYang, where is the footer cached: on the driver or on the executor? As I understand it, the footer is used on the executor side, so are you caching it there?
   
   If you cache the footer on the executor, how do you schedule tasks to the executor that holds the cached footer?






[GitHub] [spark] SparkQA commented on pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-18 Thread GitBox


SparkQA commented on pull request #31495:
URL: https://github.com/apache/spark/pull/31495#issuecomment-781819057


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39830/
   






[GitHub] [spark] SparkQA commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


SparkQA commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781818065


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39831/
   






[GitHub] [spark] amandeep-sharma commented on a change in pull request #31545: [SPARK-34417] [SQL] org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name h

2021-02-18 Thread GitBox


amandeep-sharma commented on a change in pull request #31545:
URL: https://github.com/apache/spark/pull/31545#discussion_r578925797



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
##
@@ -394,10 +395,11 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
   }
 
   private def fillMap(values: Seq[(String, Any)]): DataFrame = {
+    val resolved = mutable.Map[String, Any]()

Review comment:
   Done.








[GitHub] [spark] baibaichen commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-18 Thread GitBox


baibaichen commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-781817208


   @LuciferYang, where is the footer cached: on the driver or on the executor? As I understand it, the footer is used on the executor side, so are you caching it there?






[GitHub] [spark] SparkQA removed a comment on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781806582


   **[Test build #135251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135251/testReport)** for PR 31479 at commit [`950958c`](https://github.com/apache/spark/commit/950958c1a33d178f1ad415a6cdd134d917a4abc2).






[GitHub] [spark] SparkQA commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


SparkQA commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781816957


   **[Test build #135251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135251/testReport)** for PR 31479 at commit [`950958c`](https://github.com/apache/spark/commit/950958c1a33d178f1ad415a6cdd134d917a4abc2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan closed pull request #31586: [SPARK-34466][SQL][DOCS] Improve docs for `ALTER TABLE .. RENAME TO`

2021-02-18 Thread GitBox


cloud-fan closed pull request #31586:
URL: https://github.com/apache/spark/pull/31586


   






[GitHub] [spark] cloud-fan commented on pull request #31586: [SPARK-34466][SQL][DOCS] Improve docs for `ALTER TABLE .. RENAME TO`

2021-02-18 Thread GitBox


cloud-fan commented on pull request #31586:
URL: https://github.com/apache/spark/pull/31586#issuecomment-781815508


   thanks, merging to master!






[GitHub] [spark] SparkQA commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-18 Thread GitBox


SparkQA commented on pull request #31479:
URL: https://github.com/apache/spark/pull/31479#issuecomment-781814471


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39827/
   






[GitHub] [spark] MaxGekk commented on pull request #31586: [SPARK-34466][SQL][DOCS] Improve docs for `ALTER TABLE .. RENAME TO`

2021-02-18 Thread GitBox


MaxGekk commented on pull request #31586:
URL: https://github.com/apache/spark/pull/31586#issuecomment-781814151


   @cloud-fan Do the changes make sense to you?






[GitHub] [spark] SparkQA removed a comment on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781809577


   **[Test build #135255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135255/testReport)** for PR 31587 at commit [`b2a34f8`](https://github.com/apache/spark/commit/b2a34f8ffc593c9e8f23ea2413fb3c5c5551bfbb).






[GitHub] [spark] SparkQA commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


SparkQA commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781813769


   **[Test build #135255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135255/testReport)** for PR 31587 at commit [`b2a34f8`](https://github.com/apache/spark/commit/b2a34f8ffc593c9e8f23ea2413fb3c5c5551bfbb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA removed a comment on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781762077


   **[Test build #135245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135245/testReport)** for PR 31348 at commit [`06f6d0d`](https://github.com/apache/spark/commit/06f6d0d515e58d5af54b0f452889ff01f05ca6b5).






[GitHub] [spark] SparkQA commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

2021-02-18 Thread GitBox


SparkQA commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-781812504


   **[Test build #135245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135245/testReport)** for PR 31348 at commit [`06f6d0d`](https://github.com/apache/spark/commit/06f6d0d515e58d5af54b0f452889ff01f05ca6b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2021-02-18 Thread GitBox


SparkQA commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-781811974


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39829/
   






[GitHub] [spark] SparkQA commented on pull request #31587: [SPARK-34469][K8S] Ignore RegisterExecutor when SparkContext is stopped

2021-02-18 Thread GitBox


SparkQA commented on pull request #31587:
URL: https://github.com/apache/spark/pull/31587#issuecomment-781809577


   **[Test build #135255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135255/testReport)** for PR 31587 at commit [`b2a34f8`](https://github.com/apache/spark/commit/b2a34f8ffc593c9e8f23ea2413fb3c5c5551bfbb).






[GitHub] [spark] SparkQA commented on pull request #31404: [SPARK-34283][SQL] Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'

2021-02-18 Thread GitBox


SparkQA commented on pull request #31404:
URL: https://github.com/apache/spark/pull/31404#issuecomment-781808687


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39828/
   





