[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32107:
URL: https://github.com/apache/spark/pull/32107#issuecomment-820133395


   Oh yeah, to retrigger the test.





[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133344









[GitHub] [spark] MaxGekk commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`

2021-04-14 Thread GitBox


MaxGekk commented on pull request #32107:
URL: https://github.com/apache/spark/pull/32107#issuecomment-820132545


   @HyukjinKwon Should this be rebased/merged on the master?





[GitHub] [spark] SparkQA commented on pull request #32124: [SPARK-35024][ML][WIP] Refactor LinearSVC - support virtual centering

2021-04-14 Thread GitBox


SparkQA commented on pull request #32124:
URL: https://github.com/apache/spark/pull/32124#issuecomment-820130895


   **[Test build #137400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137400/testReport)** for PR 32124 at commit [`5857a52`](https://github.com/apache/spark/commit/5857a52f9cb9fe787b22dfdfd647e04b801224db).





[GitHub] [spark] SparkQA commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


SparkQA commented on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820130820


   **[Test build #137399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137399/testReport)** for PR 32180 at commit [`1d981c7`](https://github.com/apache/spark/commit/1d981c7651054c5e05292b8f0d0e94d0cd50f518).





[GitHub] [spark] itholic commented on pull request #32161: [SPARK-35025] Move Parquet data source options from Python and Scala into a single page.

2021-04-14 Thread GitBox


itholic commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-820130117


   cc @HyukjinKwon 
   
   Could you please review this when you find some time?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820129361


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41974/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820129359


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137393/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820129357


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137387/
   





[GitHub] [spark] github-actions[bot] commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820129378


   **[Test build #750871196](https://github.com/Ngone51/spark/actions/runs/750871196)** for PR 32180 at commit [`1d981c7`](https://github.com/Ngone51/spark/commit/1d981c7651054c5e05292b8f0d0e94d0cd50f518).





[GitHub] [spark] AmplabJenkins commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820129357


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137387/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820129361


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41974/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820129359


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137393/
   





[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820128753









[GitHub] [spark] SparkQA removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820021274


   **[Test build #137393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137393/testReport)** for PR 32139 at commit [`643418a`](https://github.com/apache/spark/commit/643418a77559d4747780d5176235afa99584053e).





[GitHub] [spark] SparkQA commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


SparkQA commented on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820127913


   **[Test build #137393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137393/testReport)** for PR 32139 at commit [`643418a`](https://github.com/apache/spark/commit/643418a77559d4747780d5176235afa99584053e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] github-actions[bot] commented on pull request #32124: [SPARK-35024][ML][WIP] Refactor LinearSVC - support virtual centering

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #32124:
URL: https://github.com/apache/spark/pull/32124#issuecomment-820127625


   **[Test build #750853487](https://github.com/zhengruifeng/spark/actions/runs/750853487)** for PR 32124 at commit [`5857a52`](https://github.com/zhengruifeng/spark/commit/5857a52f9cb9fe787b22dfdfd647e04b801224db).





[GitHub] [spark] attilapiros commented on a change in pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


attilapiros commented on a change in pull request #32180:
URL: https://github.com/apache/spark/pull/32180#discussion_r613765906



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -551,8 +551,10 @@ final class ShuffleBlockFetcherIterator(
 // Send out initial requests for blocks, up to our maxBytesInFlight
 fetchUpToMaxBytes()
 
-val numFetches = remoteRequests.size - fetchRequests.size
-logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}")
+val numDeferredRequest = deferredFetchRequests.values.map(_.size).sum
+val numFetches = remoteRequests.size - fetchRequests.size - numDeferredRequest
+logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}" +
+  s"${if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else "" }")

Review comment:
   ```suggestion
 (if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else ""))
   ```
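
The counting and message construction discussed in this review can be tried in isolation. The following is only a minimal sketch: `FetchLogSketch`, `startedFetchesMessage`, and its parameters are hypothetical stand-ins for the iterator's internal state, not Spark's actual API, and the elapsed-time part of the log line is left out.

```scala
object FetchLogSketch {
  // Hypothetical inputs: the total number of remote requests created, the number
  // still waiting in the queue, and the deferred requests grouped per address.
  def startedFetchesMessage(
      totalRequests: Int,
      queuedRequests: Int,
      deferredByAddress: Map[String, Seq[Int]]): String = {
    // Deferred requests were taken off the queue but not actually sent.
    val numDeferredRequest = deferredByAddress.values.map(_.size).sum
    val numFetches = totalRequests - queuedRequests - numDeferredRequest
    // A plain `if` expression in parentheses, as in the suggestion above,
    // avoids nesting one interpolated string inside another.
    s"Started $numFetches remote fetches" +
      (if (numDeferredRequest > 0) s", deferred $numDeferredRequest requests" else "")
  }
}
```

With no deferred requests the suffix is simply omitted, so the message keeps its previous shape.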







[GitHub] [spark] dongjoon-hyun commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


dongjoon-hyun commented on pull request #32164:
URL: https://github.com/apache/spark/pull/32164#issuecomment-820118853


   Thanks, @sarutak and @HyukjinKwon . Merged to master/3.1.





[GitHub] [spark] dongjoon-hyun closed pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


dongjoon-hyun closed pull request #32164:
URL: https://github.com/apache/spark/pull/32164


   





[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820111567


   @mridulm you mean using the `TaskCompletionListener`, right?
   
   As I read the code of `MonitorThread`, one of its responsibilities is to handle task interruption:
   
https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L582-L584
   
   The code there deals with what to do when the task is interrupted but not completed.
   
   But a task interruption is not a completion: as you can see, when the task is flagged as interrupted, no listener is informed:
   
https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/TaskContextImpl.scala#L149-L151
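
The distinction between interruption and completion can be illustrated with a toy model. This is a sketch under stated assumptions: `ToyTaskContext` and `InterruptionSketch` are hypothetical and much simpler than Spark's `TaskContextImpl`; they only mirror the behavior described in this comment, namely that flagging an interruption sets a flag while completion invokes the registered listeners.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a task context; not Spark's implementation.
class ToyTaskContext {
  private val completionListeners = ArrayBuffer.empty[() => Unit]
  @volatile private var interrupted = false

  def addCompletionListener(listener: () => Unit): Unit = { completionListeners += listener }

  // Interruption only flips a flag; nobody is called back.
  def markInterrupted(): Unit = { interrupted = true }

  // Completion is the only event that notifies the listeners.
  def markCompleted(): Unit = completionListeners.foreach(listener => listener())

  def isInterrupted: Boolean = interrupted
}

object InterruptionSketch {
  // Counts how often a completion listener fires after the given action.
  def listenerCallsAfter(action: ToyTaskContext => Unit): Int = {
    val ctx = new ToyTaskContext
    var calls = 0
    ctx.addCompletionListener(() => calls += 1)
    action(ctx)
    calls
  }
}
```

In this model a component that relies only on a completion listener never learns about an interruption, which is why something has to poll the flag.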
 
   
   





[GitHub] [spark] SparkQA removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820015616


   **[Test build #137387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137387/testReport)** for PR 32180 at commit [`357c36c`](https://github.com/apache/spark/commit/357c36c593df5a08e9fecc200e163b3b26d1c5a1).





[GitHub] [spark] SparkQA commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


SparkQA commented on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820104323


   **[Test build #137387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137387/testReport)** for PR 32180 at commit [`357c36c`](https://github.com/apache/spark/commit/357c36c593df5a08e9fecc200e163b3b26d1c5a1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] Ngone51 commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


Ngone51 commented on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820100868


   @mridulm Thanks for the approval!
   
   cc @tgravescs @attilapiros for taking a look.





[GitHub] [spark] wangyum commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


wangyum commented on pull request #32163:
URL: https://github.com/apache/spark/pull/32163#issuecomment-820096175


   Merged to master.





[GitHub] [spark] wangyum closed pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


wangyum closed pull request #32163:
URL: https://github.com/apache/spark/pull/32163


   





[GitHub] [spark] HyukjinKwon commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32107:
URL: https://github.com/apache/spark/pull/32107#issuecomment-820095715


   Just curious: do other DBMSes support aggregation on such interval types?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints

2021-04-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #32147:
URL: https://github.com/apache/spark/pull/32147#discussion_r613756126



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
##
@@ -2844,6 +2844,25 @@ abstract class JsonSuite
   assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
 }
   }
+
+  test("Write Non-ASCII character as codepoint") {
+// scalastyle:off nonascii
+withTempPath { path =>
+  val basePath = path.getCanonicalPath
+  Seq("a", "\n", "\u3042").toDF.write
+.option("writeNonAsciiCharacterAsCodePoint", "true").json(s"$basePath")
+  val actualText = spark.read.text(s"$basePath")
+.sort("value").map(_.getString(0)).collect().mkString
+  val expectedText = "{\"value\":\"\\n\"}{\"value\":\"\\u3042\"}{\"value\":\"a\"}"
+  assert(actualText === expectedText)
+
+  val actualJson = spark.read.json(s"$basePath")
+.sort("value").map(_.getString(0)).collect().mkString
+  val expectedJson = "\na\u3042"
+  assert(actualJson === expectedJson)
+}
+// scalastyle:on nonascii
+  }

Review comment:
   It would be great if we could have test coverage for setting both pretty and non-ascii-as-codepoint at the same time.
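
For readers unfamiliar with the expected text in this test, the effect of writing non-ASCII characters as code points can be shown with a hand-rolled helper. This is only an illustrative sketch: `NonAsciiEscapeSketch` and `escapeNonAscii` are hypothetical names, the helper does not use Jackson, and it ignores ordinary JSON escaping such as `\n` or quotes.

```scala
object NonAsciiEscapeSketch {
  // Replace every UTF-16 code unit above 0x7F with a backslash-u escape of four
  // hex digits; ASCII characters are passed through unchanged.
  def escapeNonAscii(s: String): String =
    s.map { c =>
      if (c > 0x7F) "\\u" + "%04x".format(c.toInt) else c.toString
    }.mkString
}
```

Applied to the string consisting of `a` followed by U+3042, the helper keeps `a` and emits the six characters `\u3042` for the second character, which matches the shape of `expectedText` in the test above.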







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints

2021-04-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #32147:
URL: https://github.com/apache/spark/pull/32147#discussion_r613755840



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
##
@@ -2844,6 +2844,25 @@ abstract class JsonSuite
   assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
 }
   }
+
+  test("Write Non-ASCII character as codepoint") {

Review comment:
   If you don't mind, shall we have a JIRA ID prefix?







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints

2021-04-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #32147:
URL: https://github.com/apache/spark/pull/32147#discussion_r613755701



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
##
@@ -73,7 +73,12 @@ private[sql] class JacksonGenerator(
 
   private val gen = {
 val generator = new JsonFactory().createGenerator(writer).setRootValueSeparator(null)
-if (options.pretty) generator.useDefaultPrettyPrinter() else generator
+val ppGenerator = if (options.pretty) generator.useDefaultPrettyPrinter() else generator
+if (options.writeNonAsciiCharacterAsCodePoint) {
+  generator.setHighestNonEscapedChar(0x7F)

Review comment:
   This code means that we cannot set both `options.pretty` and 
`options.writeNonAsciiCharacterAsCodePoint`. If this is true, shall we document 
it somewhere?







[GitHub] [spark] mridulm commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


mridulm commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820094334


   @attilapiros I don't have much context about the Python runner, but I'm curious whether `MonitorThread` can follow the same pattern/lifecycle as `writerThread` in that method?





[GitHub] [spark] yaooqinn commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file

2021-04-14 Thread GitBox


yaooqinn commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820092327


   looks nice, and good to have it tested per @dongjoon-hyun's suggestion





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints

2021-04-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #32147:
URL: https://github.com/apache/spark/pull/32147#discussion_r613752661



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
##
@@ -135,6 +135,9 @@ private[sql] class JSONOptions(
*/
   val inferTimestamp: Boolean = parameters.get("inferTimestamp").map(_.toBoolean).getOrElse(false)
 
+  val writeNonAsciiCharacterAsCodePoint: Boolean =

Review comment:
   Shall we add some description?
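
The `inferTimestamp` line in the diff context shows the usual pattern for reading a boolean option with a default. A minimal sketch of that pattern follows, assuming a plain `Map[String, String]` in place of the options map used by `JSONOptions`; `JsonOptionSketch` and `booleanOption` are hypothetical names, and the right-hand side of the new option is not visible in this diff.

```scala
object JsonOptionSketch {
  /** Reads a boolean option, falling back to `default` when the key is absent. */
  def booleanOption(parameters: Map[String, String], key: String, default: Boolean): Boolean =
    parameters.get(key).map(_.toBoolean).getOrElse(default)
}
```

A short doc comment like the one on `booleanOption`, stating what the option does and its default, is the kind of description the review asks for.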







[GitHub] [spark] dongjoon-hyun commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


dongjoon-hyun commented on pull request #32162:
URL: https://github.com/apache/spark/pull/32162#issuecomment-820090451


   Thank you, @sarutak .





[GitHub] [spark] dongjoon-hyun closed pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


dongjoon-hyun closed pull request #32162:
URL: https://github.com/apache/spark/pull/32162


   





[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820090027


   **[Test build #137398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137398/testReport)** for PR 30480 at commit [`9614a0c`](https://github.com/apache/spark/commit/9614a0c4124b521915923d34ea192bfde4eddcc2).





[GitHub] [spark] mridulm commented on a change in pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


mridulm commented on a change in pull request #32180:
URL: https://github.com/apache/spark/pull/32180#discussion_r613751403



##
File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -551,8 +551,10 @@ final class ShuffleBlockFetcherIterator(
 // Send out initial requests for blocks, up to our maxBytesInFlight
 fetchUpToMaxBytes()
 
-val numFetches = remoteRequests.size - fetchRequests.size
-logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}")
+val numDeferredRequest = deferredFetchRequests.values.map(_.size).sum
+val numFetches = remoteRequests.size - fetchRequests.size - numDeferredRequest
+logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}" +
+  s"${if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else "" }")

Review comment:
   There are no `deferredFetchRequests` when `initialize` is invoked - it will always be empty.







[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820089733


   **[Test build #137397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137397/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32164:
URL: https://github.com/apache/spark/pull/32164#issuecomment-820089388


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137383/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-820089390


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41968/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820089389


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41972/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32163:
URL: https://github.com/apache/spark/pull/32163#issuecomment-820089385


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137384/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820089384


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41969/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820089386


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41971/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32185: Branch 3.2

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32185:
URL: https://github.com/apache/spark/pull/32185#issuecomment-820089563


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820089386


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41971/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-820089390


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41968/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820089389


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41972/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32163:
URL: https://github.com/apache/spark/pull/32163#issuecomment-820089385


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137384/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820089384


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41969/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32164:
URL: https://github.com/apache/spark/pull/32164#issuecomment-820089388


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137383/
   





[GitHub] [spark] SparkQA commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file

2021-04-14 Thread GitBox


SparkQA commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820089269


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41971/
   





[GitHub] [spark] dongjoon-hyun commented on pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated

2021-04-14 Thread GitBox


dongjoon-hyun commented on pull request #32179:
URL: https://github.com/apache/spark/pull/32179#issuecomment-820088230


   Could you check the relevant UT failures, @allisonwang-db ?
   ```
   org.apache.spark.sql.AnalysisException
   [info]   Correlated column is not allowed in a non-equality predicate:
   ```





[GitHub] [spark] github-actions[bot] commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820087911


   **[Test build #750728539](https://github.com/attilapiros/spark/actions/runs/750728539)** for PR 32169 at commit [`c4a5e2d`](https://github.com/attilapiros/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).





[GitHub] [spark] SparkQA commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


SparkQA commented on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820087686









[GitHub] [spark] SparkQA commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file

2021-04-14 Thread GitBox


SparkQA commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820087668


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41971/
   





[GitHub] [spark] mridulm commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


mridulm commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820087150


   Looks good to me, thanks for the changes @venkata91
   +CC @Ngone51, @tgravescs, @attilapiros for another pass.





[GitHub] [spark] SparkQA removed a comment on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32163:
URL: https://github.com/apache/spark/pull/32163#issuecomment-819957750


   **[Test build #137384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137384/testReport)** for PR 32163 at commit [`4d88408`](https://github.com/apache/spark/commit/4d8840857288f4c71d02ac02a0f818249fb82caa).





[GitHub] [spark] SparkQA commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too

2021-04-14 Thread GitBox


SparkQA commented on pull request #32163:
URL: https://github.com/apache/spark/pull/32163#issuecomment-820084653


   **[Test build #137384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137384/testReport)** for PR 32163 at commit [`4d88408`](https://github.com/apache/spark/commit/4d8840857288f4c71d02ac02a0f818249fb82caa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun closed pull request #31935: [SPARK-34789][TEST] Introduce Jetty based construct for integration tests where HTTP server is used

2021-04-14 Thread GitBox


dongjoon-hyun closed pull request #31935:
URL: https://github.com/apache/spark/pull/31935


   





[GitHub] [spark] allisonwang-db commented on pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated

2021-04-14 Thread GitBox


allisonwang-db commented on pull request #32179:
URL: https://github.com/apache/spark/pull/32179#issuecomment-820083789


   > BTW, is it necessary to be a subquery with aggregation? From the fix, I cannot tell how aggregation affects it.
   
   SPARK-17348 explains why the aggregate causes the issue. Basically, when a correlated predicate is pulled up, all attributes from the inner query will be added as GROUP BY columns. When the mapping is not one-to-one, for instance `a + b = outer(c)` in the example above, both `a` and `b` will be added as GROUP BY columns, and `count(*)` will count the number of rows for each combination of (a, b) instead of for each value of (a + b).
   
https://github.com/apache/spark/blob/7ff9d2e3eec514962e891420dbb3961e85826612/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L878-L901
   
   Pull up correlated predicates through Aggregate:
   
https://github.com/apache/spark/blob/3e218ade9cf6becc5de8b20a4385e345021a509d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala#L258-L264
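
To make the grouping problem concrete, here is a rough plain-Python sketch (not Spark code). The rows, the value of `outer_c`, and the helper `count_by` are all made up for illustration; the sketch only mimics what `count(*)` would return under `GROUP BY a, b` versus `GROUP BY a + b`.

```python
from collections import Counter

# Hypothetical inner-query rows (a, b); the values are invented for illustration.
inner_rows = [(1, 2), (2, 1), (0, 3), (1, 2)]


def count_by(rows, key):
    """Mimic SELECT count(*) ... GROUP BY key(row) with a Counter."""
    return Counter(key(row) for row in rows)


# Grouping on the pulled-up attributes a and b: one count per (a, b) pair.
per_pair = count_by(inner_rows, lambda r: (r[0], r[1]))

# Grouping on the expression a + b, which is what `a + b = outer(c)` really needs.
per_sum = count_by(inner_rows, lambda r: r[0] + r[1])

# Hypothetical outer value c.
outer_c = 3

# Counts that join with outer_c when grouped by (a, b): several partial counts.
partial_counts = sorted(n for (a, b), n in per_pair.items() if a + b == outer_c)

# The single count the subquery is expected to produce for outer_c.
expected_count = per_sum[outer_c]

print(partial_counts, expected_count)
```

With these rows, grouping by (a, b) leaves three separate groups that all satisfy `a + b = 3`, each carrying only part of the total, whereas grouping by `a + b` gives the one count the query author presumably intended.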





[GitHub] [spark] ChenDou2021 opened a new pull request #32185: Branch 3.2

2021-04-14 Thread GitBox


ChenDou2021 opened a new pull request #32185:
URL: https://github.com/apache/spark/pull/32185


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   Simplifying Python Code





[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820083032


   @attilapiros, there was an unexpected infra issue with the GA build. Can you sync/replace to the latest master branch?





[GitHub] [spark] SparkQA removed a comment on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32164:
URL: https://github.com/apache/spark/pull/32164#issuecomment-819957745


   **[Test build #137383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137383/testReport)** for PR 32164 at commit [`1446667`](https://github.com/apache/spark/commit/1446667bee96a9d3d1b0e642a226efe5ee67ee21).





[GitHub] [spark] SparkQA commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple

2021-04-14 Thread GitBox


SparkQA commented on pull request #32164:
URL: https://github.com/apache/spark/pull/32164#issuecomment-820082684


   **[Test build #137383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137383/testReport)** for PR 32164 at commit [`1446667`](https://github.com/apache/spark/commit/1446667bee96a9d3d1b0e642a226efe5ee67ee21).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `public class VectorizedBLAS extends F2jBLAS `
 * `trait AnalysisOnlyCommand extends Command `





[GitHub] [spark] itholic commented on pull request #32154: [SPARK-34995] Port/integrate Koalas remaining codes into PySpark

2021-04-14 Thread GitBox


itholic commented on pull request #32154:
URL: https://github.com/apache/spark/pull/32154#issuecomment-820082557


   Yeah, it includes all the changes made in the main code after the porting.





[GitHub] [spark] SparkQA commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan

2021-04-14 Thread GitBox


SparkQA commented on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-820081492


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41968/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820081307


   Hm, one test failed:
   ```
   ==
   2021-04-15T03:54:15.0802269Z ERROR [61.702s]: test_monotonic (pyspark.pandas.tests.indexes.test_base.IndexesTest)
   2021-04-15T03:54:15.0803808Z --
   2021-04-15T03:54:15.0804406Z Traceback (most recent call last):
   2021-04-15T03:54:15.0835123Z   File "/__w/spark/spark/python/pyspark/pandas/tests/indexes/test_base.py", line 1280, in test_monotonic
   2021-04-15T03:54:15.0836254Z     self.assert_eq(kmidx.is_monotonic_increasing, pmidx.is_monotonic_increasing)
   2021-04-15T03:54:15.0837366Z   File "/__w/spark/spark/python/pyspark/pandas/testing/utils.py", line 293, in assert_eq
   2021-04-15T03:54:15.0838145Z     self.assertEqual(lobj, robj)
   2021-04-15T03:54:15.0838748Z AssertionError: False != True
   2021-04-15T03:54:15.0839117Z 
   2021-04-15T03:54:15.0840085Z --
   2021-04-15T03:54:15.0840665Z Ran 76 tests in 290.024s
   2021-04-15T03:54:15.0840965Z 
   2021-04-15T03:54:15.0841355Z FAILED (errors=1)
   ```





[GitHub] [spark] allisonwang-db commented on a change in pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated

2021-04-14 Thread GitBox


allisonwang-db commented on a change in pull request #32179:
URL: https://github.com/apache/spark/pull/32179#discussion_r613741888



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -950,9 +950,15 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
   case f: Filter =>
 val (correlated, _) = splitConjunctivePredicates(f.condition).partition(containsOuter)
 
-// Find any non-equality correlated predicates
+// Find any non-equality correlated predicates and equality predicates that do not
+// guarantee one-on-one mapping between inner and outer attributes. E.G:
+// a = outer(c) -> true
+// a > outer(c) -> false
+// a + b = outer(c) -> false (because there can be multiple combinations of a, b that
+// satisfy the condition)
 foundNonEqualCorrelatedPred = foundNonEqualCorrelatedPred || correlated.exists {
-  case _: EqualTo | _: EqualNullSafe => false
+  case Equality(_: Attribute, b) => b.find(_.isInstanceOf[Attribute]).isDefined

Review comment:
   Will do!
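
The code comment in the diff above argues that `a + b = outer(c)` does not give a one-to-one mapping between inner and outer attributes. A minimal sketch of that reasoning, in plain Python rather than Spark code; the function name, the small integer domain, and the value of `c` are all hypothetical and chosen only for illustration:

```python
from itertools import product


def inner_values_matching(predicate, outer_c, domain):
    """Collect the distinct inner values of `a` that satisfy `predicate` for one outer row."""
    return sorted({a for a, b in product(domain, repeat=2) if predicate(a, b, outer_c)})


domain = range(4)  # hypothetical small domain for the inner columns a and b
outer_c = 3        # hypothetical value of the outer attribute c

# a = outer(c): exactly one inner value of `a` can match the outer row.
only_a = inner_values_matching(lambda a, b, c: a == c, outer_c, domain)

# a + b = outer(c): several (a, b) combinations match, so `a` is not pinned down.
a_plus_b = inner_values_matching(lambda a, b, c: a + b == c, outer_c, domain)

print(only_a)    # [3]
print(a_plus_b)  # [0, 1, 2, 3]
```

With a plain attribute on the inner side, one outer value fixes one inner value; once the inner side is an expression over several attributes, that guarantee no longer holds.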







[GitHub] [spark] SparkQA commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark

2021-04-14 Thread GitBox


SparkQA commented on pull request #32139:
URL: https://github.com/apache/spark/pull/32139#issuecomment-820077352









[GitHub] [spark] mridulm commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle

2021-04-14 Thread GitBox


mridulm commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-820074290


   ok to test





[GitHub] [spark] exmy commented on pull request #32088: [SPARK-34987][SQL] AQE improve: change shuffle hash join to sort merg…

2021-04-14 Thread GitBox


exmy commented on pull request #32088:
URL: https://github.com/apache/spark/pull/32088#issuecomment-820071294


   @cloud-fan could you please take a look and offer some suggestions? Thanks.





[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820067610


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41970/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820067686


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41970/
   





[GitHub] [spark] SparkQA removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820036980


   **[Test build #137396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137396/testReport)** for PR 31010 at commit [`1082710`](https://github.com/apache/spark/commit/108271015b84abb5bfda37b9546bfe3b138f43a9).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820063786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137396/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820063786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137396/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32154: [SPARK-34995] Port/integrate Koalas remaining codes into PySpark

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32154:
URL: https://github.com/apache/spark/pull/32154#issuecomment-820063668


   Are they all changes to port? Might be good to confirm with @ueshin 





[GitHub] [spark] SparkQA commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause

2021-04-14 Thread GitBox


SparkQA commented on pull request #31010:
URL: https://github.com/apache/spark/pull/31010#issuecomment-820062892


   **[Test build #137396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137396/testReport)** for PR 31010 at commit [`1082710`](https://github.com/apache/spark/commit/108271015b84abb5bfda37b9546bfe3b138f43a9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820035283


   **[Test build #137395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
   





[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

2021-04-14 Thread GitBox


SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820060610


   **[Test build #137395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #32090: [SPARK-34212][SQL][FOLLOWUP] Move the added test to ParquetQuerySuite

2021-04-14 Thread GitBox


cloud-fan commented on a change in pull request #32090:
URL: https://github.com/apache/spark/pull/32090#discussion_r613738091



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
##
@@ -840,6 +840,67 @@ abstract class ParquetQuerySuite extends QueryTest with ParquetTest with SharedS
 testMigration(fromTsType = "INT96", toTsType = "TIMESTAMP_MICROS")
 testMigration(fromTsType = "TIMESTAMP_MICROS", toTsType = "INT96")
   }
+
+  test("SPARK-34212 Parquet should read decimals correctly") {
+def readParquet(schema: String, path: File): DataFrame = {
+  spark.read.schema(schema).parquet(path.toString)
+}
+
+withTempPath { path =>
+  // a is int-decimal (4 bytes), b is long-decimal (8 bytes), c is binary-decimal (16 bytes)
+  val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS DECIMAL(36, 2)) c")
+  df.write.parquet(path.toString)
+
+  withAllParquetReaders {
+// We can read the decimal parquet field with a larger precision, if scale is the same.
+val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+checkAnswer(readParquet(schema, path), df)
+  }
+
+  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
+val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
+checkAnswer(readParquet(schema1, path), df)
+val schema2 = "a DECIMAL(3, 0), b DECIMAL(18, 1), c DECIMAL(37, 1)"
+checkAnswer(readParquet(schema2, path), Row(1, 1.2, 1.2))
+  }
+
+  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true") {
+Seq("a DECIMAL(3, 2)", "b DECIMAL(18, 1)", "c DECIMAL(37, 1)").foreach { schema =>
+  val e = intercept[SparkException] {
+readParquet(schema, path).collect()
+  }.getCause.getCause
+  assert(e.isInstanceOf[SchemaColumnConvertNotSupportedException])
+}
+  }
+}
+
+// tests for parquet types without decimal metadata.

Review comment:
   @viirya looking at the test, I think it was decided earlier that reading plain int/long as decimal is hard to implement in the vectorized reader.
   
   Basically we need to do 2 steps:
   1. Read the decimal from the int/long at its actual precision/scale. Since it's a plain int/long, the precision should be the max precision for int/long.
   2. Cast the decimal to the required precision/scale.
   
   For the vectorized reader, we can create a `Decimal` object with the max precision for int/long, do the cast, and set the int/long in the vector if there is no overflow. This is super slow, but still doable.
   
   It's not a real regression: as @wangyum demonstrated before, the previous behavior in 2.4 was not reasonable when overflow happens.
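
The two steps described in the comment above can be sketched in plain Python. This is not the Spark reader code: the function name is hypothetical, the plain integer is assumed to carry scale 0, and returning `None` on overflow is an assumption made here only to keep the example small.

```python
from typing import Optional


def plain_int_to_unscaled(value: int, precision: int, scale: int) -> Optional[int]:
    """Cast a plain int/long to DECIMAL(precision, scale); return its unscaled value."""
    # Step 1: a plain int/long has no decimal metadata, so read it with scale 0.
    # Step 2: cast to the requested scale by rescaling the unscaled value.
    unscaled = value * 10 ** scale
    # Overflow check: the rescaled value must fit in `precision` digits.
    if abs(unscaled) >= 10 ** precision:
        return None  # assumption: overflow is signalled with None in this sketch
    return unscaled


print(plain_int_to_unscaled(123, 5, 2))  # 12300, i.e. 123.00 as DECIMAL(5, 2)
print(plain_int_to_unscaled(123, 4, 2))  # None, 123.00 needs 5 digits
```

Only the value that passes the overflow check would be written back to the vector, which is the "set the int/long if there is no overflow" part of the comment.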







[GitHub] [spark] HyukjinKwon closed pull request #32166: [SPARK-35071][PYTHON] Rename Koalas to pandas-on-Spark in main codes

2021-04-14 Thread GitBox


HyukjinKwon closed pull request #32166:
URL: https://github.com/apache/spark/pull/32166


   





[GitHub] [spark] HyukjinKwon commented on pull request #32166: [SPARK-35071][PYTHON] Rename Koalas to pandas-on-Spark in main codes

2021-04-14 Thread GitBox


HyukjinKwon commented on pull request #32166:
URL: https://github.com/apache/spark/pull/32166#issuecomment-820049121


   Merged to master.





[GitHub] [spark] yaooqinn commented on pull request #32182: [SPARK-35082][INFRA] Use permissive and squshed merge when syncing to the latest branch in GitHub Actions testing

2021-04-14 Thread GitBox


yaooqinn commented on pull request #32182:
URL: https://github.com/apache/spark/pull/32182#issuecomment-820047934


   LGTM





[GitHub] [spark] shaneknapp commented on a change in pull request #32178: [DONOTMERGE] initial commit for skeleton ansible for jenkins worker config

2021-04-14 Thread GitBox


shaneknapp commented on a change in pull request #32178:
URL: https://github.com/apache/spark/pull/32178#discussion_r613737082



##
File path: dev/ansible-for-test-node/README.md
##
@@ -0,0 +1,25 @@
+# jenkins-infra
+
+This is a rough skeleton of the ansible used to deploy RISELab/Apache Spark Jenkins build workers on Ubuntu 20LTS.
+
+WARNING:  this will not work "directly out of the box" and will need to be tweaked to work on any ubuntu servers you might want to try this on.
+
+### deploy a new worker node
+ TL;DR:
+all of the configs for the workers live in roles/common/... and roles/jenkins-worker...
+
+ prereqs:
+* fresh install of ubuntu 20
+* a service account w/sudo
+* python 3, ansible, ansible-playbook installed locally
+* add hostname(s) to the `hosts` file
+* add this to your `~/.ansible.cfg`:
+```[defaults] host_key_checking = False```
+
+ fire ansible cannon!
+`ansible-playbook -u  deploy-jenkins-worker.yml -i  -k -b -K` 
+
+tips:
+* if you are installing more than a few workers, it's best to run the playbook on smaller (2-3) batches at a time.  this way it's easier to track down errors, as ansible is very noisy.
+* when you encounter an error, you should comment out any previously-run plays and tasks.  this saves time when debugging, and let's you easily track where you are in the process.
+* `apt-get remove ` and `apt-get purge ` are your friends

Review comment:
   TODO: explain in more detail the scope of the ansible in the bigger picture of the build system







[GitHub] [spark] Yikun commented on pull request #32174: [SPARK-35048][INFRA] Only trigger the notify test workflow in upstream

2021-04-14 Thread GitBox


Yikun commented on pull request #32174:
URL: https://github.com/apache/spark/pull/32174#issuecomment-820044810


   Yep





[GitHub] [spark] AmplabJenkins commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


AmplabJenkins commented on pull request #32162:
URL: https://github.com/apache/spark/pull/32162#issuecomment-820043597


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137385/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32162:
URL: https://github.com/apache/spark/pull/32162#issuecomment-820043597


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137385/
   





[GitHub] [spark] Yikun commented on pull request #32182: [SPARK-35082][INFRA] Use permissive and squshed merge when syncing to the latest branch in GitHub Actions testing

2021-04-14 Thread GitBox


Yikun commented on pull request #32182:
URL: https://github.com/apache/spark/pull/32182#issuecomment-820043537


   Thanks! It makes me feel how active and powerful our community is. : )





[GitHub] [spark] viirya commented on pull request #32136: [WIP][SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-04-14 Thread GitBox


viirya commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-820040504


   > Yes, that's true. But normally, even in the case of offering a single 
resource that released from a single task, it seems it's less possible to 
schedule tasks unevenly unless the resources are really scarce.
   
   As I saw in previous tests, whether all tasks end up evenly distributed across all executors is down to chance. Sometimes they are, but sometimes only some of the executors get tasks scheduled in the first batch.

   > Do you have logs related to the scheduling? I'd like to see how it happens.
   
   I don't have logs now. It is a general and simple SS job reading from Kafka.
   
   





[GitHub] [spark] SparkQA removed a comment on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


SparkQA removed a comment on pull request #32162:
URL: https://github.com/apache/spark/pull/32162#issuecomment-819957837


   **[Test build #137385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137385/testReport)** for PR 32162 at commit [`badeb87`](https://github.com/apache/spark/commit/badeb87a363d68ac92349c9edfa9c921fd37d750).





[GitHub] [spark] SparkQA commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan

2021-04-14 Thread GitBox


SparkQA commented on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-820040127


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41968/
   





[GitHub] [spark] SparkQA commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState

2021-04-14 Thread GitBox


SparkQA commented on pull request #32162:
URL: https://github.com/apache/spark/pull/32162#issuecomment-820040096


   **[Test build #137385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137385/testReport)** for PR 32162 at commit [`badeb87`](https://github.com/apache/spark/commit/badeb87a363d68ac92349c9edfa9c921fd37d750).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `public class VectorizedBLAS extends F2jBLAS `
 * `trait AnalysisOnlyCommand extends Command `
 * `  implicit class MetadataColumnHelper(attr: Attribute) `
 * `case class WriteToDataSourceV2(`
 * `case class WriteToMicroBatchDataSource(`





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32178: [DONOTMERGE] initial commit for skeleton ansible for jenkins worker config

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32178:
URL: https://github.com/apache/spark/pull/32178#issuecomment-820038308


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137381/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log

2021-04-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32180:
URL: https://github.com/apache/spark/pull/32180#issuecomment-820037077


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41964/
   




