[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
AmplabJenkins removed a comment on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
AmplabJenkins commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/
[GitHub] [spark] HyukjinKwon commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`
HyukjinKwon commented on pull request #32107: URL: https://github.com/apache/spark/pull/32107#issuecomment-820133395 oh yeah to retrigger the test.
[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
SparkQA commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820133344
[GitHub] [spark] MaxGekk commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`
MaxGekk commented on pull request #32107: URL: https://github.com/apache/spark/pull/32107#issuecomment-820132545 @HyukjinKwon Should this be rebased/merged on the master?
[GitHub] [spark] SparkQA commented on pull request #32124: [SPARK-35024][ML][WIP] Refactor LinearSVC - support virtual centering
SparkQA commented on pull request #32124: URL: https://github.com/apache/spark/pull/32124#issuecomment-820130895 **[Test build #137400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137400/testReport)** for PR 32124 at commit [`5857a52`](https://github.com/apache/spark/commit/5857a52f9cb9fe787b22dfdfd647e04b801224db).
[GitHub] [spark] SparkQA commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
SparkQA commented on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820130820 **[Test build #137399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137399/testReport)** for PR 32180 at commit [`1d981c7`](https://github.com/apache/spark/commit/1d981c7651054c5e05292b8f0d0e94d0cd50f518).
[GitHub] [spark] itholic commented on pull request #32161: [SPARK-35025] Move Parquet data source options from Python and Scala into a single page.
itholic commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-820130117 cc @HyukjinKwon Could you please review this when you find some time?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
AmplabJenkins removed a comment on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820129361 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41974/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
AmplabJenkins removed a comment on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820129359 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137393/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
AmplabJenkins removed a comment on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820129357 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137387/
[GitHub] [spark] github-actions[bot] commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
github-actions[bot] commented on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820129378 **[Test build #750871196](https://github.com/Ngone51/spark/actions/runs/750871196)** for PR 32180 at commit [`1d981c7`](https://github.com/Ngone51/spark/commit/1d981c7651054c5e05292b8f0d0e94d0cd50f518).
[GitHub] [spark] AmplabJenkins commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
AmplabJenkins commented on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820129357 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137387/
[GitHub] [spark] AmplabJenkins commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
AmplabJenkins commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820129361 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41974/
[GitHub] [spark] AmplabJenkins commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
AmplabJenkins commented on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820129359 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137393/
[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
SparkQA commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820128753
[GitHub] [spark] SparkQA removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
SparkQA removed a comment on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820021274 **[Test build #137393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137393/testReport)** for PR 32139 at commit [`643418a`](https://github.com/apache/spark/commit/643418a77559d4747780d5176235afa99584053e).
[GitHub] [spark] SparkQA commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
SparkQA commented on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820127913 **[Test build #137393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137393/testReport)** for PR 32139 at commit [`643418a`](https://github.com/apache/spark/commit/643418a77559d4747780d5176235afa99584053e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] github-actions[bot] commented on pull request #32124: [SPARK-35024][ML][WIP] Refactor LinearSVC - support virtual centering
github-actions[bot] commented on pull request #32124: URL: https://github.com/apache/spark/pull/32124#issuecomment-820127625 **[Test build #750853487](https://github.com/zhengruifeng/spark/actions/runs/750853487)** for PR 32124 at commit [`5857a52`](https://github.com/zhengruifeng/spark/commit/5857a52f9cb9fe787b22dfdfd647e04b801224db).
[GitHub] [spark] attilapiros commented on a change in pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
attilapiros commented on a change in pull request #32180: URL: https://github.com/apache/spark/pull/32180#discussion_r613765906
File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
@@ -551,8 +551,10 @@ final class ShuffleBlockFetcherIterator(
     // Send out initial requests for blocks, up to our maxBytesInFlight
     fetchUpToMaxBytes()
-    val numFetches = remoteRequests.size - fetchRequests.size
-    logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}")
+    val numDeferredRequest = deferredFetchRequests.values.map(_.size).sum
+    val numFetches = remoteRequests.size - fetchRequests.size - numDeferredRequest
+    logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}" +
+      s"${if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else "" }")
Review comment:
```suggestion
      (if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else ""))
```
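The suggestion drops the nested `s"${if ... }"` interpolation in favor of a plain parenthesized `if` expression concatenated onto the message. A minimal sketch of the resulting string logic, assuming a hypothetical helper `startedFetches` that receives the elapsed time as an already formatted string (in the diff it comes from `Utils.getUsedTimeNs`); this illustrates the pattern and is not Spark's code:

```scala
// Hypothetical helper mirroring the log line under review; not part of Spark.
object FetchLogMessage {
  // Builds the base message and appends the deferred count only when at least
  // one request was deferred, using a parenthesized `if` expression (as in the
  // review suggestion) rather than nesting the `if` inside s"${...}".
  def startedFetches(numFetches: Int, elapsed: String, numDeferredRequest: Int): String =
    s"Started $numFetches remote fetches in $elapsed" +
      (if (numDeferredRequest > 0) s", deferred $numDeferredRequest requests" else "")
}
```

Because the `if` evaluates to `""` when nothing is deferred, the common case logs exactly the pre-patch message.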
[GitHub] [spark] dongjoon-hyun commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
dongjoon-hyun commented on pull request #32164: URL: https://github.com/apache/spark/pull/32164#issuecomment-820118853 Thanks, @sarutak and @HyukjinKwon . Merged to master/3.1.
[GitHub] [spark] dongjoon-hyun closed pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
dongjoon-hyun closed pull request #32164: URL: https://github.com/apache/spark/pull/32164
[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
attilapiros commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820111567 @mridulm you mean using the `TaskCompletionListener`, right? Looking at the code of `MonitorThread`, one of its responsibilities is to handle task interruption: https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L582-L584 The code goes on to decide what to do when the task is interrupted but not completed. But task interruption is not completion: as you can see, when a task is flagged as interrupted, no listener is informed: https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/TaskContextImpl.scala#L149-L151
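The distinction drawn here (interruption only sets a flag, completion notifies listeners) is why a completion listener alone cannot replace a polling monitor. A toy model of that argument, assuming a hypothetical `ToyTaskContext` and `ToyMonitor.awaitInterruption` loop that only mimic its shape; they are not Spark's `TaskContextImpl` or `MonitorThread`:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model, not Spark's TaskContextImpl: interruption only flips
// a flag, while completion also runs the registered listeners.
class ToyTaskContext {
  @volatile private var interrupted = false
  @volatile private var completed = false
  private val completionListeners = ArrayBuffer.empty[() => Unit]

  def addCompletionListener(listener: () => Unit): Unit = completionListeners += listener

  // Marks the task as interrupted; no listener is notified here.
  def markInterrupted(): Unit = interrupted = true

  // Marks the task as completed and notifies every completion listener.
  def markCompleted(): Unit = {
    completed = true
    completionListeners.foreach(listener => listener())
  }

  def isInterrupted: Boolean = interrupted
  def isCompleted: Boolean = completed
}

object ToyMonitor {
  // Hypothetical polling loop: sleep until the task is interrupted or completed,
  // then return true only for the "interrupted but not completed" case, which a
  // completion listener would never observe.
  def awaitInterruption(ctx: ToyTaskContext, pollMillis: Long): Boolean = {
    while (!ctx.isInterrupted && !ctx.isCompleted) Thread.sleep(pollMillis)
    !ctx.isCompleted
  }
}
```

In the toy, a listener registered on an interrupted but still running task does not fire, while the polling loop notices the flag on its next wake-up.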
[GitHub] [spark] SparkQA removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
SparkQA removed a comment on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820015616 **[Test build #137387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137387/testReport)** for PR 32180 at commit [`357c36c`](https://github.com/apache/spark/commit/357c36c593df5a08e9fecc200e163b3b26d1c5a1).
[GitHub] [spark] SparkQA commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
SparkQA commented on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820104323 **[Test build #137387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137387/testReport)** for PR 32180 at commit [`357c36c`](https://github.com/apache/spark/commit/357c36c593df5a08e9fecc200e163b3b26d1c5a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
Ngone51 commented on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820100868 @mridulm Thanks for the approval! cc @tgravescs @attilapiros for taking a look.
[GitHub] [spark] wangyum commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
wangyum commented on pull request #32163: URL: https://github.com/apache/spark/pull/32163#issuecomment-820096175 Merged to master.
[GitHub] [spark] wangyum closed pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
wangyum closed pull request #32163: URL: https://github.com/apache/spark/pull/32163
[GitHub] [spark] HyukjinKwon commented on pull request #32107: [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`
HyukjinKwon commented on pull request #32107: URL: https://github.com/apache/spark/pull/32107#issuecomment-820095715 Just curious: do other DBMSes support aggregation on such interval types?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints
dongjoon-hyun commented on a change in pull request #32147: URL: https://github.com/apache/spark/pull/32147#discussion_r613756126
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -2844,6 +2844,25 @@ abstract class JsonSuite
       assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
     }
   }
+
+  test("Write Non-ASCII character as codepoint") {
+    // scalastyle:off nonascii
+    withTempPath { path =>
+      val basePath = path.getCanonicalPath
+      Seq("a", "\n", "\u3042").toDF.write
+        .option("writeNonAsciiCharacterAsCodePoint", "true").json(s"$basePath")
+      val actualText = spark.read.text(s"$basePath")
+        .sort("value").map(_.getString(0)).collect().mkString
+      val expectedText = "{\"value\":\"\\n\"}{\"value\":\"\\u3042\"}{\"value\":\"a\"}"
+      assert(actualText === expectedText)
+
+      val actualJson = spark.read.json(s"$basePath")
+        .sort("value").map(_.getString(0)).collect().mkString
+      val expectedJson = "\na\u3042"
+      assert(actualJson === expectedJson)
+    }
+    // scalastyle:on nonascii
+  }
Review comment: It would be great if we could have test coverage for setting both pretty and non-ascii-as-codepoint at the same time.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints
dongjoon-hyun commented on a change in pull request #32147: URL: https://github.com/apache/spark/pull/32147#discussion_r613755840
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -2844,6 +2844,25 @@ abstract class JsonSuite
       assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
     }
   }
+
+  test("Write Non-ASCII character as codepoint") {
Review comment: If you don't mind, shall we have a JIRA ID prefix?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints
dongjoon-hyun commented on a change in pull request #32147: URL: https://github.com/apache/spark/pull/32147#discussion_r613755701
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
@@ -73,7 +73,12 @@ private[sql] class JacksonGenerator(
   private val gen = {
     val generator = new JsonFactory().createGenerator(writer).setRootValueSeparator(null)
-    if (options.pretty) generator.useDefaultPrettyPrinter() else generator
+    val ppGenerator = if (options.pretty) generator.useDefaultPrettyPrinter() else generator
+    if (options.writeNonAsciiCharacterAsCodePoint) {
+      generator.setHighestNonEscapedChar(0x7F)
Review comment: This code means that we cannot set both `options.pretty` and `options.writeNonAsciiCharacterAsCodePoint`. If this is true, shall we document it somewhere?
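For context on what `setHighestNonEscapedChar(0x7F)` is asking for: every character above the threshold is written as a Unicode escape instead of the raw character. A hand-rolled sketch of that output rule, assuming a hypothetical `NonAsciiEscaper` object; it is not Jackson's implementation, and a real JSON generator must also escape quotes, backslashes and control characters (hence the `\\n` in the JsonSuite expectation above), which this sketch leaves untouched:

```scala
// Hypothetical illustration of "escape everything above a threshold";
// not Jackson's code and not a complete JSON string escaper.
object NonAsciiEscaper {
  def escapeNonAscii(s: String, highestNonEscaped: Int = 0x7F): String = {
    val sb = new StringBuilder
    s.foreach { c =>
      if (c.toInt > highestNonEscaped) {
        // Each UTF-16 code unit above the threshold becomes a backslash,
        // the letter 'u', and four uppercase hex digits.
        sb.append('\\').append('u').append(f"${c.toInt}%04X")
      } else {
        sb.append(c)
      }
    }
    sb.toString
  }
}
```

With the default threshold, ASCII input passes through unchanged, while U+3042 comes out as the six characters backslash, `u`, `3042`, matching the escaped form the JsonSuite test expects.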
[GitHub] [spark] mridulm commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
mridulm commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820094334 @attilapiros I don't have much context about the Python runner, but I'm curious whether `MonitorThread` can follow the same pattern/lifecycle as `writerThread` in that method?
[GitHub] [spark] yaooqinn commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file
yaooqinn commented on pull request #32184: URL: https://github.com/apache/spark/pull/32184#issuecomment-820092327 looks nice, and good to have it tested per @dongjoon-hyun's suggestion
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32147: [SPARK-35047][SQL] Allow Json datasources to write non-ascii characters as codepoints
dongjoon-hyun commented on a change in pull request #32147: URL: https://github.com/apache/spark/pull/32147#discussion_r613752661
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
@@ -135,6 +135,9 @@ private[sql] class JSONOptions(
    */
   val inferTimestamp: Boolean = parameters.get("inferTimestamp").map(_.toBoolean).getOrElse(false)
+  val writeNonAsciiCharacterAsCodePoint: Boolean =
Review comment: Shall we add some description?
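One way to address the request for a description is a Scaladoc comment on the option itself. The right-hand side of `writeNonAsciiCharacterAsCodePoint` is cut off in the diff, so this sketch assumes it follows the `parameters.get(...).map(_.toBoolean).getOrElse(false)` pattern shown for `inferTimestamp`. `ToyJsonOptions` is a hypothetical stand-in backed by a plain `Map`; Spark's `JSONOptions` has a different constructor and many more options, and the description wording is illustrative:

```scala
// Hypothetical, simplified stand-in for JSONOptions; not Spark's class.
class ToyJsonOptions(parameters: Map[String, String]) {

  /**
   * Whether to write non-ASCII characters as Unicode escape sequences
   * (a backslash, the letter u and four hex digits) instead of as-is.
   * Assumed to default to false, following the inferTimestamp pattern.
   */
  val writeNonAsciiCharacterAsCodePoint: Boolean =
    parameters.get("writeNonAsciiCharacterAsCodePoint").map(_.toBoolean).getOrElse(false)
}
```

Note that `.toBoolean` throws `IllegalArgumentException` for values other than `true`/`false` (case-insensitive), so a malformed option fails fast rather than silently defaulting.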
[GitHub] [spark] dongjoon-hyun commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
dongjoon-hyun commented on pull request #32162: URL: https://github.com/apache/spark/pull/32162#issuecomment-820090451 Thank you, @sarutak .
[GitHub] [spark] dongjoon-hyun closed pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
dongjoon-hyun closed pull request #32162: URL: https://github.com/apache/spark/pull/32162
[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
SparkQA commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820090027 **[Test build #137398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137398/testReport)** for PR 30480 at commit [`9614a0c`](https://github.com/apache/spark/commit/9614a0c4124b521915923d34ea192bfde4eddcc2).
[GitHub] [spark] mridulm commented on a change in pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
mridulm commented on a change in pull request #32180: URL: https://github.com/apache/spark/pull/32180#discussion_r613751403
## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
## @@ -551,8 +551,10 @@ final class ShuffleBlockFetcherIterator(
     // Send out initial requests for blocks, up to our maxBytesInFlight
     fetchUpToMaxBytes()
-    val numFetches = remoteRequests.size - fetchRequests.size
-    logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}")
+    val numDeferredRequest = deferredFetchRequests.values.map(_.size).sum
+    val numFetches = remoteRequests.size - fetchRequests.size - numDeferredRequest
+    logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}" +
+      s"${if (numDeferredRequest > 0 ) s", deferred $numDeferredRequest requests" else "" }")
Review comment: There are no `deferredFetchRequests` when `initialize` is invoked; it will always be empty.
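To make the reviewer's point concrete, here is a hypothetical plain-Python model of the proposed log arithmetic (the function name and the list/dict types are assumptions, not Spark code). With an empty deferred map, as the review says is always the case during `initialize`, the new term contributes nothing and the count matches the original formula.

```python
def started_fetches(remote_requests, fetch_requests, deferred_fetch_requests):
    """Hypothetical model of the proposed log line in ShuffleBlockFetcherIterator.

    remote_requests / fetch_requests: lists of requests.
    deferred_fetch_requests: dict mapping an address to its list of deferred requests.
    """
    # Mirrors deferredFetchRequests.values.map(_.size).sum
    num_deferred = sum(len(reqs) for reqs in deferred_fetch_requests.values())
    num_fetches = len(remote_requests) - len(fetch_requests) - num_deferred
    msg = f"Started {num_fetches} remote fetches"
    if num_deferred > 0:
        msg += f", deferred {num_deferred} requests"
    return num_fetches, msg


# During initialize the deferred map is empty, so the extra term is always zero.
print(started_fetches(["r1", "r2", "r3"], ["r3"], {}))
```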
[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
SparkQA commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820089733 **[Test build #137397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137397/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
AmplabJenkins removed a comment on pull request #32164: URL: https://github.com/apache/spark/pull/32164#issuecomment-820089388 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137383/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
AmplabJenkins removed a comment on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-820089390 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41968/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
AmplabJenkins removed a comment on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820089389 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41972/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
AmplabJenkins removed a comment on pull request #32163: URL: https://github.com/apache/spark/pull/32163#issuecomment-820089385 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137384/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
AmplabJenkins removed a comment on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820089384 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41969/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file
AmplabJenkins removed a comment on pull request #32184: URL: https://github.com/apache/spark/pull/32184#issuecomment-820089386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41971/
[GitHub] [spark] AmplabJenkins commented on pull request #32185: Branch 3.2
AmplabJenkins commented on pull request #32185: URL: https://github.com/apache/spark/pull/32185#issuecomment-820089563 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file
AmplabJenkins commented on pull request #32184: URL: https://github.com/apache/spark/pull/32184#issuecomment-820089386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41971/
[GitHub] [spark] AmplabJenkins commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
AmplabJenkins commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-820089390 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41968/
[GitHub] [spark] AmplabJenkins commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
AmplabJenkins commented on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820089389 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41972/
[GitHub] [spark] AmplabJenkins commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
AmplabJenkins commented on pull request #32163: URL: https://github.com/apache/spark/pull/32163#issuecomment-820089385 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137384/
[GitHub] [spark] AmplabJenkins commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
AmplabJenkins commented on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820089384 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41969/
[GitHub] [spark] AmplabJenkins commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
AmplabJenkins commented on pull request #32164: URL: https://github.com/apache/spark/pull/32164#issuecomment-820089388 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137383/
[GitHub] [spark] SparkQA commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file
SparkQA commented on pull request #32184: URL: https://github.com/apache/spark/pull/32184#issuecomment-820089269 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41971/
[GitHub] [spark] dongjoon-hyun commented on pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated
dongjoon-hyun commented on pull request #32179: URL: https://github.com/apache/spark/pull/32179#issuecomment-820088230 Could you check the relevant UT failures, @allisonwang-db?
```
org.apache.spark.sql.AnalysisException
[info] Correlated column is not allowed in a non-equality predicate:
```
[GitHub] [spark] github-actions[bot] commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
github-actions[bot] commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820087911 **[Test build #750728539](https://github.com/attilapiros/spark/actions/runs/750728539)** for PR 32169 at commit [`c4a5e2d`](https://github.com/attilapiros/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).
[GitHub] [spark] SparkQA commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
SparkQA commented on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820087686
[GitHub] [spark] SparkQA commented on pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool file
SparkQA commented on pull request #32184: URL: https://github.com/apache/spark/pull/32184#issuecomment-820087668 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41971/
[GitHub] [spark] mridulm commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
mridulm commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820087150 Looks good to me, thanks for the changes, @venkata91. +CC @Ngone51, @tgravescs, @attilapiros for another pass.
[GitHub] [spark] SparkQA removed a comment on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
SparkQA removed a comment on pull request #32163: URL: https://github.com/apache/spark/pull/32163#issuecomment-819957750 **[Test build #137384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137384/testReport)** for PR 32163 at commit [`4d88408`](https://github.com/apache/spark/commit/4d8840857288f4c71d02ac02a0f818249fb82caa).
[GitHub] [spark] SparkQA commented on pull request #32163: [SPARK-35086][SQL][CORE] --verbose should be passed to Spark SQL CLI too
SparkQA commented on pull request #32163: URL: https://github.com/apache/spark/pull/32163#issuecomment-820084653 **[Test build #137384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137384/testReport)** for PR 32163 at commit [`4d88408`](https://github.com/apache/spark/commit/4d8840857288f4c71d02ac02a0f818249fb82caa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun closed pull request #31935: [SPARK-34789][TEST] Introduce Jetty based construct for integration tests where HTTP server is used
dongjoon-hyun closed pull request #31935: URL: https://github.com/apache/spark/pull/31935
[GitHub] [spark] allisonwang-db commented on pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated
allisonwang-db commented on pull request #32179: URL: https://github.com/apache/spark/pull/32179#issuecomment-820083789
> BTW, is it necessary to be a subquery with aggregation? From the fix, I cannot tell how aggregation affects it.

SPARK-17348 provides some explanation of why an aggregate causes the issue. Basically, when a correlated predicate is pulled up, all attributes from the inner query are added as GROUP BY columns. When the mapping is not one-to-one, for instance `a + b = outer(c)` in the example above, both `a` and `b` are added as GROUP BY columns, and `count(*)` counts the number of rows for each combination of (a, b) instead of for each value of (a + b). https://github.com/apache/spark/blob/7ff9d2e3eec514962e891420dbb3961e85826612/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L878-L901

Pull up correlated predicates through Aggregate: https://github.com/apache/spark/blob/3e218ade9cf6becc5de8b20a4385e345021a509d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala#L258-L264
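The count mismatch described in that comment can be illustrated with a small plain-Python model (not Spark code; the rows and function names are hypothetical). The intended subquery returns one count of 3 for `outer(c) = 3`, while grouping by `(a, b)` first and then matching `a + b = outer(c)` yields three groups, each with a count of 1.

```python
from collections import Counter

# Hypothetical inner-table rows (a, b); three of them satisfy a + b = 3.
rows = [(1, 2), (2, 1), (0, 3), (3, 3)]


def expected_count(c):
    """Intended semantics: SELECT count(*) FROM inner WHERE a + b = outer(c)."""
    return sum(1 for a, b in rows if a + b == c)


def pulled_up_counts(c):
    """After pull-up: count(*) grouped by (a, b), then matched on a + b = c."""
    per_group = Counter((a, b) for a, b in rows)
    return [n for (a, b), n in per_group.items() if a + b == c]


print(expected_count(3))    # 3
print(pulled_up_counts(3))  # [1, 1, 1]
```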
[GitHub] [spark] ChenDou2021 opened a new pull request #32185: Branch 3.2
ChenDou2021 opened a new pull request #32185: URL: https://github.com/apache/spark/pull/32185
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Simplifying Python Code
[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context
HyukjinKwon commented on pull request #32169: URL: https://github.com/apache/spark/pull/32169#issuecomment-820083032 @attilapiros, there was a bit of an unexpected infra issue with the GA build. Can you sync your branch with the latest master branch?
[GitHub] [spark] SparkQA removed a comment on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
SparkQA removed a comment on pull request #32164: URL: https://github.com/apache/spark/pull/32164#issuecomment-819957745 **[Test build #137383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137383/testReport)** for PR 32164 at commit [`1446667`](https://github.com/apache/spark/commit/1446667bee96a9d3d1b0e642a226efe5ee67ee21).
[GitHub] [spark] SparkQA commented on pull request #32164: [SPARK-34225][CORE][FOLLOWUP] Replace Hadoop's Path with Utils.resolveURI to make the way to get URI simple
SparkQA commented on pull request #32164: URL: https://github.com/apache/spark/pull/32164#issuecomment-820082684 **[Test build #137383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137383/testReport)** for PR 32164 at commit [`1446667`](https://github.com/apache/spark/commit/1446667bee96a9d3d1b0e642a226efe5ee67ee21).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
 * `public class VectorizedBLAS extends F2jBLAS `
 * `trait AnalysisOnlyCommand extends Command `
[GitHub] [spark] itholic commented on pull request #32154: [SPARK-34995] Port/integrate Koalas remaining codes into PySpark
itholic commented on pull request #32154: URL: https://github.com/apache/spark/pull/32154#issuecomment-820082557 Yeah, it includes all the changes made to the main code after the porting.
[GitHub] [spark] SparkQA commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
SparkQA commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-820081492 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41968/
[GitHub] [spark] HyukjinKwon commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
HyukjinKwon commented on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820081307 Hm, one test failed:
```
==
2021-04-15T03:54:15.0802269Z ERROR [61.702s]: test_monotonic (pyspark.pandas.tests.indexes.test_base.IndexesTest)
2021-04-15T03:54:15.0803808Z --
2021-04-15T03:54:15.0804406Z Traceback (most recent call last):
2021-04-15T03:54:15.0835123Z   File "/__w/spark/spark/python/pyspark/pandas/tests/indexes/test_base.py", line 1280, in test_monotonic
2021-04-15T03:54:15.0836254Z     self.assert_eq(kmidx.is_monotonic_increasing, pmidx.is_monotonic_increasing)
2021-04-15T03:54:15.0837366Z   File "/__w/spark/spark/python/pyspark/pandas/testing/utils.py", line 293, in assert_eq
2021-04-15T03:54:15.0838145Z     self.assertEqual(lobj, robj)
2021-04-15T03:54:15.0838748Z AssertionError: False != True
2021-04-15T03:54:15.0839117Z
2021-04-15T03:54:15.0840085Z --
2021-04-15T03:54:15.0840665Z Ran 76 tests in 290.024s
2021-04-15T03:54:15.0840965Z
2021-04-15T03:54:15.0841355Z FAILED (errors=1)
```
[GitHub] [spark] allisonwang-db commented on a change in pull request #32179: [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated
allisonwang-db commented on a change in pull request #32179: URL: https://github.com/apache/spark/pull/32179#discussion_r613741888
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
## @@ -950,9 +950,15 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
         case f: Filter =>
           val (correlated, _) = splitConjunctivePredicates(f.condition).partition(containsOuter)
-          // Find any non-equality correlated predicates
+          // Find any non-equality correlated predicates and equality predicates that do not
+          // guarantee one-on-one mapping between inner and outer attributes. E.G:
+          // a = outer(c) -> true
+          // a > outer(c) -> false
+          // a + b = outer(c) -> false (because there can be multiple combinations of a, b that
+          // satisfy the condition)
           foundNonEqualCorrelatedPred = foundNonEqualCorrelatedPred || correlated.exists {
-            case _: EqualTo | _: EqualNullSafe => false
+            case Equality(_: Attribute, b) => b.find(_.isInstanceOf[Attribute]).isDefined
Review comment: Will do!
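The rule stated in the new code comment can be sketched as a rough plain-Python model. The classes and function names here are hypothetical; this is not Spark's `Equality` extractor or `Expression` API, and checking both sides of the equality symmetrically is an assumption of the sketch. An equality is treated as one-to-one only when one side is a bare inner attribute and the other side references no inner attributes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical inner-query column, e.g. `a`."""
    name: str


@dataclass(frozen=True)
class Outer:
    """Hypothetical reference to an outer-query column, e.g. `outer(c)`."""
    name: str


@dataclass(frozen=True)
class Op:
    """Hypothetical compound expression, e.g. `a + b`."""
    symbol: str
    args: Tuple


def contains_inner_attr(expr):
    """True if the expression tree references any inner-query column."""
    if isinstance(expr, Attr):
        return True
    if isinstance(expr, Op):
        return any(contains_inner_attr(arg) for arg in expr.args)
    return False  # outer references and literals


def is_one_to_one(cmp, left, right):
    """True if a correlated predicate maps inner and outer values one-to-one."""
    if cmp not in ("=", "<=>"):  # only equality / null-safe equality qualify
        return False
    if isinstance(left, Attr):
        return not contains_inner_attr(right)
    if isinstance(right, Attr):
        return not contains_inner_attr(left)
    return False


a, b, c = Attr("a"), Attr("b"), Outer("c")
print(is_one_to_one("=", a, c))                # True
print(is_one_to_one(">", a, c))                # False
print(is_one_to_one("=", Op("+", (a, b)), c))  # False
```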
[GitHub] [spark] SparkQA commented on pull request #32139: [SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark
SparkQA commented on pull request #32139: URL: https://github.com/apache/spark/pull/32139#issuecomment-820077352
[GitHub] [spark] mridulm commented on pull request #30480: [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
mridulm commented on pull request #30480: URL: https://github.com/apache/spark/pull/30480#issuecomment-820074290 ok to test
[GitHub] [spark] exmy commented on pull request #32088: [SPARK-34987][SQL] AQE improve: change shuffle hash join to sort merg…
exmy commented on pull request #32088: URL: https://github.com/apache/spark/pull/32088#issuecomment-820071294 @cloud-fan, could you help take a look and provide some suggestions? Thanks.
[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
SparkQA commented on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820067610 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41970/
[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
AmplabJenkins commented on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820067686 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41970/
[GitHub] [spark] SparkQA removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
SparkQA removed a comment on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820036980 **[Test build #137396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137396/testReport)** for PR 31010 at commit [`1082710`](https://github.com/apache/spark/commit/108271015b84abb5bfda37b9546bfe3b138f43a9).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
AmplabJenkins removed a comment on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820063786 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137396/
[GitHub] [spark] AmplabJenkins commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
AmplabJenkins commented on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820063786 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137396/
[GitHub] [spark] HyukjinKwon commented on pull request #32154: [SPARK-34995] Port/integrate Koalas remaining codes into PySpark
HyukjinKwon commented on pull request #32154: URL: https://github.com/apache/spark/pull/32154#issuecomment-820063668 Are they all changes to port? It might be good to confirm with @ueshin.
[GitHub] [spark] SparkQA commented on pull request #31010: [SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
SparkQA commented on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-820062892 **[Test build #137396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137396/testReport)** for PR 31010 at commit [`1082710`](https://github.com/apache/spark/commit/108271015b84abb5bfda37b9546bfe3b138f43a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
SparkQA removed a comment on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820035283 **[Test build #137395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
AmplabJenkins removed a comment on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
AmplabJenkins commented on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils
SparkQA commented on pull request #32177: URL: https://github.com/apache/spark/pull/32177#issuecomment-820060610 **[Test build #137395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #32090: [SPARK-34212][SQL][FOLLOWUP] Move the added test to ParquetQuerySuite
cloud-fan commented on a change in pull request #32090: URL: https://github.com/apache/spark/pull/32090#discussion_r613738091 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ##
```diff
@@ -840,6 +840,67 @@ abstract class ParquetQuerySuite extends QueryTest with ParquetTest with SharedS
     testMigration(fromTsType = "INT96", toTsType = "TIMESTAMP_MICROS")
     testMigration(fromTsType = "TIMESTAMP_MICROS", toTsType = "INT96")
   }
+
+  test("SPARK-34212 Parquet should read decimals correctly") {
+    def readParquet(schema: String, path: File): DataFrame = {
+      spark.read.schema(schema).parquet(path.toString)
+    }
+
+    withTempPath { path =>
+      // a is int-decimal (4 bytes), b is long-decimal (8 bytes), c is binary-decimal (16 bytes)
+      val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS DECIMAL(36, 2)) c")
+      df.write.parquet(path.toString)
+
+      withAllParquetReaders {
+        // We can read the decimal parquet field with a larger precision, if scale is the same.
+        val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+        checkAnswer(readParquet(schema, path), df)
+      }
+
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
+        val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
+        checkAnswer(readParquet(schema1, path), df)
+        val schema2 = "a DECIMAL(3, 0), b DECIMAL(18, 1), c DECIMAL(37, 1)"
+        checkAnswer(readParquet(schema2, path), Row(1, 1.2, 1.2))
+      }
+
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true") {
+        Seq("a DECIMAL(3, 2)", "b DECIMAL(18, 1)", "c DECIMAL(37, 1)").foreach { schema =>
+          val e = intercept[SparkException] {
+            readParquet(schema, path).collect()
+          }.getCause.getCause
+          assert(e.isInstanceOf[SchemaColumnConvertNotSupportedException])
+        }
+      }
+    }
+
+    // tests for parquet types without decimal metadata.
```
Review comment: @viirya looking at the test, I think it was decided before that reading a plain int/long as a decimal is hard to implement in the vectorized reader. Basically we need to do 2 steps: 1. read the decimal from the int/long with its actual precision/scale. Since it's a plain int/long, the precision should be the max precision for int/long. 2. cast the decimal to the required precision/scale. For the vectorized reader, we can create a `Decimal` object with the max precision for int/long, do the cast, and set the int/long in the vector if there is no overflow. This is super slow, but still doable. It's not a real regression: as @wangyum demonstrated before, the previous behavior in 2.4 was not reasonable when overflow happens.
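The two steps described in the review comment can be sketched in Scala as follows. This is a hedged illustration under stated assumptions, not Spark's implementation: it uses `java.math.BigDecimal` in place of Spark's internal `Decimal` class; the names `PlainIntegralAsDecimal`, `fromInt` and `fromLong` are hypothetical; the source precisions 10 (int) and 20 (long) are assumed to match Spark's `DecimalType(10, 0)` and `DecimalType(20, 0)`; overflow is signalled with `None` because the comment does not say what the reader should do in that case; and only non-negative target scales are handled.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper illustrating the two-step read of a plain int/long
// (no decimal annotation in Parquet) as DECIMAL(precision, scale).
object PlainIntegralAsDecimal {
  // Assumed "max precision" of a plain int/long (cf. DecimalType(10, 0) / DecimalType(20, 0)).
  val IntMaxPrecision: Int = 10
  val LongMaxPrecision: Int = 20

  private def castTo(value: Long, maxPrecision: Int, precision: Int, scale: Int): Option[JBigDecimal] = {
    // Assumption of this sketch: only non-negative scales no larger than the precision.
    require(scale >= 0 && scale <= precision, s"unsupported DECIMAL($precision, $scale)")
    // Step 1: the plain integral value as DECIMAL(maxPrecision, 0); its actual scale is 0.
    val step1 = JBigDecimal.valueOf(value)
    assert(step1.precision <= maxPrecision)
    // Step 2: cast to the required scale. Raising the scale from 0 only appends
    // zeros, so setScale never has to round here.
    val step2 = step1.setScale(scale)
    // Overflow: the rescaled value needs more digits than the target precision allows.
    if (step2.precision > precision) None else Some(step2)
  }

  def fromInt(value: Int, precision: Int, scale: Int): Option[JBigDecimal] =
    castTo(value.toLong, IntMaxPrecision, precision, scale)

  def fromLong(value: Long, precision: Int, scale: Int): Option[JBigDecimal] =
    castTo(value, LongMaxPrecision, precision, scale)
}
```

For example, `fromInt(1, 9, 1)` yields `1.0`, while `fromInt(123, 3, 1)` returns `None` because `123.0` needs four digits of precision. Allocating a decimal object per value like this is what would make the approach slow inside a vectorized reader.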
[GitHub] [spark] HyukjinKwon closed pull request #32166: [SPARK-35071][PYTHON] Rename Koalas to pandas-on-Spark in main codes
HyukjinKwon closed pull request #32166: URL: https://github.com/apache/spark/pull/32166
[GitHub] [spark] HyukjinKwon commented on pull request #32166: [SPARK-35071][PYTHON] Rename Koalas to pandas-on-Spark in main codes
HyukjinKwon commented on pull request #32166: URL: https://github.com/apache/spark/pull/32166#issuecomment-820049121 Merged to master.
[GitHub] [spark] yaooqinn commented on pull request #32182: [SPARK-35082][INFRA] Use permissive and squshed merge when syncing to the latest branch in GitHub Actions testing
yaooqinn commented on pull request #32182: URL: https://github.com/apache/spark/pull/32182#issuecomment-820047934 LGTM
[GitHub] [spark] shaneknapp commented on a change in pull request #32178: [DONOTMERGE] initial commit for skeleton ansible for jenkins worker config
shaneknapp commented on a change in pull request #32178: URL: https://github.com/apache/spark/pull/32178#discussion_r613737082 ## File path: dev/ansible-for-test-node/README.md ## @@ -0,0 +1,25 @@ +# jenkins-infra + +This is a rough skeleton of the ansible used to deploy RISELab/Apache Spark Jenkins build workers on Ubuntu 20LTS. + +WARNING: this will not work "directly out of the box" and will need to be tweaked to work on any ubuntu servers you might want to try this on. + +### deploy a new worker node + TL;DR: +all of the configs for the workers live in roles/common/... and roles/jenkins-worker... + + prereqs: +* fresh install of ubuntu 20 +* a service account w/sudo +* python 3, ansible, ansible-playbook installed locally +* add hostname(s) to the `hosts` file +* add this to your `~/.ansible.cfg`: +```[defaults] host_key_checking = False``` + + fire ansible cannon! +`ansible-playbook -u deploy-jenkins-worker.yml -i -k -b -K` + +tips: +* if you are installing more than a few workers, it's best to run the playbook on smaller (2-3) batches at a time. this way it's easier to track down errors, as ansible is very noisy. +* when you encounter an error, you should comment out any previously-run plays and tasks. this saves time when debugging, and let's you easily track where you are in the process. +* `apt-get remove ` and `apt-get purge ` are your friends Review comment: TODO: explain in more detail the scope of the Ansible setup within the bigger picture of the build system
[GitHub] [spark] Yikun commented on pull request #32174: [SPARK-35048][INFRA] Only trigger the notify test workflow in upstream
Yikun commented on pull request #32174: URL: https://github.com/apache/spark/pull/32174#issuecomment-820044810 Yep
[GitHub] [spark] AmplabJenkins commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
AmplabJenkins commented on pull request #32162: URL: https://github.com/apache/spark/pull/32162#issuecomment-820043597 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137385/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
AmplabJenkins removed a comment on pull request #32162: URL: https://github.com/apache/spark/pull/32162#issuecomment-820043597 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137385/
[GitHub] [spark] Yikun commented on pull request #32182: [SPARK-35082][INFRA] Use permissive and squshed merge when syncing to the latest branch in GitHub Actions testing
Yikun commented on pull request #32182: URL: https://github.com/apache/spark/pull/32182#issuecomment-820043537 Thanks! It makes me feel how active and powerful our community is. :)
[GitHub] [spark] viirya commented on pull request #32136: [WIP][SPARK-35022][CORE] Task Scheduling Plugin in Spark
viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-820040504 > Yes, that's true. But normally, even when offering a single resource released by a single task, it seems unlikely that tasks are scheduled unevenly unless the resources are really scarce. As I saw in previous tests, whether all tasks are evenly distributed across all executors is a matter of chance. Sometimes they are, but sometimes only some of the executors are scheduled in the first batch. > Do you have logs related to the scheduling? I'd like to see how it happens. I don't have logs now. It is a simple, ordinary Structured Streaming (SS) job reading from Kafka.
[GitHub] [spark] SparkQA removed a comment on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
SparkQA removed a comment on pull request #32162: URL: https://github.com/apache/spark/pull/32162#issuecomment-819957837 **[Test build #137385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137385/testReport)** for PR 32162 at commit [`badeb87`](https://github.com/apache/spark/commit/badeb87a363d68ac92349c9edfa9c921fd37d750).
[GitHub] [spark] SparkQA commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
SparkQA commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-820040127 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41968/
[GitHub] [spark] SparkQA commented on pull request #32162: [MINOR][SQL] Refactor the comments in HiveClientImpl.withHiveState
SparkQA commented on pull request #32162: URL: https://github.com/apache/spark/pull/32162#issuecomment-820040096 **[Test build #137385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137385/testReport)** for PR 32162 at commit [`badeb87`](https://github.com/apache/spark/commit/badeb87a363d68ac92349c9edfa9c921fd37d750). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class VectorizedBLAS extends F2jBLAS ` * `trait AnalysisOnlyCommand extends Command ` * ` implicit class MetadataColumnHelper(attr: Attribute) ` * `case class WriteToDataSourceV2(` * `case class WriteToMicroBatchDataSource(`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32178: [DONOTMERGE] initial commit for skeleton ansible for jenkins worker config
AmplabJenkins removed a comment on pull request #32178: URL: https://github.com/apache/spark/pull/32178#issuecomment-820038308 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137381/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32180: [MINOR][CORE] Correct the number of started fetch requests in log
AmplabJenkins removed a comment on pull request #32180: URL: https://github.com/apache/spark/pull/32180#issuecomment-820037077 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41964/