[GitHub] [spark] AmplabJenkins removed a comment on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
AmplabJenkins removed a comment on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851202485 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139089/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
dongjoon-hyun closed pull request #32706: URL: https://github.com/apache/spark/pull/32706
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
dongjoon-hyun edited a comment on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851202576 This PR affects only GitHub Action. - At the first commit, all tests passed except the SparkR job. - At the second commit, we recover only the SparkR job, and SparkR passed already (https://github.com/dongjoon-hyun/spark/runs/2707469445?check_suite_focus=true). I'll merge this PR.
[GitHub] [spark] dongjoon-hyun commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
dongjoon-hyun commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851202576 This PR affects only GitHub Action. - At the first commit, all tests passed except the SparkR job. - At the second commit, we recover only the SparkR job, and it passed already (https://github.com/dongjoon-hyun/spark/runs/2707469445?check_suite_focus=true). I'll merge this PR.
[GitHub] [spark] AmplabJenkins commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
AmplabJenkins commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851202485 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139089/
[GitHub] [spark] SparkQA removed a comment on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
SparkQA removed a comment on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851112957 **[Test build #139089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139089/testReport)** for PR 32558 at commit [`29a5e43`](https://github.com/apache/spark/commit/29a5e4331c4fa301a484e52206174663f97a).
[GitHub] [spark] SparkQA commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
SparkQA commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851201158 **[Test build #139089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139089/testReport)** for PR 32558 at commit [`29a5e43`](https://github.com/apache/spark/commit/29a5e4331c4fa301a484e52206174663f97a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851199336 **[Test build #139097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139097/testReport)** for PR 32693 at commit [`eb22ac9`](https://github.com/apache/spark/commit/eb22ac95330404325c245602c3efde9dbe2272b4).
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851198587 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43617/
[GitHub] [spark] wangyum edited a comment on pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
wangyum edited a comment on pull request #32675: URL: https://github.com/apache/spark/pull/32675#issuecomment-851195220 @cloud-fan I have added the stacktrace to the PR description.
[GitHub] [spark] wangyum commented on pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
wangyum commented on pull request #32675: URL: https://github.com/apache/spark/pull/32675#issuecomment-851195220 @cloud-fan I have added the stacktrace to the PR description.
[GitHub] [spark] SparkQA commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
SparkQA commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851191449 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43616/
[GitHub] [spark] pan3793 commented on pull request #32395: [SPARK-35270][SQL][CORE] Remove the use of guava in order to upgrade guava version to 27
pan3793 commented on pull request #32395: URL: https://github.com/apache/spark/pull/32395#issuecomment-851187049 It seems `spark-core` already shades Guava. For Hadoop 3.2, since Spark has already moved to the Hadoop Shaded Client, the only remaining dependency on Guava that I see is Curator. Based on https://cwiki.apache.org/confluence/display/CURATOR/TN13 , I think it's OK to bundle a higher version of Guava in the Spark hadoop-3.2 binary dist?
[GitHub] [spark] otterc commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
otterc commented on a change in pull request #32007: URL: https://github.com/apache/spark/pull/32007#discussion_r642215418

## File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ##
@@ -153,6 +198,60 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea
     }
   }

+  /**
+   * Get the list of configured local dirs storing merged shuffle blocks created by executors
+   * if push based shuffle is enabled. Note that the files in this directory will be created
+   * by the external shuffle services. We only create the merge_manager directories and
+   * subdirectories here because currently the shuffle service doesn't have permission to
+   * create directories under application local directories.
+   */
+  private def createLocalDirsForMergedShuffleBlocks(conf: SparkConf): Array[File] = {

Review comment: @zhouyejoe The earlier comment must have been made because the PR didn't have the latest code, and the return value was needed for initializing `activeMergedShuffleDirs`. There is no need for `activeMergedShuffleDirs`. As mentioned in the other comment, it is not being used anywhere in `DiskBlockManager`. The dirs are being passed by the methods, so why does this need to return the files? cc @mridulm
[GitHub] [spark] shahidki31 commented on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
shahidki31 commented on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851181036 @cloud-fan Yes, `collectWithSubqueries` will include nested subqueries as well. ![image](https://user-images.githubusercontent.com/23054875/120143148-31b89880-c1fd-11eb-9ca0-871255b59b68.png)
[GitHub] [spark] cloud-fan commented on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
cloud-fan commented on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851178214 Does this fix nested subqueries?
[GitHub] [spark] cloud-fan commented on pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
cloud-fan commented on pull request #32675: URL: https://github.com/apache/spark/pull/32675#issuecomment-851177143 Can you post the full stacktrace? I'm a bit curious about how/where the error happens.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
AmplabJenkins removed a comment on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851175379 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43612/
[GitHub] [spark] SparkQA commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
SparkQA commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851175353 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43612/
[GitHub] [spark] AmplabJenkins commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
AmplabJenkins commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851175379 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43612/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32705: [SPARK-35568][SQL] Avoid UnsupportedOperationException when enabling both AQE and DPP
AmplabJenkins removed a comment on pull request #32705: URL: https://github.com/apache/spark/pull/32705#issuecomment-851172007 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139088/
[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851172026 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #32705: [SPARK-35568][SQL] Avoid UnsupportedOperationException when enabling both AQE and DPP
AmplabJenkins commented on pull request #32705: URL: https://github.com/apache/spark/pull/32705#issuecomment-851172007 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139088/
[GitHub] [spark] cloud-fan closed pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
cloud-fan closed pull request #32687: URL: https://github.com/apache/spark/pull/32687
[GitHub] [spark] cloud-fan commented on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
cloud-fan commented on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-851171920 thanks, merging to master!
[GitHub] [spark] SparkQA removed a comment on pull request #32705: [SPARK-35568][SQL] Avoid UnsupportedOperationException when enabling both AQE and DPP
SparkQA removed a comment on pull request #32705: URL: https://github.com/apache/spark/pull/32705#issuecomment-851098166 **[Test build #139088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139088/testReport)** for PR 32705 at commit [`932edd7`](https://github.com/apache/spark/commit/932edd7808ba8ae9220658eff37c9c3af77eb09f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
AmplabJenkins removed a comment on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851171260 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43615/
[GitHub] [spark] SparkQA commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
SparkQA commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851171248 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43615/
[GitHub] [spark] AmplabJenkins commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
AmplabJenkins commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851171260 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43615/
[GitHub] [spark] SparkQA commented on pull request #32705: [SPARK-35568][SQL] Avoid UnsupportedOperationException when enabling both AQE and DPP
SparkQA commented on pull request #32705: URL: https://github.com/apache/spark/pull/32705#issuecomment-851171058 **[Test build #139088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139088/testReport)** for PR 32705 at commit [`932edd7`](https://github.com/apache/spark/commit/932edd7808ba8ae9220658eff37c9c3af77eb09f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851169203 **[Test build #139096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139096/testReport)** for PR 32686 at commit [`0542922`](https://github.com/apache/spark/commit/0542922a77d660af1797c0a6f0840d77d87c059a).
[GitHub] [spark] SparkQA commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
SparkQA commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851169175 **[Test build #139095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139095/testReport)** for PR 32706 at commit [`53adc9e`](https://github.com/apache/spark/commit/53adc9ef6befb092a812449a1949837d320f927c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851168498 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43614/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
AmplabJenkins removed a comment on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851168497 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43613/
[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851168498 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43614/
[GitHub] [spark] AmplabJenkins commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
AmplabJenkins commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851168497 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43613/
[GitHub] [spark] dongjoon-hyun commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
dongjoon-hyun commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851168280 Thank you again!
[GitHub] [spark] viirya commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
viirya commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851167925 Thanks @dongjoon-hyun. It sounds good to me.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851166355 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43614/
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851165704 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43614/
[GitHub] [spark] SparkQA commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
SparkQA commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851164672 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43613/
[GitHub] [spark] SparkQA commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action
SparkQA commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851163828 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43613/
[GitHub] [spark] SparkQA commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
SparkQA commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851163291 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43612/
[GitHub] [spark] dongjoon-hyun closed pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
dongjoon-hyun closed pull request #32707: URL: https://github.com/apache/spark/pull/32707
[GitHub] [spark] dongjoon-hyun commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
dongjoon-hyun commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851161273 I manually verified this with the following. **BEFORE** ``` $ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12" Using `mvn` from path: /usr/local/bin/mvn 2.12.10 ``` **AFTER** ``` $ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12" Using `mvn` from path: /usr/local/bin/mvn 2.12.14 ``` Merged to master!
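The manual check in this comment filters Maven output with `grep "^2.12"`, where the unescaped `.` matches any character. A slightly stricter sketch of the same filter is shown here. The `sample_output` variable is an assumption: it stands in for real `build/mvn help:evaluate` output (copied from the AFTER block) so the snippet runs without Maven.

```shell
#!/bin/sh
# Stand-in for the output of `build/mvn help:evaluate ...` (assumed sample, not a real run).
sample_output='Using `mvn` from path: /usr/local/bin/mvn
2.12.14'

# Keep only a line that is exactly a 2.12.x version; the dots are escaped so
# they match literally instead of matching any character.
version=$(printf '%s\n' "$sample_output" | grep -E '^2\.12\.[0-9]+$')

echo "scala.version resolved to: $version"
```

Anchoring the pattern at both ends also drops the `Using mvn from path` banner line without relying on it never starting with `2`.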
[GitHub] [spark] dongjoon-hyun commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 and upgrade R to 4.1.0 in the docker image for GitHub Action
dongjoon-hyun commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851159207 Thank you, @viirya. SparkR linter and doc work correctly, but SparkR seems to have some UT issues with the latest `arrow`. I'll narrow the scope of this PR by excluding the image update for the SparkR GitHub Action job. ``` 2. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow optimi collect(createDataFrame(rdf)) not equal to `expected`. Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') ```
[GitHub] [spark] dongjoon-hyun commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
dongjoon-hyun commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851156588 Thank you, @viirya !
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
AmplabJenkins removed a comment on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851153739 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139094/
[GitHub] [spark] SparkQA removed a comment on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
SparkQA removed a comment on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851152459 **[Test build #139094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139094/testReport)** for PR 32701 at commit [`e067bdb`](https://github.com/apache/spark/commit/e067bdb6fee7dceb4299917c9ef76af74e20720e).
[GitHub] [spark] SparkQA commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
SparkQA commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851153717 **[Test build #139094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139094/testReport)** for PR 32701 at commit [`e067bdb`](https://github.com/apache/spark/commit/e067bdb6fee7dceb4299917c9ef76af74e20720e). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
AmplabJenkins commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851153739 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139094/
[GitHub] [spark] SparkQA commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
SparkQA commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851152459 **[Test build #139094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139094/testReport)** for PR 32701 at commit [`e067bdb`](https://github.com/apache/spark/commit/e067bdb6fee7dceb4299917c9ef76af74e20720e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
AmplabJenkins removed a comment on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-850782008 Can one of the admins verify this patch?
[GitHub] [spark] yaooqinn commented on pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
yaooqinn commented on pull request #32701: URL: https://github.com/apache/spark/pull/32701#issuecomment-851152150 ok to test
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851151414 **[Test build #139093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139093/testReport)** for PR 32658 at commit [`3bda8db`](https://github.com/apache/spark/commit/3bda8db8e50d6550089b1cb1770d6cfe078bcaf8).
[GitHub] [spark] huskysun commented on a change in pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
huskysun commented on a change in pull request #32701: URL: https://github.com/apache/spark/pull/32701#discussion_r642193027 ## File path: docs/submitting-applications.md ## @@ -146,7 +146,7 @@ export HADOOP_CONF_DIR=XXX ./bin/spark-submit \ --class org.apache.spark.examples.SparkPi \ --master k8s://xx.yy.zz.ww:443 \ - --deploy-mode cluster \ + --deploy-mode cluster \ # can be client for client mode Review comment: @yaooqinn @dongjoon-hyun Thanks for the review. Yeah, I should've changed L145 as well, from `# Run on a Kubernetes cluster in cluster deploy mode` to `# Run on a Kubernetes cluster`. I was trying to mimic L117. However, I won't do that anymore, because: > you cannot add # after \ in bash You're right, `#` can't come after `\`. So L122 should also be fixed. I will revert this line and also fix L122. Thanks.
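The shell behavior discussed in this review comment can be checked with a small sketch. It assumes a POSIX-compatible shell, and `count_args` is a hypothetical helper introduced only for illustration; it is not part of Spark. In the first call the backslash escapes the following space rather than the newline, so the command ends on that line and the words after the backslash are passed as extra arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): records how many arguments it received.
count_args() { argc=$#; }

# Comment after the backslash: the backslash escapes the space, not the newline,
# so there is no line continuation and the trailing words become arguments.
count_args --deploy-mode cluster \ # can be client
broken_argc=$argc

# Comment on its own line, backslash as the last character: continuation works.
# can be client for client mode
count_args --deploy-mode cluster \
  --name demo
fixed_argc=$argc

echo "comment after backslash: $broken_argc arguments"
echo "comment on its own line: $fixed_argc arguments"
```

In a real `spark-submit` invocation the lines following the broken one would additionally be run as separate commands, which is why moving the note to its own comment line above the command is the safer layout.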
[GitHub] [spark] itholic commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
itholic commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851150735 > @itholic the generated doc looks a bit weird: > > ![Screen Shot 2021-05-28 at 1 09 38 PM](https://user-images.githubusercontent.com/6477701/119928166-04c67480-bfb6-11eb-8449-428b01f2144a.png) > > It includes `# noqa` > > Can you double check and fix? Seems like we should fix other places for JSON, etc. Thanks, @HyukjinKwon. I just fixed them everywhere.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32703: [SPARK-35566][SS] Fix StateStoreRestoreExec output rows
AmplabJenkins removed a comment on pull request #32703: URL: https://github.com/apache/spark/pull/32703#issuecomment-851150536 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139087/
[GitHub] [spark] AmplabJenkins commented on pull request #32703: [SPARK-35566][SS] Fix StateStoreRestoreExec output rows
AmplabJenkins commented on pull request #32703: URL: https://github.com/apache/spark/pull/32703#issuecomment-851150536 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139087/
[GitHub] [spark] itholic commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
itholic commented on a change in pull request #32658: URL: https://github.com/apache/spark/pull/32658#discussion_r642194447

## File path: docs/sql-data-sources-csv.md ##

@@ -38,3 +36,217 @@ Spark SQL provides `spark.read().csv("file_name")` to read a file or directory o
+
+## Data Source Option
+
+Data source options of CSV can be set via:
+* the `.option`/`.options` methods of
+  * `DataFrameReader`
+  * `DataFrameWriter`
+  * `DataStreamReader`
+  * `DataStreamWriter`
+* the built-in functions below
+  * `from_csv`
+  * `to_csv`
+  * `schema_of_csv`
+* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+
+| Property Name | Default | Meaning | Scope |
+|---|---|---|---|
+| sep | , | Sets a separator (one or more characters) for each field and value. | read/write |
+| encoding | UTF-8 for reading, not set for writing | Specifies encoding (charset) for reading or writing CSV files | read/write |
+| quote | " | Sets a single character used for escaping quoted values where the separator can be part of the value. If you would like to turn off quotations, you need to set an empty string. If an empty string is set, it uses `u0000` (null character) for writing, and it disables the quotation handling for reading. | read/write |
+| quoteAll | false | A flag indicating whether all values should always be enclosed in quotes. It only escapes values containing a quote character by default. | write |
+| escape | \ | Sets a single character used for escaping quotes inside an already quoted value. | read/write |
+| escapeQuotes | true | A flag indicating whether values containing quotes should always be enclosed in quotes. It escapes all values containing a quote character by default. | write |
+| comment | empty string | Sets a single character used for skipping lines beginning with this character. It's disabled by default | read |
+| header | false | For reading, uses the first line as names of columns. For writing, writes the names of columns as the first line. Note that if the given path is a RDD of Strings, this header option will remove all lines same with the header if exists. | read/write |
+| inferSchema | false | Infers the input schema automatically from data. It requires one extra pass over the data. | read |
+| enforceSchema | true | If it is set to true, the specified or inferred schema will be forcibly applied to datasource files, and headers in CSV files will be ignored. If the option is set to false, the schema will be validated against all headers in CSV files or the first header in RDD if the header option is set to true. Field names in the schema and column names in CSV headers are checked by their positions taking into account spark.sql.caseSensitive. Though the default value is true, it is recommended to disable the enforceSchema option to avoid incorrect results. | read |
+| ignoreLeadingWhiteSpace | false (for reading), true (for writing) | A flag indicating whether or not leading whitespaces from values being read/written should be skipped. | read/write |
+| ignoreTrailingWhiteSpace | false (for reading), true (for writing) | A flag indicating whether or not trailing whitespaces from values being read/written should be skipped. | read/write |
+| nullValue | empty string | Sets the string representation of a null value. Since 2.0.1, this nullValue param applies to all supported types including the string type. | read/write |
+| nanValue | NaN | Sets the string representation of a non-number value. | read |
+| positiveInf | Inf | Sets the string representation of a positive infinity value. | read |
+| negativeInf | -Inf | Sets the string representation of a negative infinity value. | read |
+| dateFormat | yyyy-MM-dd | Sets the string that indicates a date format. Custom date formats follow the formats at [Datetime Patterns](https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html). This applies to date type. | read/write |
+| timestampFormat | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] | Sets the string that indicates a timestamp format. Custom date formats follow the formats at [Datetime Patterns](https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html). This applies to timestamp type. | read/write |
+| maxColumns | 20480 | Defines a hard limit of how many columns a record can have. | read |
+| maxCharsPerColumn | -1 | Defines the maximum number of characters allowed for any given value being read. The default value -1 means unlimited length. | read |
+| mode | PERMISSIVE | Allows a mode for dealing with corrupt records during parsing. Note that Spark
[GitHub] [spark] SparkQA removed a comment on pull request #32703: [SPARK-35566][SS] Fix StateStoreRestoreExec output rows
SparkQA removed a comment on pull request #32703: URL: https://github.com/apache/spark/pull/32703#issuecomment-851081590 **[Test build #139087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139087/testReport)** for PR 32703 at commit [`a8135d8`](https://github.com/apache/spark/commit/a8135d85a46e48715ce60d8f5e4ac5f4dbf26b36).
[GitHub] [spark] SparkQA commented on pull request #32703: [SPARK-35566][SS] Fix StateStoreRestoreExec output rows
SparkQA commented on pull request #32703: URL: https://github.com/apache/spark/pull/32703#issuecomment-851150023 **[Test build #139087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139087/testReport)** for PR 32703 at commit [`a8135d8`](https://github.com/apache/spark/commit/a8135d85a46e48715ce60d8f5e4ac5f4dbf26b36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
AmplabJenkins removed a comment on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851149791 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139086/
[GitHub] [spark] AmplabJenkins commented on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
AmplabJenkins commented on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851149791 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139086/
[GitHub] [spark] SparkQA removed a comment on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
SparkQA removed a comment on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851081580 **[Test build #139086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139086/testReport)** for PR 32704 at commit [`f1eced2`](https://github.com/apache/spark/commit/f1eced2782ec742d2dd04a122f15a4bf47ef237d).
[GitHub] [spark] SparkQA commented on pull request #32706: [SPARK-35507][INFRA] Add Python 3.9 and upgrade R to 4.1.0 in the docker image for GitHub Action
SparkQA commented on pull request #32706: URL: https://github.com/apache/spark/pull/32706#issuecomment-851149438 **[Test build #139092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139092/testReport)** for PR 32706 at commit [`4a5fa6a`](https://github.com/apache/spark/commit/4a5fa6a1901810df5163c4e026b15569dda7177c).
[GitHub] [spark] SparkQA commented on pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
SparkQA commented on pull request #32707: URL: https://github.com/apache/spark/pull/32707#issuecomment-851149411 **[Test build #139091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139091/testReport)** for PR 32707 at commit [`53c55de`](https://github.com/apache/spark/commit/53c55de603ad55b3cdd4aeeac445a35aee68d34a).
[GitHub] [spark] SparkQA commented on pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
SparkQA commented on pull request #32704: URL: https://github.com/apache/spark/pull/32704#issuecomment-851149228 **[Test build #139086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139086/testReport)** for PR 32704 at commit [`f1eced2`](https://github.com/apache/spark/commit/f1eced2782ec742d2dd04a122f15a4bf47ef237d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] huskysun commented on a change in pull request #32701: [SPARK-35562][DOC] Fix docs about Kubernetes
huskysun commented on a change in pull request #32701: URL: https://github.com/apache/spark/pull/32701#discussion_r642193027 ## File path: docs/submitting-applications.md ## @@ -146,7 +146,7 @@ export HADOOP_CONF_DIR=XXX ./bin/spark-submit \ --class org.apache.spark.examples.SparkPi \ --master k8s://xx.yy.zz.ww:443 \ - --deploy-mode cluster \ + --deploy-mode cluster \ # can be client for client mode Review comment: @yaooqinn @dongjoon-hyun Thanks for the review. Yeah, I should've changed L145 as well, from `# Run on a Kubernetes cluster in cluster deploy mode` to `# Run on a Kubernetes cluster`. I was trying to mimic L117. However, I won't do that anymore, because: > you cannot add # after \ in bash You're right, `#` can't come after `\`. So L122 should also be fixed. I will revert this line and also fix L122. Thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins removed a comment on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851132964
[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851148815 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139090/
[GitHub] [spark] SparkQA removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
SparkQA removed a comment on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851113186 **[Test build #139090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139090/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).
[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
SparkQA commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851148106 **[Test build #139090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139090/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun opened a new pull request #32707: [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
dongjoon-hyun opened a new pull request #32707: URL: https://github.com/apache/spark/pull/32707 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32697: [SPARK-31168][BUILD] Upgrade Scala to 2.12.14
dongjoon-hyun commented on a change in pull request #32697: URL: https://github.com/apache/spark/pull/32697#discussion_r642190770 ## File path: pom.xml ## @@ -162,7 +162,7 @@ 3.4.1 3.2.2 -2.12.10 +2.12.14 Review comment: Oh, thanks. I missed that one.
[GitHub] [spark] cloud-fan commented on pull request #32696: [SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA
cloud-fan commented on pull request #32696: URL: https://github.com/apache/spark/pull/32696#issuecomment-851140959 thanks!
[GitHub] [spark] dongjoon-hyun opened a new pull request #32706: [SPARK-35507][INFRA] Move Python 3.9 to the docker image
dongjoon-hyun opened a new pull request #32706: URL: https://github.com/apache/spark/pull/32706 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851132964 Can one of the admins verify this patch?
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #32468: [SPARK-35335][SQL] Improve CoalesceShufflePartitions to avoid generating small files
AngersZhuuuu commented on a change in pull request #32468: URL: https://github.com/apache/spark/pull/32468#discussion_r642180154 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala ## @@ -83,10 +84,16 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl // is not set, so to avoid perf regressions compared to no coalescing. val minPartitionNum = conf.getConf(SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM) .getOrElse(session.sparkContext.defaultParallelism) +val minNumPartitions = if (isFinalStage) { Review comment: minFinalStagePartitionNum?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins removed a comment on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851113533
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
AmplabJenkins removed a comment on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851129415 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43610/
[GitHub] [spark] AmplabJenkins commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
AmplabJenkins commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851129415 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43610/
[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
AmplabJenkins commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851129414 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43611/
[GitHub] [spark] zzcclp commented on a change in pull request #32697: [SPARK-31168][BUILD] Upgrade Scala to 2.12.14
zzcclp commented on a change in pull request #32697: URL: https://github.com/apache/spark/pull/32697#discussion_r642178486 ## File path: pom.xml ## @@ -162,7 +162,7 @@ 3.4.1 3.2.2 -2.12.10 +2.12.14 Review comment: @dongjoon-hyun please change the Scala version to 2.12.14 in the `scala-2.12` profile; otherwise there may be an `Error:scala: bad option: -P:silencer:globalFilters=.*deprecated.*` error when the `scala-2.12` profile is ticked.
[GitHub] [spark] SparkQA commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
SparkQA commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851126950 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43610/
[GitHub] [spark] wangyum edited a comment on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table
wangyum edited a comment on pull request #28032: URL: https://github.com/apache/spark/pull/28032#issuecomment-851119698 @HyukjinKwon I mainly want to make the whole cluster more stable. If a user does not add it manually, a large number of files may be generated. For example: Suppose there are 1 tasks, each task contains 500 separate partition values, and the number of files generated is 1 * 500. This PR mainly aims to avoid this case.
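The arithmetic behind the comment above can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: the class `OutputFileEstimate` and its methods are hypothetical, the task count of 200 is invented for the example, and only the figure of 500 partition values per task comes from the comment. It also assumes every task holds the same set of partition values and that each task writes exactly one file per partition value it holds.

```java
// Hypothetical illustration; names and the task count are not from the PR.
final class OutputFileEstimate {

    // Without repartitioning, each task writes one file per partition value it holds.
    static long filesWithoutRepartition(long tasks, long partitionValuesPerTask) {
        return Math.multiplyExact(tasks, partitionValuesPerTask);
    }

    // After repartitioning by the partition columns, each partition value
    // lands in a single task, so it is written as a single file.
    static long filesAfterRepartition(long distinctPartitionValues) {
        return distinctPartitionValues;
    }

    public static void main(String[] args) {
        long tasks = 200;          // illustrative value, not from the comment
        long valuesPerTask = 500;  // figure quoted in the comment
        System.out.println(filesWithoutRepartition(tasks, valuesPerTask)); // 100000
        System.out.println(filesAfterRepartition(valuesPerTask));          // 500
    }
}
```

Under these assumptions the file count grows with the number of tasks only in the first case, which is the situation the comment describes wanting to avoid.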
[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
SparkQA commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851126484 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43611/
[GitHub] [spark] SparkQA commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
SparkQA commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851126356 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43610/
[GitHub] [spark] LuciferYang commented on pull request #32676: [SPARK-35532][TESTS] Ensure mllib and kafka-0-10 module can be maven test independently in Scala 2.13
LuciferYang commented on pull request #32676: URL: https://github.com/apache/spark/pull/32676#issuecomment-851126312 thx @dongjoon-hyun
[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
SparkQA commented on pull request #31565: URL: https://github.com/apache/spark/pull/31565#issuecomment-851126220 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43611/
[GitHub] [spark] LuciferYang edited a comment on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13
LuciferYang edited a comment on pull request #32669: URL: https://github.com/apache/spark/pull/32669#issuecomment-851126030 > +1 for the idea, @LuciferYang . OK, I will open a new PR to do this.
[GitHub] [spark] LuciferYang commented on pull request #32669: [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13
LuciferYang commented on pull request #32669: URL: https://github.com/apache/spark/pull/32669#issuecomment-851126030 > +1 for the idea, @LuciferYang . OK, I will open a new PR to do this.
[GitHub] [spark] LuciferYang commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
LuciferYang commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-851125841 thx all ~
[GitHub] [spark] beliefer commented on pull request #32694: [SPARK-35059][SQL] Group exception messages in hive/execution
beliefer commented on pull request #32694: URL: https://github.com/apache/spark/pull/32694#issuecomment-851124316 ping @allisonwang-db
[GitHub] [spark] huaxingao commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-851121633 @cloud-fan @maropu I addressed the comments. Could you please take another look? Thanks a lot!
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
HyukjinKwon commented on a change in pull request #32704: URL: https://github.com/apache/spark/pull/32704#discussion_r642172184 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -256,13 +255,9 @@ class QueryExecution( // trigger to compute stats for logical plans try { - optimizedPlan.foreach(_.expressions.foreach(_.foreach { -case subqueryExpression: SubqueryExpression => - // trigger subquery's child plan stats propagation - subqueryExpression.plan.stats -case _ => - })) - optimizedPlan.stats + optimizedPlan.collectWithSubqueries { Review comment: cc @cloud-fan @maryannxue FYI
[GitHub] [spark] wangyum commented on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table
wangyum commented on pull request #28032: URL: https://github.com/apache/spark/pull/28032#issuecomment-851119698 @HyukjinKwon I mainly want to make the whole cluster more stable. If a user does not add it manually, a large number of files may be generated. Please see this picture: ![image](https://user-images.githubusercontent.com/5399861/77612239-9bd30f00-6f62-11ea-9178-3bcd65aa4034.png)
[GitHub] [spark] dongjoon-hyun closed pull request #32398: [WIP] hive version upgraded from 2.3.7 to 2.3.8
dongjoon-hyun closed pull request #32398: URL: https://github.com/apache/spark/pull/32398
[GitHub] [spark] dongjoon-hyun commented on pull request #32398: [WIP] hive version upgraded from 2.3.7 to 2.3.8
dongjoon-hyun commented on pull request #32398: URL: https://github.com/apache/spark/pull/32398#issuecomment-851117258 Hi, All. According to the above discussion, I'll close this PR for now. BTW, Apache Spark 3.1.2 is available, @bhupeshdhiman84 . - https://downloads.apache.org/spark/spark-3.1.2/ - https://spark.apache.org/docs/3.1.2/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring
dongjoon-hyun commented on a change in pull request #32388: URL: https://github.com/apache/spark/pull/32388#discussion_r642167700 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) { private final Timer registerExecutorRequestLatencyMillis = new Timer(); // Time latency for processing finalize shuffle merge request latency in ms private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); +// Block transfer rate in blocks per second Review comment: Is this valid when we do `getContinuousBlocksData`? To be clear to the metric audience, could you revise the definition you are aiming?