[GitHub] [spark] sarutak opened a new pull request #32691: Docker integration test ga take2
sarutak opened a new pull request #32691: URL: https://github.com/apache/spark/pull/32691

### What changes were proposed in this pull request?

This PR proposes to add `docker-integration-tests` to `run-tests.py` and GA. #32631 was merged once, but it missed something, so this is a second attempt.

The difference between this change and https://github.com/apache/spark/pull/32631/commits/692d95d1458993cbb9cbd47014202e84cd6aa328 (merged in #32631) is as follows:

```
       if: github.repository != 'apache/spark'
       id: sync-branch
       run: |
+        apache_spark_ref=`git rev-parse HEAD`
         git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/}
         git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD
         git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit"
+        echo "::set-output name=APACHE_SPARK_REF::$apache_spark_ref"
     - name: Cache Scala, SBT and Maven
       uses: actions/cache@v2
       with:
```

### Why are the changes needed?

There is currently no CI for `docker-integration-tests`.

### Does this PR introduce _any_ user-facing change?

No, GA only.

### How was this patch tested?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
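The diff above records `git rev-parse HEAD` before the squash merge and emits it as the `APACHE_SPARK_REF` step output. The sketch below reproduces that ordering against throwaway local repositories so it can run offline. It is an illustration, not the workflow itself: the local repos stand in for apache/spark and the fork, the branch argument stands in for `${GITHUB_REF#refs/heads/}`, and the git identities are placeholders.

```shell
#!/usr/bin/env bash
# Local sketch of the sync step's "record first, merge second" ordering.
# Assumptions: local repos stand in for apache/spark and the fork, the
# branch name stands in for ${GITHUB_REF#refs/heads/}, and the git
# identities are placeholders.
set -e

sync_branch() {
  local fork_url="$1" fork_branch="$2"
  local apache_spark_ref
  # Capture the Apache Spark commit BEFORE the squash merge moves HEAD.
  apache_spark_ref=$(git rev-parse HEAD)
  git fetch --quiet "$fork_url" "$fork_branch"
  git -c user.name='CI Test Account' -c user.email='ci@example.invalid' \
    merge --no-commit --squash FETCH_HEAD >/dev/null
  git -c user.name='CI Test Account' -c user.email='ci@example.invalid' \
    commit --quiet -m "Merged commit"
  # GitHub Actions workflow command that sets a step output.
  echo "::set-output name=APACHE_SPARK_REF::$apache_spark_ref"
}

# Demo setup: "upstream", a "fork" with one extra commit on a branch,
# and a "ci" checkout of upstream that runs the sync step.
dev_commit() { git -c user.name=dev -c user.email=dev@example.invalid commit --quiet "$@"; }
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init --quiet "$work/upstream"
(cd "$work/upstream" && dev_commit --allow-empty -m "upstream master")
git clone --quiet "$work/upstream" "$work/fork"
(cd "$work/fork" && git checkout --quiet -b my-feature &&
  echo change > file.txt && git add file.txt && dev_commit -m "fork change")
git clone --quiet "$work/upstream" "$work/ci"
cd "$work/ci"
output=$(sync_branch "$work/fork" my-feature)
echo "$output"
```

The ordering is the point: calling `git rev-parse HEAD` after the merge would return the squash commit rather than the Apache Spark commit. Newer GitHub Actions runners deprecate `set-output` in favour of appending `name=value` to the file named by `$GITHUB_OUTPUT`.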
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850157887 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43564/
[GitHub] [spark] SparkQA commented on pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
SparkQA commented on pull request #32653: URL: https://github.com/apache/spark/pull/32653#issuecomment-850152542 **[Test build #139048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139048/testReport)** for PR 32653 at commit [`b59e02e`](https://github.com/apache/spark/commit/b59e02e131f82f80489c527f202064d3de5f4fb9).
[GitHub] [spark] SparkQA commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
SparkQA commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850150535 **[Test build #139047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139047/testReport)** for PR 32688 at commit [`f5bafee`](https://github.com/apache/spark/commit/f5bafeeb22f677a2dc823b7a9e95590020a02f8e).
[GitHub] [spark] LuciferYang commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
LuciferYang commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850149331 thx @HyukjinKwon
[GitHub] [spark] SparkQA commented on pull request #32690: [SPARK-35510][PYTHON] Fix and reenable test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true
SparkQA commented on pull request #32690: URL: https://github.com/apache/spark/pull/32690#issuecomment-850148551 **[Test build #139046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139046/testReport)** for PR 32690 at commit [`22780dd`](https://github.com/apache/spark/commit/22780ddf9fe367693b0ba30260a5455fbb364807).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-850146664 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43565/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
AmplabJenkins removed a comment on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850146667 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139039/
[GitHub] [spark] sarutak commented on a change in pull request #32631: [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA.
sarutak commented on a change in pull request #32631: URL: https://github.com/apache/spark/pull/32631#discussion_r641279731 ## File path: .github/workflows/build_and_test.yml ## @@ -625,3 +625,83 @@ jobs: with: name: unit-tests-log-tpcds--8-hadoop3.2-hive2.3 path: "**/target/unit-tests.log" + + docker-integration-tests: +name: Run docker integration tests +runs-on: ubuntu-20.04 +env: + HADOOP_PROFILE: hadoop3.2 + HIVE_PROFILE: hive2.3 + GITHUB_PREV_SHA: ${{ github.event.before }} + SPARK_LOCAL_IP: localhost + ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe +steps: +- name: Checkout Spark repository + uses: actions/checkout@v2 + with: +fetch-depth: 0 +repository: apache/spark +ref: master +- name: Sync the current branch with the latest in Apache Spark + if: github.repository != 'apache/spark' + id: sync-branch + run: | +git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/} +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit" Review comment: Ah, O.K. I'll do it. Thanks for letting me know.
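The sync step quoted in the hunk above fetches `${GITHUB_REF#refs/heads/}` from the fork. As a small illustration (the ref values are made up), the `#` parameter expansion strips the shortest matching `refs/heads/` prefix and leaves a ref that does not start with that prefix unchanged:

```shell
#!/usr/bin/env bash
# Illustrates the `${GITHUB_REF#refs/heads/}` expansion used by the sync step.
# The ref values passed below are made-up examples.
branch_from_ref() {
  local ref="$1"
  # `${var#pattern}` removes the shortest prefix matching `pattern`;
  # if the prefix does not match, the value is returned unchanged.
  echo "${ref#refs/heads/}"
}

branch_from_ref "refs/heads/docker-integration-test-ga"  # docker-integration-test-ga
branch_from_ref "refs/heads/feature/nested"              # feature/nested
branch_from_ref "refs/tags/v3.2.0"                       # refs/tags/v3.2.0 (unchanged)
```

Nested branch names such as `feature/nested` survive intact because only the leading prefix is removed.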
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850092153
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850146665
[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-850146664 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43565/
[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850146665
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850146669 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139040/
[GitHub] [spark] AmplabJenkins commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
AmplabJenkins commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850146667 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139039/
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
HyukjinKwon edited a comment on pull request #32689: URL: https://github.com/apache/spark/pull/32689#issuecomment-850145032 @LuciferYang, the docker test failure should now be fixed in the latest master branch.
[GitHub] [spark] HyukjinKwon commented on pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
HyukjinKwon commented on pull request #32689: URL: https://github.com/apache/spark/pull/32689#issuecomment-850145032 @LuciferYang, the docker test failure should be fixed in the latest master branch.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
HyukjinKwon edited a comment on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850144781 @LuciferYang, the test failure should now be fixed in the latest master branch.
[GitHub] [spark] HyukjinKwon commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
HyukjinKwon commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850144781 @LuciferYang, the test failure should be fixed in the latest master branch.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32631: [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA.
HyukjinKwon edited a comment on pull request #32631: URL: https://github.com/apache/spark/pull/32631#issuecomment-850143899 Sorry for reverting quickly. I reverted it first because, although the issue is minor, it takes a while to test changes related to this.
[GitHub] [spark] HyukjinKwon commented on pull request #32631: [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA.
HyukjinKwon commented on pull request #32631: URL: https://github.com/apache/spark/pull/32631#issuecomment-850143899 Sorry for the quick revert. I reverted it first because, although the issue is minor, it takes a while to test changes related to this.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32631: [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA.
HyukjinKwon commented on a change in pull request #32631: URL: https://github.com/apache/spark/pull/32631#discussion_r641277034 ## File path: .github/workflows/build_and_test.yml ## @@ -625,3 +625,83 @@ jobs: with: name: unit-tests-log-tpcds--8-hadoop3.2-hive2.3 path: "**/target/unit-tests.log" + + docker-integration-tests: +name: Run docker integration tests +runs-on: ubuntu-20.04 +env: + HADOOP_PROFILE: hadoop3.2 + HIVE_PROFILE: hive2.3 + GITHUB_PREV_SHA: ${{ github.event.before }} + SPARK_LOCAL_IP: localhost + ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe +steps: +- name: Checkout Spark repository + uses: actions/checkout@v2 + with: +fetch-depth: 0 +repository: apache/spark +ref: master +- name: Sync the current branch with the latest in Apache Spark + if: github.repository != 'apache/spark' + id: sync-branch + run: | +git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/} +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit" Review comment: @sarutak would you mind opening a PR again for this?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32631: [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA.
HyukjinKwon commented on a change in pull request #32631: URL: https://github.com/apache/spark/pull/32631#discussion_r641276796 ## File path: .github/workflows/build_and_test.yml ## @@ -625,3 +625,83 @@ jobs: with: name: unit-tests-log-tpcds--8-hadoop3.2-hive2.3 path: "**/target/unit-tests.log" + + docker-integration-tests: +name: Run docker integration tests +runs-on: ubuntu-20.04 +env: + HADOOP_PROFILE: hadoop3.2 + HIVE_PROFILE: hive2.3 + GITHUB_PREV_SHA: ${{ github.event.before }} + SPARK_LOCAL_IP: localhost + ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe +steps: +- name: Checkout Spark repository + uses: actions/checkout@v2 + with: +fetch-depth: 0 +repository: apache/spark +ref: master +- name: Sync the current branch with the latest in Apache Spark + if: github.repository != 'apache/spark' + id: sync-branch + run: | +git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/} +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit" Review comment: Oh, we should add `echo "::set-output name=APACHE_SPARK_REF::$apache_spark_ref"` after this line because we're running tests with `run-tests.py`. 
## File path: .github/workflows/build_and_test.yml ## @@ -625,3 +625,83 @@ jobs: with: name: unit-tests-log-tpcds--8-hadoop3.2-hive2.3 path: "**/target/unit-tests.log" + + docker-integration-tests: +name: Run docker integration tests +runs-on: ubuntu-20.04 +env: + HADOOP_PROFILE: hadoop3.2 + HIVE_PROFILE: hive2.3 + GITHUB_PREV_SHA: ${{ github.event.before }} + SPARK_LOCAL_IP: localhost + ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe +steps: +- name: Checkout Spark repository + uses: actions/checkout@v2 + with: +fetch-depth: 0 +repository: apache/spark +ref: master +- name: Sync the current branch with the latest in Apache Spark + if: github.repository != 'apache/spark' + id: sync-branch + run: | +git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/} +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD +git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit" Review comment: I will revert this for now; it seems to break other tests.
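The review comment above asks for `APACHE_SPARK_REF` to be emitted because the job runs tests through `run-tests.py`. How `run-tests.py` uses the value is not shown in this thread; a plausible reading, assumed here, is that a later step compares the merged HEAD with that ref to see what the fork's branch changed. The block below sketches only that consumer side; the function name, the environment-variable wiring, and the file path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical consumer of the recorded ref. Assumption: a later step receives
# it as an environment variable, for example
#   env:
#     APACHE_SPARK_REF: ${{ steps.sync-branch.outputs.APACHE_SPARK_REF }}
# and lists what the merged branch changed relative to Apache Spark.
set -e

changed_files_since_spark_ref() {
  # Abort with a clear message if the sync step did not emit the ref.
  : "${APACHE_SPARK_REF:?APACHE_SPARK_REF is not set}"
  git diff --name-only "$APACHE_SPARK_REF" HEAD
}

# Demo: one "upstream" commit, then a squash-merged commit touching a
# made-up file under a hypothetical path.
dev_commit() { git -c user.name=dev -c user.email=dev@example.invalid commit --quiet "$@"; }
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init --quiet
dev_commit --allow-empty -m "upstream"
APACHE_SPARK_REF=$(git rev-parse HEAD)
mkdir -p external/docker-integration-tests
echo "placeholder" > external/docker-integration-tests/Example.scala
git add .
dev_commit -m "Merged commit"
changed_files_since_spark_ref
```

Without the recorded ref, a step like this has nothing reliable to diff against once the squash commit has replaced HEAD, which fits the reviewer's point that the missing output broke other tests.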
[GitHub] [spark] SparkQA commented on pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
SparkQA commented on pull request #32689: URL: https://github.com/apache/spark/pull/32689#issuecomment-850142972 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43563/
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-850142364 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43565/
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850140897 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43564/
[GitHub] [spark] SparkQA removed a comment on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850068550 **[Test build #139040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139040/testReport)** for PR 32686 at commit [`408851d`](https://github.com/apache/spark/commit/408851d641be4aa13146c640a48ebfb9bc158be8).
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850137924 **[Test build #139040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139040/testReport)** for PR 32686 at commit [`408851d`](https://github.com/apache/spark/commit/408851d641be4aa13146c640a48ebfb9bc158be8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850137712 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43562/
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850135302 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43561/
[GitHub] [spark] SparkQA removed a comment on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
SparkQA removed a comment on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850068523 **[Test build #139039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139039/testReport)** for PR 32688 at commit [`72c425c`](https://github.com/apache/spark/commit/72c425c2022eb76436af499f27f514556da18444).
[GitHub] [spark] SparkQA commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
SparkQA commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850134234 **[Test build #139039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139039/testReport)** for PR 32688 at commit [`72c425c`](https://github.com/apache/spark/commit/72c425c2022eb76436af499f27f514556da18444). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #32690: [SPARK-35510][PYTHON] Fix and reenable test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true
HyukjinKwon commented on pull request #32690: URL: https://github.com/apache/spark/pull/32690#issuecomment-850130270 cc @xinrong-databricks and @itholic too fyi
[GitHub] [spark] HyukjinKwon opened a new pull request #32690: [SPARK-35510][PYTHON] Fix and reenable test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true
HyukjinKwon opened a new pull request #32690: URL: https://github.com/apache/spark/pull/32690

### What changes were proposed in this pull request?

This PR proposes to fix and re-enable `test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true`, which was disabled when we upgraded to Python 3.9 in CI at https://github.com/apache/spark/pull/32657.

This seems to be caused by a behaviour change in the latest NumPy; see also `https://github.com/numpy/numpy/pull/16273#discussion_r641264085`. pandas inherits this behaviour, but it does not make sense when `numeric_only` is set to `True`. I will follow the status of this issue between pandas and NumPy. For the time being, I propose to exclude only the boolean case from the percentile/quartile test case.

### Why are the changes needed?

To keep the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Roughly tested locally. It should also pass in CI.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850126315 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43562/
[GitHub] [spark] lidiyag commented on pull request #32664: [SPARK-35516][WEBUI] Storage UI tab Storage Level tool tip correction
lidiyag commented on pull request #32664: URL: https://github.com/apache/spark/pull/32664#issuecomment-850124776 @dongjoon-hyun @srowen please take a look
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-850124566 **[Test build #139045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139045/testReport)** for PR 32473 at commit [`d4d39d3`](https://github.com/apache/spark/commit/d4d39d3fdecccd3551b07f8249a4015a0420a170).
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850124515 **[Test build #139044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139044/testReport)** for PR 32658 at commit [`a4a2bb2`](https://github.com/apache/spark/commit/a4a2bb239e428b6da21c0a2214f348c89067048d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32582: [SPARK-35436][SS] RocksDBFileManager - save checkpoint to DFS
AmplabJenkins removed a comment on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850124238 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43559/
[GitHub] [spark] SparkQA commented on pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
SparkQA commented on pull request #32689: URL: https://github.com/apache/spark/pull/32689#issuecomment-850124454 **[Test build #139043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139043/testReport)** for PR 32689 at commit [`06a5cd7`](https://github.com/apache/spark/commit/06a5cd7889190714ec13db3f3124fc249398038e).
[GitHub] [spark] ulysses-you commented on pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
ulysses-you commented on pull request #32689: URL: https://github.com/apache/spark/pull/32689#issuecomment-850124397 cc @maropu @cloud-fan
[GitHub] [spark] ulysses-you opened a new pull request #32689: [SPARK-35552][SQL] Make query stage materialized more readable
ulysses-you opened a new pull request #32689: URL: https://github.com/apache/spark/pull/32689 ### What changes were proposed in this pull request? Add a new method `isMaterialized` to `QueryStageExec`. ### Why are the changes needed? Currently, we use `resultOption().get.isDefined` to check whether a query stage has been materialized, which is not readable at a glance. A dedicated method such as `isMaterialized` expresses the intent more clearly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass CI.
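A minimal sketch of the proposed helper, assuming a heavily simplified stage class: `SimpleQueryStage`, `DemoStage`, `resultHolder` and `complete` are hypothetical names, and only the `AtomicReference[Option[Any]]` result holder is modelled. It shows that `isMaterialized` is just a named wrapper around the `isDefined` check.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, simplified stand-in for QueryStageExec: only the result
// holder and the proposed helper are modelled.
abstract class SimpleQueryStage {
  // Holds None until the stage has produced its result.
  protected val resultHolder = new AtomicReference[Option[Any]](None)

  def resultOption: AtomicReference[Option[Any]] = resultHolder

  // Readable helper: callers no longer need to know that "materialized"
  // means "the reference currently holds Some(...)".
  final def isMaterialized: Boolean = resultOption.get().isDefined
}

class DemoStage extends SimpleQueryStage {
  // Simulates the stage finishing and publishing its result.
  def complete(result: Any): Unit = resultHolder.set(Some(result))
}
```

Call sites then read as `stage.isMaterialized` instead of reaching into the atomic reference.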
[GitHub] [spark] AmplabJenkins commented on pull request #32582: [SPARK-35436][SS] RocksDBFileManager - save checkpoint to DFS
AmplabJenkins commented on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850124238 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43559/
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850123795 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43561/
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-85015 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43560/
[GitHub] [spark] SparkQA commented on pull request #32582: [SPARK-35436][SS] RocksDBFileManager - save checkpoint to DFS
SparkQA commented on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850121547 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43559/
[GitHub] [spark] otterc commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
otterc commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r641257596 ## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ## @@ -2000,6 +2023,147 @@ private[spark] class DAGScheduler( } } + /** + * Schedules shuffle merge finalize. + */ + private[scheduler] def scheduleShuffleMergeFinalize(stage: ShuffleMapStage): Unit = { +logInfo(("%s (%s) scheduled for finalizing" + + " shuffle merge in %s s").format(stage, stage.name, shuffleMergeFinalizeWaitSec)) +shuffleMergeFinalizeScheduler.schedule( + new Runnable { +override def run(): Unit = finalizeShuffleMerge(stage) + }, + shuffleMergeFinalizeWaitSec, + TimeUnit.SECONDS +) + } + + /** + * DAGScheduler notifies all the remote shuffle services chosen to serve shuffle merge request for + * the given shuffle map stage to finalize the shuffle merge process for this shuffle. This is + * invoked in a separate thread to reduce the impact on the DAGScheduler main thread, as the + * scheduler might need to talk to 1000s of shuffle services to finalize shuffle merge. + */ + private[scheduler] def finalizeShuffleMerge(stage: ShuffleMapStage): Unit = { +logInfo("%s (%s) finalizing the shuffle merge".format(stage, stage.name)) +externalShuffleClient.foreach { shuffleClient => + val shuffleId = stage.shuffleDep.shuffleId + val numMergers = stage.shuffleDep.getMergerLocs.length + val numResponses = new AtomicInteger() + val results = (0 until numMergers).map(_ => SettableFuture.create[Boolean]()) + val timedOut = new AtomicBoolean() + + def increaseAndCheckResponseCount(): Unit = { +if (numResponses.incrementAndGet() == numMergers) { + logInfo("%s (%s) shuffle merge finalized".format(stage, stage.name)) + // Since this runs in the netty client thread and is outside of DAGScheduler + // event loop, we only post ShuffleMergeFinalized event into the event queue. 
+ // The processing of this event should be done inside the event loop, so it + // can safely modify scheduler's internal state. + eventProcessLoop.post(ShuffleMergeFinalized(stage)) +} + } + + stage.shuffleDep.getMergerLocs.zipWithIndex.foreach { +case (shuffleServiceLoc, index) => + // Sends async request to shuffle service to finalize shuffle merge on that host + // TODO: SPARK-35536: Cancel finalizeShuffleMerge if the stage is cancelled + // TODO: during shuffleMergeFinalizeWaitSec + shuffleClient.finalizeShuffleMerge(shuffleServiceLoc.host, +shuffleServiceLoc.port, shuffleId, +new MergeFinalizerListener { + override def onShuffleMergeSuccess(statuses: MergeStatuses): Unit = { +assert(shuffleId == statuses.shuffleId) +if (!timedOut.get()) { + eventProcessLoop.post(RegisterMergeStatuses(stage, MergeStatus. +convertMergeStatusesToMergeStatusArr(statuses, shuffleServiceLoc))) + increaseAndCheckResponseCount() + results(index).set(true) +} + } + + override def onShuffleMergeFailure(e: Throwable): Unit = { +if (!timedOut.get()) { + logWarning(s"Exception encountered when trying to finalize shuffle " + +s"merge on ${shuffleServiceLoc.host} for shuffle $shuffleId", e) + increaseAndCheckResponseCount() + // Do not fail the future as this would cause dag scheduler to prematurely + // give up on waiting for merge results from the remaining shuffle services + // if one fails + results(index).set(false) +} + } +}) + } + // DAGScheduler only waits for a limited amount of time for the merge results. + // It will attempt to submit the next stage(s) irrespective of whether merge results + // from all shuffle services are received or not. 
+ // TODO: SPARK-33701: Instead of waiting for a constant amount of time for finalization + // TODO: for all the stages, adaptively tune timeout for merge finalization + try { +Futures.allAsList(results: _*).get(shuffleMergeResultsTimeoutSec, TimeUnit.SECONDS) + } catch { +case _: TimeoutException => + logInfo(s"Timed out on waiting for merge results from all " + +s"$numMergers mergers for shuffle $shuffleId") + timedOut.set(true) + eventProcessLoop.post(ShuffleMergeFinalized(stage)) + } +} + } + + private def processShuffleMapStageCompletion(shuffleStage: ShuffleMapStage): Unit = { +markStageAsFinished(shuffleStage) +logInfo("looking for newly runnable stages") +
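The fan-out-and-wait pattern in `finalizeShuffleMerge` can be sketched with the Scala standard library alone. This is an illustration under stated assumptions, not the DAGScheduler code: `scala.concurrent.Promise` stands in for Guava's `SettableFuture`, `Future.sequence` plus `Await.result` stand in for `Futures.allAsList(...).get(timeout, ...)`, and `MergeFinalizeSketch`/`finalizeAll` are hypothetical names. As in the diff, a failed merger completes its slot with `false` instead of failing it, and responses arriving after the timeout are ignored.

```scala
import java.util.concurrent.TimeoutException
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch: one async request per merger, bounded wait for all.
object MergeFinalizeSketch {
  /** Returns (allRespondedInTime, numberOfSuccessfulMergers). */
  def finalizeAll(
      requests: Seq[() => Future[Boolean]],
      timeout: FiniteDuration): (Boolean, Int) = {
    val timedOut = new AtomicBoolean(false)
    val results = Seq.fill(requests.length)(Promise[Boolean]())

    requests.zip(results).foreach { case (request, slot) =>
      request().onComplete { outcome =>
        // Late responses are dropped once the waiter has given up.
        if (!timedOut.get()) {
          // A failure is recorded as `false`, not as a failed promise, so one
          // bad merger does not end the wait for the remaining ones.
          slot.trySuccess(outcome.getOrElse(false))
        }
      }
    }

    try {
      val all = Await.result(Future.sequence(results.map(_.future)), timeout)
      (true, all.count(identity))
    } catch {
      case _: TimeoutException =>
        timedOut.set(true)
        // Count only the slots that completed successfully before the timeout.
        val succeeded = results.count(p => p.future.value.exists(t => t.getOrElse(false)))
        (false, succeeded)
    }
  }
}
```

In the PR the timeout path additionally posts `ShuffleMergeFinalized(stage)` so the next stage can be submitted with whatever merge results have arrived; the sketch only returns a flag for that case.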
[GitHub] [spark] allisonwang-db commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
allisonwang-db commented on a change in pull request #32303: URL: https://github.com/apache/spark/pull/32303#discussion_r641255370 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala ## @@ -107,6 +107,11 @@ case class UsingJoin(tpe: JoinType, usingColumns: Seq[String]) extends JoinType override def sql: String = "USING " + tpe.sql } +case class LateralJoin(tpe: JoinType) extends JoinType { + require(Seq(Inner, LeftOuter, Cross).contains(tpe), "Unsupported lateral join type " + tpe) Review comment: @maropu Postgres supports INNER, CROSS, and LEFT lateral joins, and it doesn't make sense to support RIGHT OUTER and FULL OUTER lateral joins. How about also adding the other two supported left join types, left semi and left anti, here? Then the set of lateral join types shouldn't need to change in the future.
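To make the suggestion concrete, here is a small sketch of the constructor check with left semi and left anti added. The join-type objects are a hypothetical, self-contained ADT that only borrows Spark's names; Spark's actual `JoinType` hierarchy is more involved.

```scala
// Hypothetical, simplified join-type ADT (not Spark's JoinType classes).
sealed trait JoinType
case object Inner extends JoinType
case object Cross extends JoinType
case object LeftOuter extends JoinType
case object LeftSemi extends JoinType
case object LeftAnti extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType

// Same `require` as in the diff, but with LeftSemi/LeftAnti allowed as the
// review comment suggests; right/full outer lateral joins are still rejected.
case class LateralJoin(tpe: JoinType) extends JoinType {
  require(LateralJoin.supported.contains(tpe), "Unsupported lateral join type " + tpe)
}

object LateralJoin {
  val supported: Set[JoinType] = Set(Inner, Cross, LeftOuter, LeftSemi, LeftAnti)
}
```

Constructing `LateralJoin(RightOuter)` fails fast with an `IllegalArgumentException` from `require`, so an unsupported plan cannot be built in the first place.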
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon edited a comment on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850107310 @itholic the generated doc looks a bit weird: ![Screen Shot 2021-05-28 at 1 09 38 PM](https://user-images.githubusercontent.com/6477701/119928166-04c67480-bfb6-11eb-8449-428b01f2144a.png) It includes `# noqa`. Can you double-check and fix it? It seems we should fix other places too, such as JSON.
[GitHub] [spark] HyukjinKwon commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850107310 @itholic the generated doc looks a bit weird: ![Screen Shot 2021-05-28 at 1 09 38 PM](https://user-images.githubusercontent.com/6477701/119928166-04c67480-bfb6-11eb-8449-428b01f2144a.png) Can you double-check and fix it? It seems we should fix other places too, such as JSON.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
allisonwang-db commented on a change in pull request #32303: URL: https://github.com/apache/spark/pull/32303#discussion_r641253529 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ## @@ -168,6 +168,21 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { } } +/** + * Rewrite lateral joins by rewriting all dependent joins (if any) inside the right + * sub-tree of the lateral join and converting the lateral join into a base join type. + */ +object RewriteLateralJoin extends Rule[LogicalPlan] with PredicateHelper { + + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case j @ Join(left, right, LateralJoin(joinType), condition, _) => + val conditions = condition.map(splitConjunctivePredicates).getOrElse(Nil) + val newRight = DecorrelateInnerQuery.rewriteDomainJoins(left, right, conditions) + // TODO: handle the COUNT bug Review comment: Created a new ticket for handling the COUNT bug in lateral subqueries: https://issues.apache.org/jira/browse/SPARK-35551
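`RewriteLateralJoin` first splits the optional join condition into its conjuncts before decorrelating the right side. A minimal sketch of that step, assuming a toy expression tree: `Expr`, `Attr`, `EqualTo`, `And` and `PredicateSplit` are hypothetical here, not Catalyst classes, and the recursion is only in the spirit of `PredicateHelper.splitConjunctivePredicates`.

```scala
// Hypothetical, minimal expression tree (Catalyst's Expression is far richer).
sealed trait Expr
case class Attr(name: String) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr

object PredicateSplit {
  // Flattens nested ANDs into a list of conjuncts, in left-to-right order.
  def splitConjunctivePredicates(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjunctivePredicates(l) ++ splitConjunctivePredicates(r)
    case other     => Seq(other)
  }

  // Same shape as `condition.map(splitConjunctivePredicates).getOrElse(Nil)`:
  // a missing join condition yields no conjuncts.
  def conjuncts(condition: Option[Expr]): Seq[Expr] =
    condition.map(splitConjunctivePredicates).getOrElse(Nil)
}
```

Working on a flat list of conjuncts lets a rewrite inspect each predicate independently, for example to find the ones that reference outer attributes.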
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32658: URL: https://github.com/apache/spark/pull/32658#discussion_r641253439 ## File path: docs/sql-data-sources-csv.md ## @@ -38,3 +36,217 @@ Spark SQL provides `spark.read().csv("file_name")` to read a file or directory o + +## Data Source Option + +Data source options of CSV can be set via: +* the `.option`/`.options` methods of + * `DataFrameReader` + * `DataFrameWriter` + * `DataStreamReader` + * `DataStreamWriter` +* the built-in functions below + * `from_csv` + * `to_csv` + * `schema_of_csv` +* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) + + + + Property Name | Default | Meaning | Scope + +sep +, +Sets a separator (one or more characters) for each field and value. +read/write + + +encoding +UTF-8 for reading, not set for writing +Specifies encoding (charset) for reading or writing CSV files. +read/write + + +quote +" +Sets a single character used for escaping quoted values where the separator can be part of the value. If you would like to turn off quotations, you need to set an empty string. If an empty string is set, it uses u0000 (null character) for writing, and it disables the quotation handling for reading. +read/write + + +quoteAll +false +A flag indicating whether all values should always be enclosed in quotes. It only escapes values containing a quote character by default. +write + + +escape +\ +Sets a single character used for escaping quotes inside an already quoted value. +read/write + + +escapeQuotes +true +A flag indicating whether values containing quotes should always be enclosed in quotes. It escapes all values containing a quote character by default. +write + + +comment +empty string +Sets a single character used for skipping lines beginning with this character. It's disabled by default. +read + + +header +false +For reading, uses the first line as names of columns. For writing, writes the names of columns as the first line.
Note that if the given path is an RDD of Strings, this header option will remove all lines that are the same as the header, if any. +read/write + + +inferSchema +false +Infers the input schema automatically from data. It requires one extra pass over the data. +read + + +enforceSchema +true +If it is set to true, the specified or inferred schema will be forcibly applied to datasource files, and headers in CSV files will be ignored. If the option is set to false, the schema will be validated against all headers in CSV files or the first header in RDD if the header option is set to true. Field names in the schema and column names in CSV headers are checked by their positions taking into account spark.sql.caseSensitive. Though the default value is true, it is recommended to disable the enforceSchema option to avoid incorrect results. +read + + +ignoreLeadingWhiteSpace +false (for reading), true (for writing) +A flag indicating whether or not leading whitespaces from values being read/written should be skipped. +read/write + + +ignoreTrailingWhiteSpace +false (for reading), true (for writing) +A flag indicating whether or not trailing whitespaces from values being read/written should be skipped. +read/write + + +nullValue +empty string +Sets the string representation of a null value. Since 2.0.1, this nullValue param applies to all supported types including the string type. +read/write + + +nanValue +NaN +Sets the string representation of a non-number value. +read + + +positiveInf +Inf +Sets the string representation of a positive infinity value. +read + + +negativeInf +-Inf +Sets the string representation of a negative infinity value. +read + + +dateFormat +yyyy-MM-dd +Sets the string that indicates a date format. Custom date formats follow the formats at Datetime Patterns (https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html). This applies to date type. +read/write + + +timestampFormat +yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] +Sets the string that indicates a timestamp format.
Custom date formats follow the formats at Datetime Patterns (https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html). This applies to timestamp type. +read/write + + +maxColumns +20480 +Defines a hard limit of how many columns a record can have. +read + + +maxCharsPerColumn +-1 +Defines the maximum number of characters allowed for any given value being read. The default value -1 means unlimited length. +read + + +mode +PERMISSIVE +Allows a mode for dealing with corrupt records during parsing. Note that Spark
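The `header` entry in this options table says that when the input is an RDD of strings, every line equal to the header is removed, not only the first one. A simplified model of that documented behaviour on an in-memory `Seq[String]`; `CsvHeaderSketch` and `dropHeaderLines` are hypothetical names, and this is not how Spark's CSV reader is implemented.

```scala
// Hypothetical model of the documented `header` behaviour for string input;
// not Spark's CSV reader.
object CsvHeaderSketch {
  /** Returns the detected header (first line) and the remaining data lines. */
  def dropHeaderLines(lines: Seq[String]): (Option[String], Seq[String]) =
    lines.headOption match {
      case None => (None, Nil)
      case Some(header) =>
        // Every line identical to the header is dropped, not just the first.
        (Some(header), lines.filterNot(_ == header))
    }
}
```

This is why a data row that happens to be textually identical to the header line would also disappear from such input.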
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850092153 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43558/
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850092142 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43558/
[GitHub] [spark] itholic commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
itholic commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850092067 Thanks, @HyukjinKwon
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-850089349 **[Test build #139042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139042/testReport)** for PR 32658 at commit [`1991031`](https://github.com/apache/spark/commit/1991031ea3871acc0a6ea20b96e3299bc1d6c51c).
[GitHub] [spark] SparkQA commented on pull request #32582: [SPARK-35436][SS] RocksDBFileManager - save checkpoint to DFS
SparkQA commented on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850087350 **[Test build #139041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139041/testReport)** for PR 32582 at commit [`703f59f`](https://github.com/apache/spark/commit/703f59fd4055d68e5bf957e1dc4f17159256a65a).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
AmplabJenkins removed a comment on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-850086867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139036/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
AmplabJenkins removed a comment on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850086868 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43557/
[GitHub] [spark] AmplabJenkins commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
AmplabJenkins commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850086868 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43557/
[GitHub] [spark] AmplabJenkins commented on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
AmplabJenkins commented on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-850086867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139036/
[GitHub] [spark] wangyum commented on pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
wangyum commented on pull request #32675: URL: https://github.com/apache/spark/pull/32675#issuecomment-850086524 cc @cloud-fan @yaooqinn @AngersZh
[GitHub] [spark] wangyum commented on a change in pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
wangyum commented on a change in pull request #32675: URL: https://github.com/apache/spark/pull/32675#discussion_r641220605 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ## @@ -870,4 +871,68 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter assert(e.contains("Partition spec is invalid")) } } + + test("Insert data with different cases") { Review comment: Add a `SPARK-35531` prefix to the test name?
[GitHub] [spark] wangyum commented on a change in pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema
wangyum commented on a change in pull request #32675: URL: https://github.com/apache/spark/pull/32675#discussion_r641220214 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ## @@ -1092,14 +1092,28 @@ private[hive] object HiveClientImpl extends Logging { hiveTable.setViewExpandedText(t) } +// hive may convert schema into lower cases while bucketSpec will not +// only convert if case not match +def convertColumnNames(schema: StructType, names: Seq[String]): Seq[String] = { + names.map(name => { +val s = schema.find(col => col.name.equalsIgnoreCase(name)) +if (s.isDefined) { + s.get.name +} else { + name +} + }) +} Review comment: Rewrite `convertColumnNames`? ```scala def restoreHiveBucketSpecColNames(schema: StructType, names: Seq[String]): Seq[String] = { names.map { name => schema.find(col => SQLConf.get.resolver(col.name, name)).map(_.name).getOrElse(name) } } ```
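A standalone sketch of what the suggested `restoreHiveBucketSpecColNames` does: each bucket column name is mapped back to the schema's spelling of that field when the resolver matches it, and is left unchanged otherwise. To keep it runnable without Spark, it assumes the schema is modeled as a plain `Seq[String]` of field names (in place of `StructType`) and the resolver is passed in as a function (in place of `SQLConf.get.resolver`); `BucketColNames`, `restoreColNames` and `caseInsensitive` are hypothetical names.

```scala
object BucketColNames {
  // Hypothetical stand-in for a case-insensitive resolver (the behavior Spark's resolver has
  // when spark.sql.caseSensitive is false); this is not Spark's own object.
  val caseInsensitive: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  /**
   * For each bucket column name, return the schema field name the resolver matches,
   * or the original name if no field matches. `fieldNames` stands in for a StructType.
   */
  def restoreColNames(
      fieldNames: Seq[String],
      names: Seq[String],
      resolver: (String, String) => Boolean = caseInsensitive): Seq[String] =
    names.map { name =>
      fieldNames.find(field => resolver(field, name)).getOrElse(name)
    }
}
```

With the default case-insensitive resolver, `restoreColNames(Seq("ID", "Name"), Seq("id", "other"))` returns `Seq("ID", "other")`; passing a case-sensitive resolver leaves `"id"` untouched.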
[GitHub] [spark] SparkQA commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
SparkQA commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850083738 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43557/
[GitHub] [spark] viirya commented on pull request #32582: [SPARK-35436][SS] RocksDBFileManager - save checkpoint to DFS
viirya commented on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850083083 Thanks @xuanyuanking. I will find some time to review this.
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850082809 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43558/
[GitHub] [spark] xuanyuanking commented on pull request #32582: [SPARK-35436] RocksDBFileManager - save checkpoint to DFS
xuanyuanking commented on pull request #32582: URL: https://github.com/apache/spark/pull/32582#issuecomment-850079062 Now that #32272 is merged, and after rebasing and addressing the comment, this one is ready for review. cc @viirya and @HeartSaVioR
[GitHub] [spark] xuanyuanking commented on pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata
xuanyuanking commented on pull request #32272: URL: https://github.com/apache/spark/pull/32272#issuecomment-850078509 Thanks for the review and help!
[GitHub] [spark] xuanyuanking commented on a change in pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata
xuanyuanking commented on a change in pull request #32272: URL: https://github.com/apache/spark/pull/32272#discussion_r641196335 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming.state + +import java.io.File +import java.nio.charset.StandardCharsets.UTF_8 +import java.nio.file.Files + +import scala.collection.Seq + +import com.fasterxml.jackson.annotation.JsonInclude.Include +import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper} +import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper} +import org.json4s.NoTypeHints +import org.json4s.jackson.Serialization + +/** + * Classes to represent metadata of checkpoints saved to DFS. Since this is converted to JSON, any + * changes to this MUST be backward-compatible. + */ +case class RocksDBCheckpointMetadata( +sstFiles: Seq[RocksDBSstFile], +logFiles: Seq[RocksDBLogFile], +numKeys: Long) { + import RocksDBCheckpointMetadata._ + + def json: String = { +// We turn this field into a null to avoid write a empty logFiles field in the json. 
+val nullified = if (logFiles.isEmpty) this.copy(logFiles = null) else this +mapper.writeValueAsString(nullified) + } + + def prettyJson: String = Serialization.writePretty(this)(RocksDBCheckpointMetadata.format) + + def writeToFile(metadataFile: File): Unit = { +val writer = Files.newBufferedWriter(metadataFile.toPath, UTF_8) +try { + writer.write(s"v$VERSION\n") + writer.write(this.json) +} finally { + writer.close() +} + } + + def immutableFiles: Seq[RocksDBImmutableFile] = sstFiles ++ logFiles +} + +/** Helper class for [[RocksDBCheckpointMetadata]] */ +object RocksDBCheckpointMetadata { + val VERSION = 1 + + implicit val format = Serialization.formats(NoTypeHints) + + /** Used to convert between classes and JSON. */ + lazy val mapper = { +val _mapper = new ObjectMapper with ScalaObjectMapper +_mapper.setSerializationInclusion(Include.NON_ABSENT) +_mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) +_mapper.registerModule(DefaultScalaModule) +_mapper + } + + def readFromFile(metadataFile: File): RocksDBCheckpointMetadata = { +val reader = Files.newBufferedReader(metadataFile.toPath, UTF_8) +try { + val versionLine = reader.readLine() + if (versionLine != s"v$VERSION") { +throw new IllegalStateException( + s"Cannot read RocksDB checkpoint metadata of version $versionLine") + } + Serialization.read[RocksDBCheckpointMetadata](reader) +} finally { + reader.close() +} + } + + def apply(rocksDBFiles: Seq[RocksDBImmutableFile], numKeys: Long): RocksDBCheckpointMetadata = { +val sstFiles = rocksDBFiles.collect { case file: RocksDBSstFile => file } +val logFiles = rocksDBFiles.collect { case file: RocksDBLogFile => file } + +RocksDBCheckpointMetadata(sstFiles, logFiles, numKeys) + } +} + +/** + * A RocksDBImmutableFile maintains a mapping between a local RocksDB file name and the name of + * its copy on DFS. Since these files are immutable, their DFS copies can be reused. 
+ */ +sealed trait RocksDBImmutableFile { + def localFileName: String + def dfsFileName: String + def sizeBytes: Long + + /** + * Whether another local file is same as the file described by this class. + * A file is same only when the name and the size are same. + */ + def isSameFile(otherFile: File): Boolean = { +otherFile.getName == localFileName && otherFile.length() == sizeBytes + } +} + +/** + * Class to represent a RocksDB SST file. Since this is converted to JSON, + * any changes to these MUST be backward-compatible. + */ +private[sql] case class RocksDBSstFile( +localFileName: String, +dfsSstFileName: String, +sizeBytes: Long) extends RocksDBImmutableFile { + + override def dfsFileName: String = dfsSstFileName +} + +/** + * Class to represent a RocksDB Log file. Since this is converted to JSON, + * any changes to these MUST be
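The `writeToFile`/`readFromFile` pair in the diff above prefixes the JSON with a `v$VERSION` header line and refuses to read any other version. A minimal sketch of just that framing, assuming the JSON payload is already a string (the Jackson/json4s serialization is left out); `VersionedMetadataFile` and its methods are hypothetical names, not part of the PR.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

object VersionedMetadataFile {
  val Version = 1

  /** Writes a "v<Version>" header line followed by the JSON payload. */
  def write(file: File, json: String): Unit = {
    val writer = Files.newBufferedWriter(file.toPath, UTF_8)
    try {
      writer.write(s"v$Version\n")
      writer.write(json)
    } finally {
      writer.close()
    }
  }

  /** Checks the header line and returns everything after it (the JSON payload). */
  def read(file: File): String = {
    val reader = Files.newBufferedReader(file.toPath, UTF_8)
    try {
      val versionLine = reader.readLine()
      if (versionLine != s"v$Version") {
        throw new IllegalStateException(s"Cannot read metadata of version $versionLine")
      }
      // Copy the rest of the file into a string.
      val buf = new Array[Char](4096)
      val sb = new java.lang.StringBuilder
      var n = reader.read(buf)
      while (n != -1) {
        sb.append(buf, 0, n)
        n = reader.read(buf)
      }
      sb.toString
    } finally {
      reader.close()
    }
  }
}
```

Keeping the version on its own first line lets a reader reject an unknown format before attempting to parse the JSON at all.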
[GitHub] [spark] SparkQA removed a comment on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
SparkQA removed a comment on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-849987071 **[Test build #139036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139036/testReport)** for PR 32687 at commit [`d37d01a`](https://github.com/apache/spark/commit/d37d01a6ae8fd5404dca172e748fc9994c0d66b3).
[GitHub] [spark] SparkQA commented on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
SparkQA commented on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-850074626 **[Test build #139036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139036/testReport)** for PR 32687 at commit [`d37d01a`](https://github.com/apache/spark/commit/d37d01a6ae8fd5404dca172e748fc9994c0d66b3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon closed pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11
HyukjinKwon closed pull request #32673: URL: https://github.com/apache/spark/pull/32673
[GitHub] [spark] HyukjinKwon commented on pull request #32673: [SPARK-35530][ML][TESTS] Fix rounding error in DifferentiableLossAggregatorSuite with Java 11
HyukjinKwon commented on pull request #32673: URL: https://github.com/apache/spark/pull/32673#issuecomment-850068886 Merged to master.
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850068550 **[Test build #139040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139040/testReport)** for PR 32686 at commit [`408851d`](https://github.com/apache/spark/commit/408851d641be4aa13146c640a48ebfb9bc158be8).
[GitHub] [spark] SparkQA commented on pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
SparkQA commented on pull request #32688: URL: https://github.com/apache/spark/pull/32688#issuecomment-850068523 **[Test build #139039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139039/testReport)** for PR 32688 at commit [`72c425c`](https://github.com/apache/spark/commit/72c425c2022eb76436af499f27f514556da18444).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode
AmplabJenkins removed a comment on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-850067873
[GitHub] [spark] LuciferYang opened a new pull request #32688: [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
LuciferYang opened a new pull request #32688: URL: https://github.com/apache/spark/pull/32688 ### What changes were proposed in this pull request? This PR upgrades Jackson to 2.12.3. Jackson Release 2.12.3: [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3](https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3) ### Why are the changes needed? Upgrading to the new version brings in potential bug fixes such as [https://github.com/FasterXML/jackson-modules-java8/issues/207](https://github.com/FasterXML/jackson-modules-java8/issues/207) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins or GitHub Actions builds.
[GitHub] [spark] AmplabJenkins commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode
AmplabJenkins commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-850067873
[GitHub] [spark] SparkQA removed a comment on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode
SparkQA removed a comment on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-850011679 **[Test build #139038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139038/testReport)** for PR 32397 at commit [`b47599f`](https://github.com/apache/spark/commit/b47599fe82a2b6d4cd3896c4117d3de19cc62d5e).
[GitHub] [spark] SparkQA commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode
SparkQA commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-850058005 **[Test build #139038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139038/testReport)** for PR 32397 at commit [`b47599f`](https://github.com/apache/spark/commit/b47599fe82a2b6d4cd3896c4117d3de19cc62d5e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode
SparkQA commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-850056962 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43556/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32301: [SPARK-35194][SQL] Refactor nested column aliasing for readability
AmplabJenkins removed a comment on pull request #32301: URL: https://github.com/apache/spark/pull/32301#issuecomment-850053730 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139035/
[GitHub] [spark] AmplabJenkins commented on pull request #32301: [SPARK-35194][SQL] Refactor nested column aliasing for readability
AmplabJenkins commented on pull request #32301: URL: https://github.com/apache/spark/pull/32301#issuecomment-850053730 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139035/
[GitHub] [spark] SparkQA removed a comment on pull request #32301: [SPARK-35194][SQL] Refactor nested column aliasing for readability
SparkQA removed a comment on pull request #32301: URL: https://github.com/apache/spark/pull/32301#issuecomment-849958280 **[Test build #139035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139035/testReport)** for PR 32301 at commit [`8a29e94`](https://github.com/apache/spark/commit/8a29e943447808391c17f860598e3f11ae41d54d).
[GitHub] [spark] SparkQA commented on pull request #32301: [SPARK-35194][SQL] Refactor nested column aliasing for readability
SparkQA commented on pull request #32301: URL: https://github.com/apache/spark/pull/32301#issuecomment-850053105 **[Test build #139035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139035/testReport)** for PR 32301 at commit [`8a29e94`](https://github.com/apache/spark/commit/8a29e943447808391c17f860598e3f11ae41d54d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] otterc commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
otterc commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r641118134 ## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ## @@ -2000,6 +2023,147 @@ private[spark] class DAGScheduler( } } + /** + * Schedules shuffle merge finalize. + */ + private[scheduler] def scheduleShuffleMergeFinalize(stage: ShuffleMapStage): Unit = { +logInfo(("%s (%s) scheduled for finalizing" + + " shuffle merge in %s s").format(stage, stage.name, shuffleMergeFinalizeWaitSec)) +shuffleMergeFinalizeScheduler.schedule( + new Runnable { +override def run(): Unit = finalizeShuffleMerge(stage) + }, + shuffleMergeFinalizeWaitSec, + TimeUnit.SECONDS +) + } + + /** + * DAGScheduler notifies all the remote shuffle services chosen to serve shuffle merge request for + * the given shuffle map stage to finalize the shuffle merge process for this shuffle. This is + * invoked in a separate thread to reduce the impact on the DAGScheduler main thread, as the + * scheduler might need to talk to 1000s of shuffle services to finalize shuffle merge. + */ + private[scheduler] def finalizeShuffleMerge(stage: ShuffleMapStage): Unit = { +logInfo("%s (%s) finalizing the shuffle merge".format(stage, stage.name)) +externalShuffleClient.foreach { shuffleClient => + val shuffleId = stage.shuffleDep.shuffleId + val numMergers = stage.shuffleDep.getMergerLocs.length + val numResponses = new AtomicInteger() + val results = (0 until numMergers).map(_ => SettableFuture.create[Boolean]()) + val timedOut = new AtomicBoolean() + + def increaseAndCheckResponseCount(): Unit = { +if (numResponses.incrementAndGet() == numMergers) { + logInfo("%s (%s) shuffle merge finalized".format(stage, stage.name)) + // Since this runs in the netty client thread and is outside of DAGScheduler + // event loop, we only post ShuffleMergeFinalized event into the event queue. 
+ // The processing of this event should be done inside the event loop, so it + // can safely modify scheduler's internal state. + eventProcessLoop.post(ShuffleMergeFinalized(stage)) +} + } + + stage.shuffleDep.getMergerLocs.zipWithIndex.foreach { +case (shuffleServiceLoc, index) => + // Sends async request to shuffle service to finalize shuffle merge on that host + // TODO: SPARK-35536: Cancel finalizeShuffleMerge if the stage is cancelled + // TODO: during shuffleMergeFinalizeWaitSec + shuffleClient.finalizeShuffleMerge(shuffleServiceLoc.host, +shuffleServiceLoc.port, shuffleId, +new MergeFinalizerListener { + override def onShuffleMergeSuccess(statuses: MergeStatuses): Unit = { +assert(shuffleId == statuses.shuffleId) +if (!timedOut.get()) { + eventProcessLoop.post(RegisterMergeStatuses(stage, MergeStatus. +convertMergeStatusesToMergeStatusArr(statuses, shuffleServiceLoc))) + increaseAndCheckResponseCount() + results(index).set(true) +} + } + + override def onShuffleMergeFailure(e: Throwable): Unit = { +if (!timedOut.get()) { + logWarning(s"Exception encountered when trying to finalize shuffle " + +s"merge on ${shuffleServiceLoc.host} for shuffle $shuffleId", e) + increaseAndCheckResponseCount() + // Do not fail the future as this would cause dag scheduler to prematurely + // give up on waiting for merge results from the remaining shuffle services + // if one fails + results(index).set(false) +} + } +}) + } + // DAGScheduler only waits for a limited amount of time for the merge results. + // It will attempt to submit the next stage(s) irrespective of whether merge results + // from all shuffle services are received or not. 
+ // TODO: SPARK-33701: Instead of waiting for a constant amount of time for finalization + // TODO: for all the stages, adaptively tune timeout for merge finalization + try { +Futures.allAsList(results: _*).get(shuffleMergeResultsTimeoutSec, TimeUnit.SECONDS) + } catch { +case _: TimeoutException => + logInfo(s"Timed out on waiting for merge results from all " + +s"$numMergers mergers for shuffle $shuffleId") + timedOut.set(true) + eventProcessLoop.post(ShuffleMergeFinalized(stage)) + } +} + } + + private def processShuffleMapStageCompletion(shuffleStage: ShuffleMapStage): Unit = { +markStageAsFinished(shuffleStage) +logInfo("looking for newly runnable stages") +
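The quoted `finalizeShuffleMerge` waits a bounded time for every merger's response and, once that wait times out, ignores late callbacks via the `timedOut` flag; a failed merger completes its future with `false` instead of failing it, so one failure does not end the wait early. A rough sketch of that pattern, swapping Guava's `SettableFuture` and `Futures.allAsList(...).get(timeout, unit)` for Scala's `Promise`, `Future.sequence` and `Await.ready`; `BoundedMergeWait`, `finalizeAll` and the `requests` thunks (standing in for the per-merger RPCs) are hypothetical.

```scala
import java.util.concurrent.TimeoutException
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.FiniteDuration

object BoundedMergeWait {
  /**
   * Starts one async request per merger, waits at most `timeout` for all of them, and returns
   * (number of successful merges counted, whether the wait timed out).
   */
  def finalizeAll(requests: Seq[() => Future[Boolean]], timeout: FiniteDuration): (Int, Boolean) = {
    val timedOut = new AtomicBoolean(false)
    val successes = new AtomicInteger(0)
    val results: Seq[Future[Boolean]] = requests.map { request =>
      val done = Promise[Boolean]()
      request().onComplete { outcome =>
        // Callbacks that arrive after the deadline are ignored.
        if (!timedOut.get()) {
          val ok = outcome.getOrElse(false)
          if (ok) successes.incrementAndGet()
          // Complete with false rather than failing, so one bad merger does not end the wait.
          done.trySuccess(ok)
        }
      }
      done.future
    }
    try {
      Await.ready(Future.sequence(results), timeout)
    } catch {
      case _: TimeoutException => timedOut.set(true)
    }
    (successes.get(), timedOut.get())
  }
}
```

A merger whose future never completes makes `finalizeAll` return after `timeout` with the second element set to `true`, which corresponds to the point where the diff posts `ShuffleMergeFinalized` regardless of missing results.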
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850049662 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139037/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850049662 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139037/
[GitHub] [spark] SparkQA removed a comment on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850010173 **[Test build #139037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139037/testReport)** for PR 32686 at commit [`ab3b61f`](https://github.com/apache/spark/commit/ab3b61fc05e1df8ba75320d30c7c96834c291db2).
[GitHub] [spark] SparkQA commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850049444 **[Test build #139037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139037/testReport)** for PR 32686 at commit [`ab3b61f`](https://github.com/apache/spark/commit/ab3b61fc05e1df8ba75320d30c7c96834c291db2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] venkata91 commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
venkata91 commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-850045368 Addressed all the comments AFAIK, please review @mridulm @Victsm @Ngone51 @otterc
[GitHub] [spark] venkata91 commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
venkata91 commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r641095507

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

```
@@ -2004,6 +2020,131 @@ private[spark] class DAGScheduler(
     }
   }
 
+  /**
+   * Schedules shuffle merge finalize.
+   */
+  private[scheduler] def scheduleShuffleMergeFinalize(stage: ShuffleMapStage): Unit = {
+    logInfo(("%s (%s) scheduled for finalizing" +
+      " shuffle merge in %s s").format(stage, stage.name, shuffleMergeFinalizeWaitSec))
+    shuffleMergeFinalizeScheduler.schedule(
+      new Runnable {
+        override def run(): Unit = finalizeShuffleMerge(stage)
+      },
+      shuffleMergeFinalizeWaitSec,
+      TimeUnit.SECONDS
+    )
+  }
+
+  /**
+   * DAGScheduler notifies all the remote shuffle services chosen to serve shuffle merge request for
+   * the given shuffle map stage to finalize the shuffle merge process for this shuffle. This is
+   * invoked in a separate thread to reduce the impact on the DAGScheduler main thread, as the
+   * scheduler might need to talk to 1000s of shuffle services to finalize shuffle merge.
+   */
+  private[scheduler] def finalizeShuffleMerge(stage: ShuffleMapStage): Unit = {
+    logInfo("%s (%s) finalizing the shuffle merge".format(stage, stage.name))
```

Review comment: Added additional tests to handle the cases of stage cancellation, barrier stage, late arrival of merge results, etc.
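The diff above hands finalization to `shuffleMergeFinalizeScheduler` after `shuffleMergeFinalizeWaitSec` seconds so that the DAGScheduler main thread is not blocked while many shuffle services are contacted. A minimal sketch of that delayed, off-thread pattern using only `java.util.concurrent` follows; `DemoStage`, `DelayedFinalizeSketch`, and the thread name are hypothetical, and the sketch makes no claim about how Spark actually constructs its scheduler.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ScheduledFuture, ThreadFactory, TimeUnit}

// Hypothetical stand-in for a shuffle map stage; this is not Spark's ShuffleMapStage.
final case class DemoStage(id: Int, name: String)

object DelayedFinalizeSketch {
  val ThreadName = "demo-shuffle-merge-finalizer"

  // One daemon thread: the (possibly slow) finalize work never runs on the caller's thread,
  // and the thread does not keep the JVM alive on exit.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, ThreadName)
        t.setDaemon(true)
        t
      }
    })

  // Runs `onFinalize(stage)` once, after `waitSec` seconds, on the scheduler thread.
  def scheduleFinalize(stage: DemoStage, waitSec: Long)(onFinalize: DemoStage => Unit): ScheduledFuture[_] =
    scheduler.schedule(
      new Runnable {
        override def run(): Unit = onFinalize(stage)
      },
      waitSec,
      TimeUnit.SECONDS
    )
}
```

A caller only enqueues the work and returns immediately; the callback later runs on the dedicated scheduler thread, which is the property the doc comment in the diff relies on.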
[GitHub] [spark] venkata91 commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
venkata91 commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r641094842

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

```
@@ -2136,9 +2137,24 @@ private[spark] class DAGScheduler(
     }
   }
 
-  private[scheduler] def handleShuffleMergeFinalized(stage: ShuffleMapStage): Unit = {
-    stage.shuffleDep.markShuffleMergeFinalized
-    processShuffleMapStageCompletion(stage)
+  private[scheduler] def handleRegisterMergeStatuses(
+      stage: ShuffleMapStage,
+      mergeStatuses: Seq[(Int, MergeStatus)]): Unit = {
+    // Register merge statuses if the stage is still running and shuffle merge is not finalized yet.
+    if (runningStages.contains(stage) && !stage.shuffleDep.shuffleMergeFinalized) {
+      mapOutputTracker.registerMergeResults(stage.shuffleDep.shuffleId, mergeStatuses)
+    }
+  }
+
+  private[scheduler] def handleShuffleMergeFinalized(
+      stage: ShuffleMapStage): Unit = {
+    // Only update MapOutputTracker metadata if the stage is still active. i.e not cancelled.
+    if (runningStages.contains(stage)) {
+      stage.shuffleDep.markShuffleMergeFinalized()
+      processShuffleMapStageCompletion(stage)
+    } else {
+      mapOutputTracker.unregisterAllMergeResult(stage.shuffleDep.shuffleId)
```

Review comment: Discussed offline with @mridulm: there are currently a few corner cases that need to be carefully thought through before adopting this behavior. Created a TODO and a corresponding follow-up JIRA: https://issues.apache.org/jira/browse/SPARK-35549
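To make the two guards in the quoted diff concrete, here is a hypothetical, heavily simplified Scala model (not Spark code): stages and shuffles are plain `Int` ids, `MergeStatus` is reduced to a `String`, and the `MapOutputTracker` is a mutable map. Per the review comment, the `else` branch that unregisters merge results for a cancelled stage was deferred to SPARK-35549, so this sketch follows the diff as quoted rather than necessarily the merged code.

```scala
import scala.collection.mutable

// Hypothetical model; MergeStatus is reduced to a String and the tracker to a map.
final class MergeStateSketch {
  val runningStages: mutable.Set[Int] = mutable.Set.empty[Int]   // stage ids still running
  val finalizedStages: mutable.Set[Int] = mutable.Set.empty[Int] // shuffle merge finalized
  val completedStages: mutable.Set[Int] = mutable.Set.empty[Int] // handed on to "completion"
  // shuffleId -> registered (reduceId, status) pairs
  val mergeResults: mutable.Map[Int, Seq[(Int, String)]] = mutable.Map.empty[Int, Seq[(Int, String)]]

  // Mirrors handleRegisterMergeStatuses: accept results only while running and not yet finalized.
  def registerMergeStatuses(stageId: Int, shuffleId: Int, statuses: Seq[(Int, String)]): Unit =
    if (runningStages.contains(stageId) && !finalizedStages.contains(stageId)) {
      mergeResults(shuffleId) = mergeResults.getOrElse(shuffleId, Seq.empty[(Int, String)]) ++ statuses
    }

  // Mirrors handleShuffleMergeFinalized as quoted: finalize and complete a running stage;
  // otherwise (e.g. the stage was cancelled) drop any merge results registered for the shuffle.
  def shuffleMergeFinalized(stageId: Int, shuffleId: Int): Unit =
    if (runningStages.contains(stageId)) {
      finalizedStages += stageId
      completedStages += stageId
    } else {
      mergeResults -= shuffleId
    }
}
```

Under this model, merge results that arrive after finalization, or for a stage that is no longer running, are simply ignored, which corresponds to the "late arrival of merge results" case mentioned in the earlier review comment.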
[GitHub] [spark] allisonwang-db commented on pull request #32687: [SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions
allisonwang-db commented on pull request #32687: URL: https://github.com/apache/spark/pull/32687#issuecomment-850041129 cc @cloud-fan
[GitHub] [spark] sunchao commented on pull request #31998: [SPARK-34859][SQL] parquet vectorized reader - support column index with rowIndexes
sunchao commented on pull request #31998: URL: https://github.com/apache/spark/pull/31998#issuecomment-850040241 @lxian In the current approach we'd have to copy values from one vector to another. I think a better and more efficient approach may be to feed the row indexes to `VectorizedRleValuesReader#readXXX` and skip rows if they are not in the range, so basically we increment both `rowId` and row indexes in parallel.
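The suggested approach can be sketched in Scala under simplifying assumptions: decoded values arrive as a plain `Iterator[Int]` rather than from `VectorizedRleValuesReader`, the output is an `Array[Int]` rather than a column vector, and `RowRange` and `readFiltered` are hypothetical names. The row index advances with every decoded value while `rowId` advances only for kept values, so rows outside the ranges are dropped during the read instead of being copied into a second vector.

```scala
object RowRangeSkipSketch {
  // Hypothetical inclusive range of row indexes to keep (e.g. derived from a Parquet column index).
  final case class RowRange(start: Long, end: Long)

  /** Reads values from `decoded` (value i belongs to row `firstRowIndex + i`) and writes only those
    * whose row index falls inside `ranges` into `out`, returning the number written. `ranges` must
    * be sorted and non-overlapping; `out` must have room for every kept value. */
  def readFiltered(
      decoded: Iterator[Int],
      firstRowIndex: Long,
      ranges: IndexedSeq[RowRange],
      out: Array[Int]): Int = {
    var rowIndex = firstRowIndex // row index of the next value in `decoded`
    var rowId = 0                // next write position in `out`
    var r = 0                    // index of the current range
    while (decoded.hasNext && r < ranges.length) {
      if (rowIndex > ranges(r).end) {
        r += 1                   // range exhausted; test the same row against the next range
      } else {
        val v = decoded.next()   // consume the value whether or not it is kept
        if (rowIndex >= ranges(r).start) {
          out(rowId) = v
          rowId += 1
        }
        rowIndex += 1
      }
    }
    rowId
  }
}
```

Once every range has been passed, the loop stops without decoding the remaining values; a real reader could likewise skip whole RLE or bit-packed runs rather than decoding them one by one.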
[GitHub] [spark] zhouyejoe edited a comment on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe edited a comment on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-850036241 Created a ticket for later improvement: [SPARK-35546](https://issues.apache.org/jira/browse/SPARK-35546)
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-850036241 Created a ticket for later improvement: https://issues.apache.org/jira/browse/SPARK-35546
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850035894 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43555/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-850035894 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43555/