[GitHub] [spark] HyukjinKwon closed pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing
HyukjinKwon closed pull request #30056: URL: https://github.com/apache/spark/pull/30056

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org. For additional commands, e-mail: reviews-h...@spark.apache.org.
[GitHub] [spark] HyukjinKwon removed a comment on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing
HyukjinKwon removed a comment on pull request #30056: URL: https://github.com/apache/spark/pull/30056#issuecomment-712610574 @cloud-fan can you take a look too just for doubly sure?
[GitHub] [spark] HyukjinKwon commented on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing
HyukjinKwon commented on pull request #30056: URL: https://github.com/apache/spark/pull/30056#issuecomment-712612351 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing
HyukjinKwon commented on pull request #30056: URL: https://github.com/apache/spark/pull/30056#issuecomment-712610574 @cloud-fan can you take a look too just for doubly sure?
[GitHub] [spark] viirya commented on pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase
viirya commented on pull request #29812: URL: https://github.com/apache/spark/pull/29812#issuecomment-712607215 @HyukjinKwon Thanks for the suggestion. I updated this PR description.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712602334 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130031/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712602328
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712602328 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712578947 **[Test build #130031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)** for PR 30097 at commit [`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712602038

**[Test build #130031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)** for PR 30097 at commit [`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class RemoveAllHints extends Rule[LogicalPlan]`
  * `class DisableHints extends RemoveAllHints`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712598524 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34637/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712588757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130030/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712531902 **[Test build #130030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712588752 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712598520 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712598520
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712598500 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34637/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HyukjinKwon commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508216007

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

```
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
       }
     }
   }
+
+  def get(className: String): Driver = {
```

Review comment: oh! got it
[GitHub] [spark] AngersZhuuuu commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AngersZhuuuu commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712595799

> **[Test build #130030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).
>
> * This patch **fails Spark unit tests**.
> * This patch merges cleanly.
> * This patch adds no public classes.

Seems the test `org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite.subquery/scalar-subquery/scalar-subquery-select.sql` is unstable? Is there anyone working on this?
[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
HyukjinKwon commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712594756 I am okay with not blocking this and @tgravescs PRs by the test failures - hope we could fix it very soon though.
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712589515 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34637/
[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao edited a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049

> @sunchao, yes I think we can do that but would you mind creating a separate PR to fix the test first though? Using python3 with my workaround fix should be good enough.

@HyukjinKwon Currently the GitHub Actions tests pass without the `python3` change (I later made some change which seems to fix the tests related to python3), and the Jenkins tests fail either with or without it; in the latter case they fail with an error such as:

```
20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException: Error from python worker:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
    loader, code, fname = _get_module_details(mod_name)
  File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
    return find_loader(fullname)
  File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
    __import__(pkg)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
    from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
    PY4J_TRUE = {"yes", "y", "t", "true"}
              ^
SyntaxError: invalid syntax
```

So do we still need a separate PR for that right now?

> Also, seems like we're going to split PR (?). The first one (this) is for preparation, and second one is actually bumping up to Hadoop version to 3.2.1 (?). Would you mind clarifying the plan and what this PR proposes in the description/title?

Yes, that's right. The plan is to have a separate PR bumping the Hadoop version to 3.2.2 when that comes out (probably soon). There is a [bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects wire compatibility between 3.2 clients and 2.x servers. I'll update the PR description soon. Thanks.
[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao edited a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049

> @sunchao, yes I think we can do that but would you mind creating a separate PR to fix the test first though? Using python3 with my workaround fix should be good enough.

@HyukjinKwon Currently the GitHub Actions tests pass without the `python3` change, and the Jenkins tests fail either with or without it; in the latter case they fail with an error such as:

```
20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException: Error from python worker:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
    loader, code, fname = _get_module_details(mod_name)
  File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
    return find_loader(fullname)
  File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
    __import__(pkg)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
    from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
    PY4J_TRUE = {"yes", "y", "t", "true"}
              ^
SyntaxError: invalid syntax
```

So do we still need a separate PR for that right now?

> Also, seems like we're going to split PR (?). The first one (this) is for preparation, and second one is actually bumping up to Hadoop version to 3.2.1 (?). Would you mind clarifying the plan and what this PR proposes in the description/title?

Yes, that's right. The plan is to have a separate PR bumping the Hadoop version to 3.2.2 when that comes out (probably soon). There is a [bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects wire compatibility between 3.2 clients and 2.x servers. I'll update the PR description soon. Thanks.
[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712588752
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712588337

**[Test build #130030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
Ngone51 commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r508169611

File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java

```
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) {
+    this.conf = conf;
+    this.partitions = Maps.newConcurrentMap();
+    this.appsPathInfo = Maps.newConcurrentMap();
+    this.directoryCleaner = Executors.newSingleThreadExecutor(
+      // Add `spark` prefix because it will run in NM in Yarn mode.
+      NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+    this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+    CacheLoader indexCacheLoader =
+      new CacheLoader() {
+        public ShuffleIndexInformation load(File file) throws IOException {
+          return new ShuffleIndexInformation(file);
+        }
+      };
+    indexCache = CacheBuilder.newBuilder()
+      .maximumWeight(conf.mergedIndexCacheSize())
+      .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize())
+      .build(indexCacheLoader);
+    this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+    this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves
+   * the associated metadata. If
```
[GitHub] [spark] sunchao commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049

> @sunchao, yes I think we can do that but would you mind creating a separate PR to fix the test first though? Using python3 with my workaround fix should be good enough.

Currently the GitHub Actions tests pass without the `python3` change, and the Jenkins tests fail either with or without the `python3` change; in the latter case they fail with an error such as:

```
20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException: Error from python worker:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
    loader, code, fname = _get_module_details(mod_name)
  File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
    return find_loader(fullname)
  File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
    __import__(pkg)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
    from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
    PY4J_TRUE = {"yes", "y", "t", "true"}
              ^
SyntaxError: invalid syntax
```

> Also, seems like we're going to split PR (?). The first one (this) is for preparation, and second one is actually bumping up to Hadoop version to 3.2.1 (?). Would you mind clarifying the plan and what this PR proposes in the description/title?

Yes, that's right. The plan is to have a separate PR bumping the Hadoop version to 3.2.2 when that comes out (probably soon). There is a [bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects wire compatibility between 3.2 clients and 2.x servers. I'll update the PR description soon. Thanks.
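[Editor's aside, not part of the thread: the `SyntaxError` quoted above comes from the Jenkins workers running Python 2.6. The failing py4j line uses a set literal, syntax that was only added in Python 2.7, so 2.6 cannot even parse `java_gateway.py`. A minimal illustration of the difference:]

```python
# The py4j line that trips Python 2.6 uses a set literal -- new in Python 2.7.
# On 2.7+/3.x the module parses and builds a set; on 2.6 the import dies with
# the SyntaxError shown in the Jenkins log above.
src = 'PY4J_TRUE = {"yes", "y", "t", "true"}'

# Parses fine on any modern Python:
compile(src, "java_gateway.py", "exec")

namespace = {}
exec(src, namespace)
assert namespace["PY4J_TRUE"] == {"yes", "y", "t", "true"}

# The Python-2.6-compatible spelling would have been:
assert set(["yes", "y", "t", "true"]) == namespace["PY4J_TRUE"]
```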
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HeartSaVioR commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508204618

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

```scala
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
      }
    }
  }
+
+  def get(className: String): Driver = {
```

Review comment: This block actually contains a one-line change: `case d: DriverWrapper if d.wrapped.getClass.getCanonicalName == className => d.wrapped`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712581742
[GitHub] [spark] AmplabJenkins commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712581742
[GitHub] [spark] SparkQA removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
SparkQA removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712494713 **[Test build #130025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130025/testReport)** for PR 28026 at commit [`59d43de`](https://github.com/apache/spark/commit/59d43debc2301fccaffa54c924363242c9167ef7).
[GitHub] [spark] SparkQA commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
SparkQA commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712581228 **[Test build #130025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130025/testReport)** for PR 28026 at commit [`59d43de`](https://github.com/apache/spark/commit/59d43debc2301fccaffa54c924363242c9167ef7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join
HeartSaVioR commented on a change in pull request #30076: URL: https://github.com/apache/spark/pull/30076#discussion_r508201850

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala

```scala
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
  /**
   * Get all the matched values for given join condition, with marking matched.
   * This method is designed to mark joined rows properly without exposing internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.
```

Review comment: That would depend on whether the follow-up PR would revert the change done here (say, back and forth) or not. If it's likely to do the back-and-forth, and the follow-up is not a big thing (less than 100 lines of code change that could be done in a couple of days), then it's probably better to address it here, to save the effort of reviewing something which will be reverted later.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HyukjinKwon commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508201461

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistrySuite.scala

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.datasources.jdbc

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.execution.datasources.jdbc.connection.TestDriver

class DriverRegistrySuite extends SparkFunSuite {
  test("SPARK-32229: get must give back wrapped driver if wrapped") {
```

Review comment: If it's just a simple refactoring, we wouldn't need a test.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HyukjinKwon commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508201158

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

```scala
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
      }
    }
  }
+
+  def get(className: String): Driver = {
```

Review comment: I am okay either way, but I mainly wanted to know how this relates to the PR description/title and the JIRA.
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-712578947 **[Test build #130031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)** for PR 30097 at commit [`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).
[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
HyukjinKwon commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712578041 @sunchao, yes I think we can do that but would you mind creating a separate PR to fix the test first though? Using `python3` with my workaround fix should be good enough. Also, it seems like we're going to split the PR (?). The first one (this) is for preparation, and the second one actually bumps the Hadoop version to 3.2.1 (?). Would you mind clarifying the plan and what this PR proposes in the description/title?
[GitHub] [spark] leanken opened a new pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
leanken opened a new pull request #30097: URL: https://github.com/apache/spark/pull/30097

### What changes were proposed in this pull request?
Since [SPARK-33139](https://issues.apache.org/jira/browse/SPARK-33139) has been done, `SQLConf.get` and `SparkSession.active` are more reliable, so we are trying to refine the existing usage of passing `SQLConf` and `SparkSession` into sub-classes of `Rule[QueryPlan]`. In this PR:
* remove `SQLConf` from the constructor parameters of all sub-classes of `Rule[QueryPlan]`, using `SQLConf.get` to replace the original `SQLConf` instance;
* remove `SparkSession` from the constructor parameters of all sub-classes of `Rule[QueryPlan]`, using `SparkSession.active` to replace the original `SparkSession` instance.

### Why are the changes needed?
Code refinement.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing tests.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase
HyukjinKwon commented on a change in pull request #29812: URL: https://github.com/apache/spark/pull/29812#discussion_r508195964

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala

```diff
@@ -17,19 +17,68 @@

 package org.apache.spark.sql.catalyst.optimizer

-import org.apache.spark.sql.catalyst.expressions.UpdateFields
+import java.util.Locale
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{Expression, UpdateFields, WithField}
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf

 /**
- * Combines all adjacent [[UpdateFields]] expression into a single [[UpdateFields]] expression.
+ * Optimizes [[UpdateFields]] expression chains.
  */
-object CombineUpdateFields extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
+object OptimizeUpdateFields extends Rule[LogicalPlan] {
+  private def canOptimize(names: Seq[String]): Boolean = {
+    if (SQLConf.get.caseSensitiveAnalysis) {
+      names.distinct.length != names.length
+    } else {
+      names.map(_.toLowerCase(Locale.ROOT)).distinct.length != names.length
+    }
+  }
+
+  val optimizeUpdateFields: PartialFunction[Expression, Expression] = {
+    case UpdateFields(structExpr, fieldOps)
+        if fieldOps.forall(_.isInstanceOf[WithField]) &&
+          canOptimize(fieldOps.map(_.asInstanceOf[WithField].name)) =>
+      val caseSensitive = SQLConf.get.caseSensitiveAnalysis
+
+      val withFields = fieldOps.map(_.asInstanceOf[WithField])
+      val names = withFields.map(_.name)
+      val values = withFields.map(_.valExpr)
+
+      val newNames = mutable.ArrayBuffer.empty[String]
+      val newValues = mutable.ArrayBuffer.empty[Expression]
+
+      if (caseSensitive) {
+        names.zip(values).reverse.foreach { case (name, value) =>
```

Review comment: I wonder if we could just do something like `collection.immutable.ListMap(names.zip(values): _*)`, which keeps the last win here and keeps the order of fields for later use. But I guess it's no big deal. Just saying.
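The behaviour discussed in the review comment, where a later `WithField` on the same field name overwrites an earlier one while field order is preserved, can be sketched as follows (a rough Python analogue of the `ListMap` suggestion, not Spark code; `dedup_last_wins` is a hypothetical helper name):

```python
def dedup_last_wins(names, values):
    """Keep one (name, value) pair per name: the last assignment wins,
    while names keep their first-seen order (dict semantics in Python 3.7+)."""
    merged = {}
    for name, value in zip(names, values):
        merged[name] = value  # a repeated name overwrites the earlier value
    return list(merged.items())

# "a" is assigned twice: its later value 3 wins, but it keeps its position.
assert dedup_last_wins(["a", "b", "a"], [1, 2, 3]) == [("a", 3), ("b", 2)]
```

The explicit reversed-iteration in the PR achieves the same "last win" result; the map-based form just expresses it more directly.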
[GitHub] [spark] HyukjinKwon commented on pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase
HyukjinKwon commented on pull request #29812: URL: https://github.com/apache/spark/pull/29812#issuecomment-712573143 @viirya, BTW, do you mind fixing the PR description to explain what this PR specifically improves?

> This patch proposes to add more optimization to `UpdateFields` expression chain.

It seems like the PR description does not say what exactly it optimizes. Is my understanding correct that this PR proposes two separate optimizations?
- Deduplicate `WithField` operations at `UpdateFields`.
- Respect nullability of the input struct at `GetStructField(UpdateFields(..., struct))`, and unwrap the if-else.
[GitHub] [spark] sunchao commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712572297 Thanks @HyukjinKwon for confirming that! I'll consider this PR to have passed all tests then. As the next step, I'll change the Hadoop version to 3.2.0 and bump up the version in a separate PR, as discussed before.
[GitHub] [spark] c21 commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join
c21 commented on a change in pull request #30076: URL: https://github.com/apache/spark/pull/30076#discussion_r508191906

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala

```scala
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
  /**
   * Get all the matched values for given join condition, with marking matched.
   * This method is designed to mark joined rows properly without exposing internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.
```

Review comment: IMO it would be good to add the early eviction for left-side matched rows in a follow-up PR. If people strongly think we should add it in the first place with this PR, I can add it as well, but I'll need more time to polish it.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join
HeartSaVioR commented on a change in pull request #30076: URL: https://github.com/apache/spark/pull/30076#discussion_r508188143

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala

```scala
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
  /**
   * Get all the matched values for given join condition, with marking matched.
   * This method is designed to mark joined rows properly without exposing internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.
```

Review comment: Or something like `excludeRowsAlreadyMatched`. I guess we'd like to have another method to deal with left-semi join efficiently, like I commented (and as in the PR description as well).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
AmplabJenkins removed a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712559058
[GitHub] [spark] SparkQA removed a comment on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…
SparkQA removed a comment on pull request #29874: URL: https://github.com/apache/spark/pull/29874#issuecomment-712518103 **[Test build #130028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130028/testReport)** for PR 29874 at commit [`ff644eb`](https://github.com/apache/spark/commit/ff644eb0c291a4ede8ac0237ae31a3fd68a22a8e).
[GitHub] [spark] SparkQA removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
SparkQA removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712450230 **[Test build #130020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130020/testReport)** for PR 28026 at commit [`c7d5591`](https://github.com/apache/spark/commit/c7d5591c48e219e581d3907463619f642996b2b5).
[GitHub] [spark] SparkQA removed a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
SparkQA removed a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712507245 **[Test build #130027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130027/testReport)** for PR 29843 at commit [`bcd81b7`](https://github.com/apache/spark/commit/bcd81b72b6f13a9ee44d9e1bc83e72b014005f09).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…
AmplabJenkins removed a comment on pull request #29874: URL: https://github.com/apache/spark/pull/29874#issuecomment-712564310
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712551323
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712552167
[GitHub] [spark] AmplabJenkins commented on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…
AmplabJenkins commented on pull request #29874: URL: https://github.com/apache/spark/pull/29874#issuecomment-712564310
[GitHub] [spark] SparkQA commented on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…
SparkQA commented on pull request #29874: URL: https://github.com/apache/spark/pull/29874#issuecomment-712563758 **[Test build #130028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130028/testReport)** for PR 29874 at commit [`ff644eb`](https://github.com/apache/spark/commit/ff644eb0c291a4ede8ac0237ae31a3fd68a22a8e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on pull request #29906: [SPARK-32037][CORE] Rename blacklisting feature
Ngone51 commented on pull request #29906: URL: https://github.com/apache/spark/pull/29906#issuecomment-712561183 Shall we add the `DeveloperApi` annotation to `SparkFirehoseListener`, since we all agree it was missed before? @tgravescs
[GitHub] [spark] Ngone51 commented on a change in pull request #29906: [SPARK-32037][CORE] Rename blacklisting feature
Ngone51 commented on a change in pull request #29906: URL: https://github.com/apache/spark/pull/29906#discussion_r508182684

## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala

```diff
@@ -284,80 +284,127 @@ private[spark] class AppStatusListener(
   }

   override def onExecutorBlacklisted(event: SparkListenerExecutorBlacklisted): Unit = {
-    updateBlackListStatus(event.executorId, true)
+    updateExclusionStatus(event.executorId, true)
+  }
+
+  override def onExecutorExcluded(event: SparkListenerExecutorExcluded): Unit = {
+    updateExclusionStatus(event.executorId, true)
   }

   override def onExecutorBlacklistedForStage(
       event: SparkListenerExecutorBlacklistedForStage): Unit = {
-    val now = System.nanoTime()
+    updateExclusionStatusForStage(event.stageId, event.stageAttemptId, event.executorId)
+  }

-    Option(liveStages.get((event.stageId, event.stageAttemptId))).foreach { stage =>
-      setStageBlackListStatus(stage, now, event.executorId)
-    }
-    liveExecutors.get(event.executorId).foreach { exec =>
-      addBlackListedStageTo(exec, event.stageId, now)
-    }
+  override def onExecutorExcludedForStage(
+      event: SparkListenerExecutorExcludedForStage): Unit = {
+    updateExclusionStatusForStage(event.stageId, event.stageAttemptId, event.executorId)
   }

   override def onNodeBlacklistedForStage(event: SparkListenerNodeBlacklistedForStage): Unit = {
-    val now = System.nanoTime()
+    updateNodeExclusionStatusForStage(event.stageId, event.stageAttemptId, event.hostId)
+  }

-    // Implicitly blacklist every available executor for the stage associated with this node
-    Option(liveStages.get((event.stageId, event.stageAttemptId))).foreach { stage =>
-      val executorIds = liveExecutors.values.filter(_.host == event.hostId).map(_.executorId).toSeq
-      setStageBlackListStatus(stage, now, executorIds: _*)
-    }
-    liveExecutors.values.filter(_.hostname == event.hostId).foreach { exec =>
-      addBlackListedStageTo(exec, event.stageId, now)
-    }
+  override def onNodeExcludedForStage(event: SparkListenerNodeExcludedForStage): Unit = {
+    updateNodeExclusionStatusForStage(event.stageId, event.stageAttemptId, event.hostId)
   }

-  private def addBlackListedStageTo(exec: LiveExecutor, stageId: Int, now: Long): Unit = {
-    exec.blacklistedInStages += stageId
+  private def addExcludedStageTo(exec: LiveExecutor, stageId: Int, now: Long): Unit = {
+    exec.excludedInStages += stageId
     liveUpdate(exec, now)
   }

   private def setStageBlackListStatus(stage: LiveStage, now: Long, executorIds: String*): Unit = {
     executorIds.foreach { executorId =>
       val executorStageSummary = stage.executorSummary(executorId)
-      executorStageSummary.isBlacklisted = true
+      executorStageSummary.isExcluded = true
+      maybeUpdate(executorStageSummary, now)
+    }
+    stage.excludedExecutors ++= executorIds
+    maybeUpdate(stage, now)
+  }
+
+  private def setStageExcludedStatus(stage: LiveStage, now: Long, executorIds: String*): Unit = {
+    executorIds.foreach { executorId =>
+      val executorStageSummary = stage.executorSummary(executorId)
+      executorStageSummary.isExcluded = true
       maybeUpdate(executorStageSummary, now)
     }
-    stage.blackListedExecutors ++= executorIds
+    stage.excludedExecutors ++= executorIds
     maybeUpdate(stage, now)
   }

   override def onExecutorUnblacklisted(event: SparkListenerExecutorUnblacklisted): Unit = {
-    updateBlackListStatus(event.executorId, false)
+    updateExclusionStatus(event.executorId, false)
+  }
+
+  override def onExecutorUnexcluded(event: SparkListenerExecutorUnexcluded): Unit = {
+    updateExclusionStatus(event.executorId, false)
   }

   override def onNodeBlacklisted(event: SparkListenerNodeBlacklisted): Unit = {
```

Review comment: Do we need to implement these deprecated methods for internal listeners? I assume they are only used for external listeners.
[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
LuciferYang commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r507352925

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteJobStatsTrackerMetricSuite.scala

@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.{LocalSparkSession, SparkSession}
+
+class BasicWriteJobStatsTrackerMetricSuite extends SparkFunSuite with LocalSparkSession {
+
+  test("SPARK-32978: make sure the number of dynamic part metric is correct") {
+    try {
+      val partitions = "50"
+      spark = SparkSession.builder().master("local[4]").getOrCreate()
+      val statusStore = spark.sharedState.statusStore
+      val oldExecutionsSize = statusStore.executionsList().size
+
+      spark.sql("create table dynamic_partition(i bigint, part bigint) " +
+        "using parquet partitioned by (part)").collect()
+      spark.sql("insert overwrite table dynamic_partition partition(part) " +
+        s"select id, id % $partitions as part from range(1)").collect()
+
+      // Wait for listener to finish computing the metrics for the executions.
+      while (statusStore.executionsList().size - oldExecutionsSize < 4 ||
+          statusStore.executionsList().last.metricValues == null) {
+        Thread.sleep(100)
+      }
+
+      // There should be 4 SQLExecutionUIData in executionsList and the 3rd item is the one we need,

Review comment:
> why are there 4? is it because of collect?

Yes, without `.collect` it should be 2.

> BTW can we call val oldExecutionsSize = statusStore.executionsList().size after create table? then we just need to wait for one SQLExecutionUIData.

@cloud-fan Addressed in 15c7519 to fix this

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
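The reviewer's suggestion (snapshot `executionsList().size` only after the `create table`, then wait for a single execution) is a common fix for flaky listener-based assertions. The underlying wait pattern can be sketched generically; the helper below is illustrative only, not Spark code, with a simulated asynchronous listener standing in for the SQL status store:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class PollUntil {
  // Hypothetical helper: poll `condition` every `intervalMs` until it holds
  // or `timeoutMs` elapses; returns whether the condition was met.
  static boolean pollUntil(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    return condition.getAsBoolean();
  }

  public static void main(String[] args) throws Exception {
    // Simulate a listener that publishes its results asynchronously,
    // like the SQL listener computing metricValues after a query finishes.
    AtomicInteger executions = new AtomicInteger(0);
    CompletableFuture.runAsync(() -> {
      try { Thread.sleep(200); } catch (InterruptedException ignored) { }
      executions.set(2);
    });
    System.out.println(pollUntil(() -> executions.get() >= 2, 50, 5000));
  }
}
```

Bounding the wait with a deadline keeps a broken listener from hanging the suite indefinitely, which a bare `while`/`Thread.sleep` loop would.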
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HeartSaVioR commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508181273

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
      }
    }
  }
+
+  def get(className: String): Driver = {

Review comment: That said, why don't we look up `wrapperMap` before iterating through `DriverManager.getDrivers`?
[GitHub] [spark] AmplabJenkins commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
AmplabJenkins commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712559058
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HeartSaVioR commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508180571

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
      }
    }
  }
+
+  def get(className: String): Driver = {

Review comment: I'm actually in favor of this change - `DriverRegistry` deals with wrapping on register, and this will also let `DriverRegistry` deal with unwrapping on get. `JdbcUtils` no longer needs to know about these details - it just needs to know that it should use `DriverRegistry` instead of `DriverManager`.
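The division of responsibilities described in the review (wrap on `register`, unwrap on `get`, callers never see the wrapper) can be sketched as a check-local-first lookup. Everything below is a simplified stand-in for Spark's `DriverRegistry`/`DriverWrapper`; the class and field names are illustrative, not the real API:

```java
import java.util.HashMap;
import java.util.Map;

public class RegistryLookup {
  // Hypothetical stand-ins for Spark's Driver and DriverWrapper, for illustration only.
  static class Driver {
    final String className;
    Driver(String className) { this.className = className; }
  }
  static class DriverWrapper extends Driver {
    final Driver wrapped;
    DriverWrapper(Driver wrapped) { super(wrapped.className); this.wrapped = wrapped; }
  }

  private final Map<String, DriverWrapper> wrapperMap = new HashMap<>();
  private final Map<String, Driver> globalRegistry = new HashMap<>(); // models DriverManager

  Driver get(String className) {
    // Check the local wrapperMap first and unwrap, so callers never see the
    // wrapper; only fall back to the global registry (DriverManager.getDrivers
    // in Spark) when the class was not registered through this registry.
    DriverWrapper wrapper = wrapperMap.get(className);
    if (wrapper != null) {
      return wrapper.wrapped;
    }
    Driver driver = globalRegistry.get(className);
    if (driver == null) {
      throw new IllegalStateException("Did not find registered driver with class " + className);
    }
    return driver;
  }

  public static void main(String[] args) {
    RegistryLookup registry = new RegistryLookup();
    registry.wrapperMap.put("org.postgresql.Driver",
        new DriverWrapper(new Driver("org.postgresql.Driver")));
    System.out.println(registry.get("org.postgresql.Driver").className);
  }
}
```

Keeping the unwrapping inside `get` means callers depend only on the registry, which is exactly the encapsulation argument the reviewer makes.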
[GitHub] [spark] SparkQA commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
SparkQA commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712558282

**[Test build #130027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130027/testReport)** for PR 29843 at commit [`bcd81b7`](https://github.com/apache/spark/commit/bcd81b72b6f13a9ee44d9e1bc83e72b014005f09).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712552167
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712552152

Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34636/
[GitHub] [spark] AmplabJenkins commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712551323
[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
Ngone51 commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r508173722

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java

@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) {
+    this.conf = conf;
+    this.partitions = Maps.newConcurrentMap();
+    this.appsPathInfo = Maps.newConcurrentMap();
+    this.directoryCleaner = Executors.newSingleThreadExecutor(
+      // Add `spark` prefix because it will run in NM in Yarn mode.
+      NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+    this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+    CacheLoader indexCacheLoader =
+      new CacheLoader() {
+        public ShuffleIndexInformation load(File file) throws IOException {
+          return new ShuffleIndexInformation(file);
+        }
+      };
+    indexCache = CacheBuilder.newBuilder()
+      .maximumWeight(conf.mergedIndexCacheSize())
+      .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize())
+      .build(indexCacheLoader);
+    this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+    this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves
+   * the associated metadata. If
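The constructor quoted above bounds the Guava `indexCache` by total weight: `maximumWeight` sets the budget and the `weigher` charges each entry the size of its index information. As a dependency-free sketch of that eviction policy, here is a tiny weight-bounded LRU cache built on `LinkedHashMap`; the names and byte sizes are illustrative, not Spark's:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// A minimal weight-bounded LRU cache: evicts least-recently-used entries until
// the summed weight of the remaining entries fits under maxWeight, mirroring
// what CacheBuilder.maximumWeight(...).weigher(...) does for the index cache.
class WeightedLruCache<K, V> {
  private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true); // access order
  private final ToIntFunction<V> weigher;
  private final int maxWeight;
  private int currentWeight = 0;

  WeightedLruCache(int maxWeight, ToIntFunction<V> weigher) {
    this.maxWeight = maxWeight;
    this.weigher = weigher;
  }

  synchronized V get(K key) { return map.get(key); }

  synchronized void put(K key, V value) {
    V old = map.put(key, value);
    if (old != null) currentWeight -= weigher.applyAsInt(old);
    currentWeight += weigher.applyAsInt(value);
    // Evict in access order (least recently used first) until under budget.
    Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
    while (currentWeight > maxWeight && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      if (eldest.getKey().equals(key)) continue; // never evict the entry just added
      currentWeight -= weigher.applyAsInt(eldest.getValue());
      it.remove();
    }
  }
}

public class IndexCacheSketch {
  public static void main(String[] args) {
    // Weigh each "index file" by its byte length; cap total weight at 10 bytes.
    WeightedLruCache<String, byte[]> cache = new WeightedLruCache<>(10, v -> v.length);
    cache.put("a.index", new byte[6]);
    cache.put("b.index", new byte[6]); // total 12 > 10, so "a.index" is evicted
    System.out.println(cache.get("a.index") == null);
    System.out.println(cache.get("b.index") != null);
  }
}
```

Weight-based eviction matters here because shuffle index files vary widely in size, so a plain entry-count cap would give no real bound on memory.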
[GitHub] [spark] SparkQA commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
SparkQA commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712550608

**[Test build #130020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130020/testReport)** for PR 28026 at commit [`c7d5591`](https://github.com/apache/spark/commit/c7d5591c48e219e581d3907463619f642996b2b5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas
AmplabJenkins removed a comment on pull request #29818: URL: https://github.com/apache/spark/pull/29818#issuecomment-712549555
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
AmplabJenkins removed a comment on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712549315
[GitHub] [spark] AmplabJenkins commented on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas
AmplabJenkins commented on pull request #29818: URL: https://github.com/apache/spark/pull/29818#issuecomment-712549555
[GitHub] [spark] AmplabJenkins commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
AmplabJenkins commented on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712549315
[GitHub] [spark] SparkQA removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas
SparkQA removed a comment on pull request #29818: URL: https://github.com/apache/spark/pull/29818#issuecomment-712461557

**[Test build #130021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130021/testReport)** for PR 29818 at commit [`c114166`](https://github.com/apache/spark/commit/c114166afb682081f99b8893bce60bbd38560b3e).
[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
SparkQA commented on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712549297

Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34635/
[GitHub] [spark] SparkQA commented on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas
SparkQA commented on pull request #29818: URL: https://github.com/apache/spark/pull/29818#issuecomment-712548936

**[Test build #130021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130021/testReport)** for PR 29818 at commit [`c114166`](https://github.com/apache/spark/commit/c114166afb682081f99b8893bce60bbd38560b3e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver
HyukjinKwon commented on a change in pull request #30024: URL: https://github.com/apache/spark/pull/30024#discussion_r508171138

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
      }
    }
  }
+
+  def get(className: String): Driver = {

Review comment: The change seems okay, but why do we need to do this? It just moves the code around.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
AmplabJenkins removed a comment on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712536768

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130026/

Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
AmplabJenkins removed a comment on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712536767

Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
SparkQA removed a comment on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712502294

**[Test build #130026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130026/testReport)** for PR 30062 at commit [`fbdd333`](https://github.com/apache/spark/commit/fbdd33385083adb4be83adf46cd518d519650307).
[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
Ngone51 commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r508170165

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java

(same `RemoteBlockPushResolver.java` hunk as quoted above)
[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
Ngone51 commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r508169611

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java

(same `RemoteBlockPushResolver.java` hunk as quoted above)
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712545891 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34636/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()
HyukjinKwon closed pull request #29831: URL: https://github.com/apache/spark/pull/29831
[GitHub] [spark] HyukjinKwon commented on pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()
HyukjinKwon commented on pull request #29831: URL: https://github.com/apache/spark/pull/29831#issuecomment-712543428 Merged to master.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
HyukjinKwon edited a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024 > BTW I'm still not sure why my PR will trigger the YARN/Python test failures - seems it shouldn't be related. This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before (in Jenkins). The relevant YARN test cases are only triggered when a PR has changes _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615 See also https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130016/testReport/ at https://github.com/apache/spark/pull/29906 cc @tgravescs FYI
[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
SparkQA commented on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712542212 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34635/
[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
HyukjinKwon commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024 > BTW I'm still not sure why my PR will trigger the YARN/Python test failures - seems it shouldn't be related. This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before. The relevant YARN test cases are only triggered when a PR has changes _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615
[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
HyukjinKwon commented on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-712538725 @sunchao, I meant more that, from my observation, the `python` executable in Jenkins looks like Python 2. Yes, so it looks like we should switch `python` to Python 3 in Jenkins, cc @shaneknapp. Usually I would prefer to keep env issues separate from the code changes so we can handle them separately. Also, from what I know, Shane is busy training a backup engineer right now. So changing to `python3` seems fine for the time being as a workaround.
[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
viirya commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r508160798

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
-      child
+    case s @ Sort(orders, global, child)
+        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+      (global, child) match {
+        case (false, _) => child
+        case (true, r: Range) => r

Review comment: This assumes we know Range's global ordering in advance. This seems to leak physical stuff into the Optimizer.
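The local-vs-global ordering distinction behind this review (and SPARK-33183) is that every partition of a dataset can individually satisfy a sort order while their concatenation does not, so a *global* Sort is only redundant when the child guarantees a total order. A toy illustration of that gap (plain Java, not Spark code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy illustration: each "partition" is sorted, but concatenating them does
// not yield a globally sorted sequence, so a global Sort over this input is
// NOT redundant even though every partition satisfies the ordering locally.
class LocalVsGlobalOrdering {
  static List<Integer> concat(List<List<Integer>> partitions) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> p : partitions) {
      out.addAll(p);
    }
    return out;
  }

  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two partitions, each locally sorted.
    List<List<Integer>> partitions = List.of(List.of(1, 4, 7), List.of(2, 3, 9));
    List<Integer> concatenated = concat(partitions);   // [1, 4, 7, 2, 3, 9]
    System.out.println(isSorted(concatenated));        // prints "false"
    // A global sort is still required to establish the total order.
    List<Integer> globallySorted = new ArrayList<>(concatenated);
    Collections.sort(globallySorted);
    System.out.println(isSorted(globallySorted));      // prints "true"
  }
}
```

This is also why special-casing `Range` in the rule is debatable: it works only because `Range`'s output happens to be totally ordered, a physical property the logical optimizer would normally not assume.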
[GitHub] [spark] AmplabJenkins commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
AmplabJenkins commented on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712536767
[GitHub] [spark] SparkQA commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
SparkQA commented on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712536396 **[Test build #130026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130026/testReport)** for PR 30062 at commit [`fbdd333`](https://github.com/apache/spark/commit/fbdd33385083adb4be83adf46cd518d519650307).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
viirya commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r508159260

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>

Review comment: This was added in 2.4. cc @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
AmplabJenkins removed a comment on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712532723
[GitHub] [spark] AmplabJenkins commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
AmplabJenkins commented on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712532723
[GitHub] [spark] SparkQA removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
SparkQA removed a comment on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712529719 **[Test build #130029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130029/testReport)** for PR 30095 at commit [`dbb8111`](https://github.com/apache/spark/commit/dbb811140f80a30ea5b781e266443c7a76483564).
[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
SparkQA commented on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712532633 **[Test build #130029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130029/testReport)** for PR 30095 at commit [`dbb8111`](https://github.com/apache/spark/commit/dbb811140f80a30ea5b781e266443c7a76483564).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-712531902 **[Test build #130030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712517112 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130024/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax
AmplabJenkins removed a comment on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-712517108 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference
AmplabJenkins removed a comment on pull request #30095: URL: https://github.com/apache/spark/pull/30095#issuecomment-712475253 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
AmplabJenkins removed a comment on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-712518597
[GitHub] [spark] SparkQA removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas
SparkQA removed a comment on pull request #29818: URL: https://github.com/apache/spark/pull/29818#issuecomment-712398796 **[Test build #130015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130015/testReport)** for PR 29818 at commit [`1b875c1`](https://github.com/apache/spark/commit/1b875c19e3c6318fcb26c1ea62b5397d6e75d1f8).