[GitHub] [spark] holdenk opened a new pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable
holdenk opened a new pull request #30155: URL: https://github.com/apache/spark/pull/30155 ### What changes were proposed in this pull request? Make pod allocation executor timeouts configurable. Keep all known pods in mind when allocating executors to avoid over allocating if the pending time is much higher than the allocation interval. ### Why are the changes needed? The current executor timeouts do not match that of all real world clusters especially under load. While this can be worked around by increasing the allocation batch delay, that will decrease the speed at which the total number of executors will be able to be requested. ### Does this PR introduce _any_ user-facing change? Yes new configuration property ### How was this patch tested? Updated existing test to use the timeout from the new configuration property. Verified test failed without the update. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
AmplabJenkins removed a comment on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
AmplabJenkins commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
SparkQA removed a comment on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700 **[Test build #130296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
SparkQA commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716869601 **[Test build #130296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao opened a new pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
huaxingao opened a new pull request #30154: URL: https://github.com/apache/spark/pull/30154 ### What changes were proposed in this pull request? Currently in JDBCTableCatalog, we ignore the table options when creating table. ``` // TODO (SPARK-32405): Apply table options while creating tables in JDBC Table Catalog if (!properties.isEmpty) { logWarning("Cannot create JDBC table with properties, these properties will be " + "ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]")) } ``` ### Why are the changes needed? need to apply the table options when we create table ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add new test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
SparkQA removed a comment on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716860126 **[Test build #130299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)** for PR 30130 at commit [`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
AmplabJenkins removed a comment on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716865122 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
AmplabJenkins commented on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716865122 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
SparkQA commented on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716864965 **[Test build #130299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)** for PR 30130 at commit [`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
HeartSaVioR commented on a change in pull request #26935: URL: https://github.com/apache/spark/pull/26935#discussion_r512313839 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ## @@ -89,16 +116,16 @@ trait StateStore { def commit(): Long /** - * Abort all the updates that have been made to the store. Implementations should ensure that - * no more updates (puts, removes) can be after an abort in order to avoid incorrect usage. + * Return an iterator containing all the key-value pairs in the StateStore. Implementations must + * ensure that updates (puts, removes) can be made while iterating over this iterator. */ - def abort(): Unit + override def iterator(): Iterator[UnsafeRowPair] Review comment: Yes. Unfortunately we've figured out the requirements of `iterator` implementation is a bit different between the two. That said, someone could leverage the fact that iterator in ReadOnlyState doesn't need to care about updates. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
HeartSaVioR commented on a change in pull request #26935: URL: https://github.com/apache/spark/pull/26935#discussion_r512312263 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ## @@ -81,6 +74,40 @@ trait StateStore { iterator() } + /** Return an iterator containing all the key-value pairs in the StateStore. */ + def iterator(): Iterator[UnsafeRowPair] + + /** + * Clean up the resource. + * + * The method name is to respect backward compatibility on [[StateStore]]. Review comment: `abort` is only the one which is not properly named in point of ReadStateStore. It should be `close`, but we found it tricky to respect backward compatibility with having `close`, hence we decided to leave it as it is. Please refer https://github.com/apache/spark/pull/26935#discussion_r493236101 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29970: [SPARK-33087][SQL] DataFrameWriterV2 should delegate table resolution to the analyzer
SparkQA commented on pull request #29970: URL: https://github.com/apache/spark/pull/29970#issuecomment-716862852 **[Test build #130300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130300/testReport)** for PR 29970 at commit [`ee3587a`](https://github.com/apache/spark/commit/ee3587ad1a1654ea89b1064d3407052b66cabe2e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #29970: [SPARK-33087][SQL] DataFrameWriterV2 should delegate table resolution to the analyzer
HeartSaVioR commented on pull request #29970: URL: https://github.com/apache/spark/pull/29970#issuecomment-716861018 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-716860727 cc. @cloud-fan @rdblue This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
SparkQA commented on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716860126 **[Test build #130299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)** for PR 30130 at commit [`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite
dongjoon-hyun commented on pull request #30130: URL: https://github.com/apache/spark/pull/30130#issuecomment-716859380 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun closed pull request #30153: URL: https://github.com/apache/spark/pull/30153 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716858640 Thank you, @viirya ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512300825 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,899 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves the + * associated metadata. If not present and the
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512298080 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadata.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716850189 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716850248 Hi, @holdenk and @viirya and @srowen . Could you review this PR, please? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716850189 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716850170 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34899/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30140: [SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
maropu commented on pull request #30140: URL: https://github.com/apache/spark/pull/30140#issuecomment-716846073 okay, I will today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512281688 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -94,6 +95,9 @@ static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure"; private static final boolean DEFAULT_STOP_ON_FAILURE = false; + // Used by shuffle merge manager to create merged shuffle files. + protected static final String APP_BASE_RELATIVE_PATH = "usercache/%s/appcache/%s/"; Review comment: yeah, it does feel a bit odd here to hard code this here. the code inside RemoteBlockPushResolver assumes it has a specific format of with the user and appid as well. I don't have any other great solution to get this from yarn, but I think it should be documented what the %s values are and assumed to be when passed into the class. ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. +
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716841256 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34899/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512285083 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadata.
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512280774 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws Exception { } TransportConf transportConf = new TransportConf("shuffle", new HadoopConfigProvider(conf)); - blockHandler = new ExternalBlockHandler(transportConf, registeredExecutorFile); + shuffleMergeManager = new RemoteBlockPushResolver(transportConf, APP_BASE_RELATIVE_PATH); Review comment: Yes, doesn't seem to be a big change to have a config that is a class. I will add that with this change. @tgravescs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512279242 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadata.
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716825697 **[Test build #130298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716820189 **[Test build #130298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716825889 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716825889 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716820189 **[Test build #130298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716817713 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Victsm commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
Victsm commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512259560 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadata.
[GitHub] [spark] zero323 commented on pull request #30150: [SPARK-32188][PYTHON][DOCS][FOLLOW-UP] Document Column APIs in API reference
zero323 commented on pull request #30150: URL: https://github.com/apache/spark/pull/30150#issuecomment-716807149 LGTM. On a side note, it highlights how sparse are `Column` docs (and that we miss even basic examples for dunder methods). Do we have any ticket to track that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mridulm commented on pull request #30131: [SPARK-33220][CORE]Use `scheduleWithFixedDelay` to avoid repeated unnecessary scheduling for a short time
mridulm commented on pull request #30131: URL: https://github.com/apache/spark/pull/30131#issuecomment-716804124 +CC @otterc This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716803034 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34898/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716803022 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716803000 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34898/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716803022 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512238046 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated
[GitHub] [spark] SparkQA commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
SparkQA commented on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716787342 **[Test build #130295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130295/testReport)** for PR 30152 at commit [`0ae050b`](https://github.com/apache/spark/commit/0ae050b727572c2ba39b1269883bb177fc5e7838). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
AmplabJenkins commented on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716788005 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
SparkQA removed a comment on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716677497 **[Test build #130295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130295/testReport)** for PR 30152 at commit [`0ae050b`](https://github.com/apache/spark/commit/0ae050b727572c2ba39b1269883bb177fc5e7838). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512232873 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716789761 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34898/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
AmplabJenkins removed a comment on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716788005 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512229942 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,899 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves the + * associated metadata. If not present and the
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
AmplabJenkins removed a comment on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
SparkQA commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716784902 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
AmplabJenkins commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
viirya commented on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716782854 Thanks @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512225133 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws Exception { } TransportConf transportConf = new TransportConf("shuffle", new HadoopConfigProvider(conf)); - blockHandler = new ExternalBlockHandler(transportConf, registeredExecutorFile); + shuffleMergeManager = new RemoteBlockPushResolver(transportConf, APP_BASE_RELATIVE_PATH); Review comment: I was just thinking it would be easier to create the config now that is a class that is either the push based or the Noop one. Then a user could also just specify a different class that is their custom one. I'm fine with doing that separate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512221375 ## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ## @@ -363,4 +363,26 @@ public boolean useOldFetchProtocol() { return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false); } + /** + * The minimum size of a chunk when dividing a merged shuffle file into multiple chunks during + * push-based shuffle. + * A merged shuffle file consists of multiple small shuffle blocks. Fetching the + * complete merged shuffle file in a single response increases the memory requirements for the + * clients. Instead of serving the entire merged file, the shuffle service serves the + * merged file in `chunks`. A `chunk` constitutes few shuffle blocks in entirety and this + * configuration controls how big a chunk can get. A corresponding index file for each merged + * shuffle file will be generated indicating chunk boundaries. + */ + public int minChunkSizeInMergedShuffleFile() { +return Ints.checkedCast(JavaUtils.byteStringAsBytes( + conf.get("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "2m"))); Review comment: yeah agree, I think leave on server side. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512220482 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,899 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves the + * associated metadata. If not present and
[GitHub] [spark] dongjoon-hyun commented on pull request #30140: [SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
dongjoon-hyun commented on pull request #30140: URL: https://github.com/apache/spark/pull/30140#issuecomment-716778229 @maropu . Could you make a backporting PR to branch-2.4? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
dongjoon-hyun closed pull request #30152: URL: https://github.com/apache/spark/pull/30152 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
dongjoon-hyun commented on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-71680 Merged to branch-2.4. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
dongjoon-hyun commented on pull request #30152: URL: https://github.com/apache/spark/pull/30152#issuecomment-716777364 In Jenkins, Python tests passed. ``` ... Finished test(python3.6): pyspark.ml.tests (187s) Tests passed in 951 seconds ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30141: [SPARK-33230][SQL] Hadoop committers to get unique job ID in "spark.sql.sources.writeJobUUID"
dongjoon-hyun commented on pull request #30141: URL: https://github.com/apache/spark/pull/30141#issuecomment-716775382 cc @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #30141: [SPARK-33230][SQL] Hadoop committers to get unique job ID in "spark.sql.sources.writeJobUUID"
dongjoon-hyun closed pull request #30141: URL: https://github.com/apache/spark/pull/30141 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716768684 **[Test build #130297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins removed a comment on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716774481 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
AmplabJenkins commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716774481 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716774302 **[Test build #130297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
SparkQA commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716770165 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
SparkQA commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716768684 **[Test build #130297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)** for PR 30153 at commit [`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun commented on pull request #30153: URL: https://github.com/apache/spark/pull/30153#issuecomment-716766241 Hi, @shaneknapp . I made a PR from Spark side first. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun opened a new pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
dongjoon-hyun opened a new pull request #30153: URL: https://github.com/apache/spark/pull/30153 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716762323 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716762323 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716583128 **[Test build #130293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130293/testReport)** for PR 30145 at commit [`82e901f`](https://github.com/apache/spark/commit/82e901f95316efa31d5d55867bf0152e99d94b70). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716761264 **[Test build #130293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130293/testReport)** for PR 30145 at commit [`82e901f`](https://github.com/apache/spark/commit/82e901f95316efa31d5d55867bf0152e99d94b70). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)
AmplabJenkins removed a comment on pull request #30089: URL: https://github.com/apache/spark/pull/30089#issuecomment-716753175 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)
AmplabJenkins commented on pull request #30089: URL: https://github.com/apache/spark/pull/30089#issuecomment-716753175 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)
SparkQA removed a comment on pull request #30089: URL: https://github.com/apache/spark/pull/30089#issuecomment-716640452 **[Test build #130294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130294/testReport)** for PR 30089 at commit [`c9f264a`](https://github.com/apache/spark/commit/c9f264a7a5d5e568f74b4147646432f98536df1f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)
SparkQA commented on pull request #30089: URL: https://github.com/apache/spark/pull/30089#issuecomment-716752086 **[Test build #130294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130294/testReport)** for PR 30089 at commit [`c9f264a`](https://github.com/apache/spark/commit/c9f264a7a5d5e568f74b4147646432f98536df1f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core
AmplabJenkins commented on pull request #29414: URL: https://github.com/apache/spark/pull/29414#issuecomment-716749723 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716749652 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716749652 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core
AmplabJenkins removed a comment on pull request #29414: URL: https://github.com/apache/spark/pull/29414#issuecomment-716749723 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716546948 **[Test build #130290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130290/testReport)** for PR 30145 at commit [`feeae94`](https://github.com/apache/spark/commit/feeae94b1660ad8fd086a9d37ac2719e87db97a0). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
SparkQA commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700 **[Test build #130296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core
SparkQA removed a comment on pull request #29414: URL: https://github.com/apache/spark/pull/29414#issuecomment-716546987 **[Test build #130291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130291/testReport)** for PR 29414 at commit [`d68066e`](https://github.com/apache/spark/commit/d68066eddbfde21c00e33a32763976242371166d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-716748379 **[Test build #130290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130290/testReport)** for PR 30145 at commit [`feeae94`](https://github.com/apache/spark/commit/feeae94b1660ad8fd086a9d37ac2719e87db97a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core
SparkQA commented on pull request #29414: URL: https://github.com/apache/spark/pull/29414#issuecomment-716748439 **[Test build #130291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130291/testReport)** for PR 29414 at commit [`d68066e`](https://github.com/apache/spark/commit/d68066eddbfde21c00e33a32763976242371166d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-716744287 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-716744287 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait
AmplabJenkins removed a comment on pull request #30134: URL: https://github.com/apache/spark/pull/30134#issuecomment-716743923 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait
AmplabJenkins commented on pull request #30134: URL: https://github.com/apache/spark/pull/30134#issuecomment-716743923 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-716538184 **[Test build #130289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130289/testReport)** for PR 30097 at commit [`cd3329f`](https://github.com/apache/spark/commit/cd3329fadfb7bfcfae97b945d4ac6b8b346d38db). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-716743114 **[Test build #130289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130289/testReport)** for PR 30097 at commit [`cd3329f`](https://github.com/apache/spark/commit/cd3329fadfb7bfcfae97b945d4ac6b8b346d38db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait
SparkQA commented on pull request #30134: URL: https://github.com/apache/spark/pull/30134#issuecomment-716742913 **[Test build #130292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130292/testReport)** for PR 30134 at commit [`bd2e8bc`](https://github.com/apache/spark/commit/bd2e8bcb351c723a6be51d38b81959bdc794e45a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait
SparkQA removed a comment on pull request #30134: URL: https://github.com/apache/spark/pull/30134#issuecomment-716551877 **[Test build #130292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130292/testReport)** for PR 30134 at commit [`bd2e8bc`](https://github.com/apache/spark/commit/bd2e8bcb351c723a6be51d38b81959bdc794e45a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
allisonwang-db commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r512162304 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1242,6 +1242,13 @@ object SQLConf { .booleanConf .createWithDefault(true) + val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts") +.internal() +.doc("Whether to remove redundant physical sort node") +.version("3.1.0") Review comment: I am not exactly sure if it's better to change it in this PR or to change it when this PR is backported to 2.4.8 (in case the current change does not work in 2.4.8) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts
allisonwang-db commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r512162304 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1242,6 +1242,13 @@ object SQLConf { .booleanConf .createWithDefault(true) + val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts") +.internal() +.doc("Whether to remove redundant physical sort node") +.version("3.1.0") Review comment: I am not exactly sure if it's better to change it in this PR or to change it when this PR is backported to 2.4.8. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org