[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r517681080 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,959 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. Review comment: yeah the main thing is we are trying to keep those dependencies to a minimum so they don't conflict with other things and keep size as small as possible because you are running this as part of node manager in yarns case. tags shouldn't have any other dependencies and it should be small and not much change of it conflicting. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r517666337 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,959 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. Review comment: ah you are right we don't include it. spark-tags module is pretty small though and just has the annotations. I would think that would be ok to include here. @mridulm @Ngone51 any objections? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r517652076 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,959 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. Review comment: yes javadoc is fine similar to https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java#L29 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r517620874 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,959 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. Review comment: this and other new classes (including MergedShuffleFileManager) should have the: * @since 3.1.0 or @Since("3.1.0") depending on java/scala. Also sorry if this has been discussed do we want to marke the new API as Evolving? Note I'm fine with doing this as separate issue. ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,959 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentM
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r514454561 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadat
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r513498062 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadat
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512281688 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -94,6 +95,9 @@ static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure"; private static final boolean DEFAULT_STOP_ON_FAILURE = false; + // Used by shuffle merge manager to create merged shuffle files. + protected static final String APP_BASE_RELATIVE_PATH = "usercache/%s/appcache/%s/"; Review comment: yeah, it does feel a bit odd here to hard code this here. the code inside RemoteBlockPushResolver assumes it has a specific format of with the user and appid as well. I don't have any other great solution to get this from yarn, but I think it should be documented what the %s values are and assumed to be when passed into the class. ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512238046 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadat
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512232873 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,905 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap> partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given the appShuffleId and reduceId that uniquely identifies a given shuffle partition of an + * application, retrieves the associated metadat
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512225133 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws Exception { } TransportConf transportConf = new TransportConf("shuffle", new HadoopConfigProvider(conf)); - blockHandler = new ExternalBlockHandler(transportConf, registeredExecutorFile); + shuffleMergeManager = new RemoteBlockPushResolver(transportConf, APP_BASE_RELATIVE_PATH); Review comment: I was just thinking it would be easier to create the config now that is a class that is either the push based or the Noop one. Then a user could also just specify a different class that is their custom one. I'm fine with doing that separate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512221375 ## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ## @@ -363,4 +363,26 @@ public boolean useOldFetchProtocol() { return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false); } + /** + * The minimum size of a chunk when dividing a merged shuffle file into multiple chunks during + * push-based shuffle. + * A merged shuffle file consists of multiple small shuffle blocks. Fetching the + * complete merged shuffle file in a single response increases the memory requirements for the + * clients. Instead of serving the entire merged file, the shuffle service serves the + * merged file in `chunks`. A `chunk` constitutes few shuffle blocks in entirety and this + * configuration controls how big a chunk can get. A corresponding index file for each merged + * shuffle file will be generated indicating chunk boundaries. + */ + public int minChunkSizeInMergedShuffleFile() { +return Ints.checkedCast(JavaUtils.byteStringAsBytes( + conf.get("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "2m"))); Review comment: yeah agree, I think leave on server side. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r512220482 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,899 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private final LoadingCache indexCache; + + @SuppressWarnings("UnstableApiUsage") + public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) { +this.conf = conf; +this.partitions = Maps.newConcurrentMap(); +this.appsPathInfo = Maps.newConcurrentMap(); +this.directoryCleaner = Executors.newSingleThreadExecutor( + // Add `spark` prefix because it will run in NM in Yarn mode. + NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner")); +this.minChunkSize = conf.minChunkSizeInMergedShuffleFile(); +CacheLoader indexCacheLoader = + new CacheLoader() { +public ShuffleIndexInformation load(File file) throws IOException { + return new ShuffleIndexInformation(file); +} + }; +indexCache = CacheBuilder.newBuilder() + .maximumWeight(conf.mergedIndexCacheSize()) + .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize()) + .build(indexCacheLoader); +this.relativeMergeDirPathPattern = relativeMergeDirPathPattern; +this.errorHandler = new ErrorHandler.BlockPushErrorHandler(); + } + + /** + * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves the + * associated metadata. If not present and th
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r511095790 ## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ## @@ -363,4 +363,26 @@ public boolean useOldFetchProtocol() { return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false); } + /** + * The minimum size of a chunk when dividing a merged shuffle file into multiple chunks during + * push-based shuffle. + * A merged shuffle file consists of multiple small shuffle blocks. Fetching the + * complete merged shuffle file in a single response increases the memory requirements for the + * clients. Instead of serving the entire merged file, the shuffle service serves the + * merged file in `chunks`. A `chunk` constitutes few shuffle blocks in entirety and this + * configuration controls how big a chunk can get. A corresponding index file for each merged + * shuffle file will be generated indicating chunk boundaries. + */ + public int minChunkSizeInMergedShuffleFile() { +return Ints.checkedCast(JavaUtils.byteStringAsBytes( + conf.get("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "2m"))); + } + + /** + * The size of cache used in push-based shuffle for storing merged index files. Review comment: we should specify if cache is memory or disk. ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -0,0 +1,899 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.shuffle; + +import java.io.BufferedOutputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.Iterator; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentMap; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.Executor; +import java.util.concurrent.Executors; + +import com.google.common.base.Objects; +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.cache.Weigher; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import com.google.common.primitives.Longs; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.roaringbitmap.RoaringBitmap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.FileSegmentManagedBuffer; +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.StreamCallbackWithID; +import org.apache.spark.network.protocol.Encoders; +import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge; +import org.apache.spark.network.shuffle.protocol.MergeStatuses; +import org.apache.spark.network.shuffle.protocol.PushBlockStream; +import org.apache.spark.network.util.JavaUtils; +import org.apache.spark.network.util.NettyUtils; +import org.apache.spark.network.util.TransportConf; + +/** + * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle + * service processing logic to support push based shuffle. + */ +public class RemoteBlockPushResolver implements MergedShuffleFileManager { + + private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class); + private static final String MERGE_MANAGER_DIR = "merge_manager"; + + private final ConcurrentMap appsPathInfo; + private final ConcurrentMap partitions; + + private final Executor directoryCleaner; + private final TransportConf conf; + private final int minChunkSize; + private final String relativeMergeDirPathPattern; + private final ErrorHandler.BlockPushErrorHandler errorHandler; + + @SuppressWarnings("UnstableApiUsage") + private
[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
tgravescs commented on a change in pull request #30062: URL: https://github.com/apache/spark/pull/30062#discussion_r511094756 ## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ## @@ -363,4 +363,26 @@ public boolean useOldFetchProtocol() { return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false); } + /** + * The minimum size of a chunk when dividing a merged shuffle file into multiple chunks during + * push-based shuffle. + * A merged shuffle file consists of multiple small shuffle blocks. Fetching the + * complete merged shuffle file in a single response increases the memory requirements for the + * clients. Instead of serving the entire merged file, the shuffle service serves the + * merged file in `chunks`. A `chunk` constitutes few shuffle blocks in entirety and this + * configuration controls how big a chunk can get. A corresponding index file for each merged + * shuffle file will be generated indicating chunk boundaries. + */ + public int minChunkSizeInMergedShuffleFile() { +return Ints.checkedCast(JavaUtils.byteStringAsBytes( + conf.get("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "2m"))); Review comment: Thanks for making the comment here as I was just wondering the same thing. How much difference have you seen in tuning this parameter? I assume if its not a static here for the server, then it would have to be passed in as a parameter during fetching. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org