This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push: new 12f4492 [SPARK-30512] Added a dedicated boss event loop group 12f4492 is described below commit 12f449220dde0b8476f58806f7dee06fcb54da87 Author: Chandni Singh <chsi...@linkedin.com> AuthorDate: Wed Jan 29 15:02:48 2020 -0600 [SPARK-30512] Added a dedicated boss event loop group ### What changes were proposed in this pull request? Adding a dedicated boss event loop group to the Netty pipeline in the External Shuffle Service to avoid the delay in channel registration. ``` EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1, conf.getModuleName() + "-boss"); EventLoopGroup workerGroup = NettyUtils.createEventLoop(ioMode, conf.serverThreads(), conf.getModuleName() + "-server"); bootstrap = new ServerBootstrap() .group(bossGroup, workerGroup) .channel(NettyUtils.getServerChannelClass(ioMode)) .option(ChannelOption.ALLOCATOR, allocator) ``` ### Why are the changes needed? We have been seeing a large number of SASL authentication (RPC requests) timing out with the external shuffle service. ``` java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task. at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160) at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:278) at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:80) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181) at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141) at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:218) ``` The investigation that we have done is described here: https://github.com/netty/netty/issues/9890 After adding `LoggingHandler` to the netty pipeline, we saw that the registration of the channel was getting delay which is because the worker threads are busy with the existing channels. ### Does this PR introduce any user-facing change? No ### How was this patch tested? We have tested the patch on our clusters and with a stress testing tool. After this change, we didn't see any SASL requests timing out. Existing unit tests pass. Closes #27240 from otterc/SPARK-30512. Authored-by: Chandni Singh <chsi...@linkedin.com> Signed-off-by: Thomas Graves <tgra...@apache.org> (cherry picked from commit 6b47ace27d04012bcff47951ea1eea2aa6fb7d60) Signed-off-by: Thomas Graves <tgra...@apache.org> --- .../main/java/org/apache/spark/network/server/TransportServer.java | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java index 9c85ab2..da3fe30 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java +++ b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java @@ -91,9 +91,10 @@ public class TransportServer implements Closeable { private void init(String hostToBind, int portToBind) { IOMode ioMode = IOMode.valueOf(conf.ioMode()); - EventLoopGroup bossGroup = - NettyUtils.createEventLoop(ioMode, conf.serverThreads(), conf.getModuleName() + "-server"); - EventLoopGroup workerGroup = bossGroup; + EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1, + conf.getModuleName() + "-boss"); + EventLoopGroup workerGroup = NettyUtils.createEventLoop(ioMode, conf.serverThreads(), + conf.getModuleName() + "-server"); PooledByteBufAllocator allocator = NettyUtils.createPooledByteBufAllocator( conf.preferDirectBufs(), true /* allowCache */, conf.serverThreads()); --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org