This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 6b47ace  [SPARK-30512] Added a dedicated boss event loop group
6b47ace is described below

commit 6b47ace27d04012bcff47951ea1eea2aa6fb7d60
Author: Chandni Singh <chsi...@linkedin.com>
AuthorDate: Wed Jan 29 15:02:48 2020 -0600

    [SPARK-30512] Added a dedicated boss event loop group
    
    ### What changes were proposed in this pull request?
    Add a dedicated boss event loop group to the Netty server in the External Shuffle Service to avoid delays in channel registration.
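    ```
    // Boss group: a single thread that runs the server channel and accepts
    // new connections.
    EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1,
      conf.getModuleName() + "-boss");
    // Worker group: conf.serverThreads() threads that handle I/O for the
    // accepted channels.
    EventLoopGroup workerGroup = NettyUtils.createEventLoop(ioMode, conf.serverThreads(),
      conf.getModuleName() + "-server");

    bootstrap = new ServerBootstrap()
      .group(bossGroup, workerGroup)
      .channel(NettyUtils.getServerChannelClass(ioMode))
      .option(ChannelOption.ALLOCATOR, allocator)
      // ... (excerpt from TransportServer.init)
    ```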
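    For readers less familiar with Netty, the boss/worker split can be modelled with plain `java.util.concurrent` executors. This is a hypothetical sketch, not Spark or Netty code: `BossWorkerSketch`, `namedPool`, `serve` and the `shuffle` module name are illustrative assumptions. A single "boss" thread accepts work and immediately hands each connection to a separate pool of "server" threads, mirroring the roles `ServerBootstrap.group(bossGroup, workerGroup)` gives to the parent and child groups.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.ThreadFactory;
    import java.util.concurrent.atomic.AtomicInteger;

    class BossWorkerSketch {
      // Hypothetical helper: a fixed-size pool whose threads are named "<prefix>-<n>".
      static ExecutorService namedPool(String prefix, int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
          Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
          t.setDaemon(true);
          return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
      }

      // The boss thread "accepts" each connection and hands it to a worker;
      // the result is the name of the thread that handled each connection.
      static List<String> serve(String module, int serverThreads, int connections) {
        ExecutorService boss = namedPool(module + "-boss", 1); // one thread, as in the patch
        ExecutorService workers = namedPool(module + "-server", serverThreads);
        try {
          Future<List<Future<String>>> accepted = boss.submit(() -> {
            List<Future<String>> handled = new ArrayList<>();
            for (int i = 0; i < connections; i++) {
              // The boss only hands off; per-connection work runs on a worker thread.
              handled.add(workers.submit(() -> Thread.currentThread().getName()));
            }
            return handled;
          });
          List<String> names = new ArrayList<>();
          for (Future<String> f : accepted.get()) {
            names.add(f.get());
          }
          return names;
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException(e);
        } finally {
          boss.shutdownNow();
          workers.shutdownNow();
        }
      }

      public static void main(String[] args) {
        System.out.println(serve("shuffle", 4, 8));
      }
    }
    ```

    Running `main` prints eight thread names, each starting with `shuffle-server-`. Because the boss thread never runs per-connection work in this model, a slow connection cannot stall accepting.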
    
    ### Why are the changes needed?
    We have been seeing a large number of SASL authentication requests (RPCs) timing out against the external shuffle service:
    ```
    java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
        at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:278)
        at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:80)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181)
        at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141)
        at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:218)
    ```
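    The top of the trace is a client-side timeout: `SaslClientBootstrap.doBootstrap` blocks in `TransportClient.sendRpcSync`, and Guava's `Throwables.propagate` wraps the checked `TimeoutException` in a `RuntimeException`. The stdlib sketch below reproduces only that shape. `SyncRpcSketch` and its `sendRpcSync` method are hypothetical stand-ins rather than Spark's implementation, and a `CompletableFuture` that is never completed plays the part of a server that does not answer in time.

    ```java
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    class SyncRpcSketch {
      // Hypothetical stand-in for a blocking RPC: wait at most timeoutMs for the
      // response and surface any failure as an unchecked exception.
      static String sendRpcSync(CompletableFuture<String> response, long timeoutMs) {
        try {
          return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
          // The RPC itself failed; rethrow its cause unchecked.
          throw new RuntimeException(e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new RuntimeException(e);
        } catch (TimeoutException e) {
          // No response in time: a RuntimeException caused by a TimeoutException,
          // the same shape as the first line of the trace above.
          throw new RuntimeException(e);
        }
      }

      public static void main(String[] args) {
        // The server never answers, e.g. because the channel was not registered in time.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        try {
          sendRpcSync(neverCompleted, 100);
        } catch (RuntimeException e) {
          System.out.println("caused by " + e.getCause().getClass().getSimpleName());
        }
        // A server that answers in time: the call returns normally.
        System.out.println(sendRpcSync(CompletableFuture.completedFuture("ok"), 100));
      }
    }
    ```

    Running `main` prints `caused by TimeoutException` after roughly 100 ms, followed by `ok`.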
    The investigation that we have done is described here:
    https://github.com/netty/netty/issues/9890
    
    After adding `LoggingHandler` to the Netty pipeline, we saw that channel registration was being delayed because the worker threads were busy with existing channels.
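    The delay can be reproduced in miniature by treating each event loop as a single-threaded executor, since a Netty event loop likewise runs all of its tasks on one thread. In the hypothetical `SharedLoopSketch` below, a long task stands in for the work of serving existing channels. The 500 ms figure is an arbitrary assumption. When the accept task is queued on that same loop, as happens in this model when one group serves as both boss and worker, it waits for the busy task to finish. On a dedicated boss loop it starts almost at once.

    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class SharedLoopSketch {
      // Assumed cost of serving existing channels; purely illustrative.
      static final long BUSY_MS = 500;

      // Returns how long (in ms) a newly queued "accept" task waited before running.
      static long acceptDelayMs(boolean dedicatedBoss) {
        ExecutorService workerLoop = Executors.newSingleThreadExecutor();
        // Shared mode reuses the worker loop, like passing the same group twice.
        ExecutorService bossLoop = dedicatedBoss ? Executors.newSingleThreadExecutor() : workerLoop;
        try {
          CountDownLatch busyStarted = new CountDownLatch(1);
          // Existing channels keep the worker loop's only thread busy.
          workerLoop.submit(() -> {
            busyStarted.countDown();
            try {
              Thread.sleep(BUSY_MS);
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
          busyStarted.await();
          long queuedAt = System.nanoTime();
          // A new connection arrives and its accept task is queued on the boss loop.
          Future<Long> startedAt = bossLoop.submit(() -> System.nanoTime());
          return (startedAt.get() - queuedAt) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException(e);
        } finally {
          workerLoop.shutdownNow();
          bossLoop.shutdownNow();
        }
      }

      public static void main(String[] args) {
        System.out.println("shared loop delay:    " + acceptDelayMs(false) + " ms");
        System.out.println("dedicated boss delay: " + acceptDelayMs(true) + " ms");
      }
    }
    ```

    On a typical machine the shared case reports a delay close to 500 ms and the dedicated case only a few milliseconds. Exact numbers depend on thread scheduling.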
    
    ### Does this PR introduce any user-facing change?
    No
    
    ### How was this patch tested?
    We have tested the patch on our clusters and with a stress testing tool. After this change, we no longer saw any SASL requests time out. Existing unit tests pass.
    
    Closes #27240 from otterc/SPARK-30512.
    
    Authored-by: Chandni Singh <chsi...@linkedin.com>
    Signed-off-by: Thomas Graves <tgra...@apache.org>
---
 .../main/java/org/apache/spark/network/server/TransportServer.java | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
index 8396e69..f0ff9f5 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
@@ -100,9 +100,10 @@ public class TransportServer implements Closeable {
   private void init(String hostToBind, int portToBind) {
 
     IOMode ioMode = IOMode.valueOf(conf.ioMode());
-    EventLoopGroup bossGroup =
-      NettyUtils.createEventLoop(ioMode, conf.serverThreads(), conf.getModuleName() + "-server");
-    EventLoopGroup workerGroup = bossGroup;
+    EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1,
+      conf.getModuleName() + "-boss");
+    EventLoopGroup workerGroup =  NettyUtils.createEventLoop(ioMode, conf.serverThreads(),
+      conf.getModuleName() + "-server");
 
     bootstrap = new ServerBootstrap()
       .group(bossGroup, workerGroup)

