[ 
https://issues.apache.org/jira/browse/STORM-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-128:
-------------------------------
    Component/s: storm-core

> Topology fails to start if a configured DRPC server is down
> -----------------------------------------------------------
>
>                 Key: STORM-128
>                 URL: https://issues.apache.org/jira/browse/STORM-128
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/696
> In our environment we have 3 DRPC servers running. This was done mainly for 
> availability and capacity. However, we noticed that when even one of these 
> servers is down, topologies fail to start with the following exception:
> java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: 
> java.net.NoRouteToHostException: No route to host
> at backtype.storm.drpc.DRPCInvocationsClient.<init>(DRPCInvocationsClient.java:23)
> at backtype.storm.drpc.DRPCSpout.open(DRPCSpout.java:65)
> at 
> storm.trident.spout.RichSpoutBatchTriggerer.open(RichSpoutBatchTriggerer.java:41)
> at backtype.storm.daemon.executor$fn__3985$fn__3997.invoke(executor.clj:460)
> at backtype.storm.util$async_loop$fn__465.invoke(util.clj:375)
> at clojure.lang.AFn.run(AFn.java:24)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.thrift7.transport.TTransportException: 
> java.net.NoRouteToHostException: No route to host
> at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)
> at 
> org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81)
> at 
> backtype.storm.drpc.DRPCInvocationsClient.connect(DRPCInvocationsClient.java:30)
> at backtype.storm.drpc.DRPCInvocationsClient.<init>(DRPCInvocationsClient.java:21)
> ... 6 more
> Caused by: java.net.NoRouteToHostException: No route to host
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at org.apache.thrift7.transport.TSocket.open(TSocket.java:178)
> ... 9 more
> I was wondering if it makes sense to make Storm handle this gracefully 
> instead of failing fast. Otherwise, the DRPC servers become a SPOF.
> If the topologies are already running, they usually just log an error 
> message and continue.
> ----------
> dkador: +1 on figuring out how to make the DRPC stuff not a SPOF. I'd be happy 
> to look into it myself, but I'm not sure where to start. Any guidance?
> ----------
> rijuk: For reference, the stack trace I see when a DRPC server goes down 
> while a topology is running is the following. In this case, the topology 
> continues to function normally.
> [backtype.storm.drpc.DRPCSpout Thread-65]: Failed to fetch DRPC result from 
> DRPC server
> org.apache.thrift7.transport.TTransportException: java.net.ConnectException: 
> Connection refused
> at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)
> at 
> org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81)
> at 
> backtype.storm.drpc.DRPCInvocationsClient.connect(DRPCInvocationsClient.java:30)
> at 
> backtype.storm.drpc.DRPCInvocationsClient.fetchRequest(DRPCInvocationsClient.java:53)
> at backtype.storm.drpc.DRPCSpout.nextTuple(DRPCSpout.java:89)
> at 
> storm.trident.spout.RichSpoutBatchTriggerer.nextTuple(RichSpoutBatchTriggerer.java:68)
> at 
> backtype.storm.daemon.executor$fn__3985$fn__3997$fn__4026.invoke(executor.clj:502)
> at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
> at clojure.lang.AFn.run(AFn.java:24)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at org.apache.thrift7.transport.TSocket.open(TSocket.java:178)
> ... 9 more
> In this case I had the host up, but the DRPC server process was down, hence 
> the ConnectException. The behavior is the same even when the host is 
> unreachable, except for the exception type.
> @dkador, I'm not sure what the right solution is. One naive solution I can 
> think of is to make DRPCInvocationsClient constructor rethrow TException 
> instead of throwing a RuntimeException. Obviously, you'll have to make sure 
> that all callers of this higher up in the stack handle this exception 
> properly.
> Actually, on second thought, that's not a good idea. You probably still want 
> the DRPCInvocationsClient object to be constructed. So, maybe you can log an 
> error and just eat that exception. All other methods in that class call 
> "connect" if necessary anyway.
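
The last suggestion above (log the error in the constructor, keep the object, and let later calls reconnect) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Storm code: `Transport`, `LazyClient`, and `FlakyTransport` are hypothetical names standing in for the Thrift transport and for DRPCInvocationsClient, and plain `IOException` is used in place of Thrift's exception types.

```java
import java.io.IOException;
import java.net.ConnectException;

public class LazyDrpcClientSketch {

    /** Hypothetical stand-in for the Thrift transport; not a Storm or Thrift API. */
    interface Transport {
        void open() throws IOException;
        String fetch() throws IOException;
    }

    /** Client whose constructor tolerates an unreachable server. */
    static class LazyClient {
        private final Transport transport;
        private boolean connected = false;

        LazyClient(Transport transport) {
            this.transport = transport;
            try {
                connect();
            } catch (IOException e) {
                // Log and swallow: the object is still constructed, so the
                // caller (e.g. a spout's open()) does not fail at startup.
                System.err.println("Initial connect failed, will retry later: " + e);
            }
        }

        private void connect() throws IOException {
            transport.open();
            connected = true;
        }

        boolean isConnected() {
            return connected;
        }

        /** Reconnects first if needed, mirroring "call connect if necessary". */
        String fetchRequest() throws IOException {
            if (!connected) {
                connect();
            }
            try {
                return transport.fetch();
            } catch (IOException e) {
                connected = false; // force a reconnect on the next call
                throw e;
            }
        }
    }

    /** Fake transport that refuses the first N open() calls, for demonstration. */
    static class FlakyTransport implements Transport {
        private int failuresLeft;

        FlakyTransport(int failures) {
            this.failuresLeft = failures;
        }

        public void open() throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new ConnectException("Connection refused");
            }
        }

        public String fetch() {
            return "request";
        }
    }

    public static void main(String[] args) throws IOException {
        // The server is "down" during construction and "up" afterwards.
        LazyClient client = new LazyClient(new FlakyTransport(1));
        System.out.println(client.isConnected()); // false
        System.out.println(client.fetchRequest()); // request
    }
}
```

With this shape, a failed connection at startup costs only a log line, and the per-call failure path is the same one a running topology already survives. Whether swallowing the exception is acceptable for every caller of the real constructor is a separate question that would need checking against the Storm source.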



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
