It's probably worth mentioning that this isn't a regression, it was also 
happening in M2 and persisted when we upgraded.

On Tuesday, June 2, 2015 at 7:05:09 PM UTC-7, Russell Cohen wrote:
>
> Hi All,
>   We've been trying to debug a tricky intermittent failure mode we're 
> having with akka-streams. We are setting up a large fan in (~32 nodes) and 
> periodically one of the nodes will inexplicably lead to an RST packet being 
> sent, aborting the entire graph. Below, I have debug logging from the 
> fan-in master as well as from the child that sent the failing packet along 
> with my commentary:
>
>
> Fan in master:
>
> [DEBUG] [06/02/2015 18:13:09.228] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Attempting 
> connection to [long-nrt-20/IP_REDACTED:2552]
> [DEBUG] [06/02/2015 18:13:09.229] [nrt-akka.actor.default-dispatcher-19] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Connection 
> established to [long-nrt-20/IP_REDACTED:2552]
> [DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] 
> [Actor[akka://nrt/system/IO-TCP-STREAM/client-127-long-nrt-20%2IP_REDACTED%3A2552#-368113801]]
>  
> registered as connection handler
> [DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Wrote [8] 
> bytes to channel
> [DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Got 
> ConfirmedClose command, sending FIN.
> [DEBUG] [06/02/2015 18:13:09.324] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [7695] bytes.
> [DEBUG] [06/02/2015 18:13:09.324] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [9545] bytes.
> [DEBUG] [06/02/2015 18:13:09.325] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [1794] bytes.
> [DEBUG] [06/02/2015 18:13:09.325] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [1794] bytes.
> [DEBUG] [06/02/2015 18:13:09.333] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [12439] bytes.
> [DEBUG] [06/02/2015 18:13:09.340] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [1566] bytes.
> [DEBUG] [06/02/2015 18:13:09.346] [nrt-akka.actor.default-dispatcher-2] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [5210] bytes.
> [DEBUG] [06/02/2015 18:13:09.346] [nrt-akka.actor.default-dispatcher-20] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [4344] bytes.
> (snip)
> [DEBUG] [06/02/2015 18:13:24.800] [nrt-akka.actor.default-dispatcher-27] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:24.800] [nrt-akka.actor.default-dispatcher-27] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:25.066] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:25.066] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:25.302] [nrt-akka.actor.default-dispatcher-24] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:25.302] [nrt-akka.actor.default-dispatcher-24] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:31.169] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [131072] bytes.
> [DEBUG] [06/02/2015 18:13:31.170] [nrt-akka.actor.default-dispatcher-3] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
> [105496] bytes.
> [DEBUG] [06/02/2015 18:13:31.911] [nrt-akka.actor.default-dispatcher-2] [
> akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Closing 
> connection due to IO error java.io.IOException: Connection reset by peer
> (Abort message propagates to other streams)
>
> Fan in failing slave:
> [DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] 
> [Actor[akka://nrt/system/IO-TCP-STREAM/server-1-long-nrt-20%2FIP_REDACTED%3A2552/$j#733080811]]
>  
> registered as connection handler
> [DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read [8] 
> bytes.
> [DEBUG] [06/02/2015 18:13:09.233] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read [-1] 
> bytes.
> [DEBUG] [06/02/2015 18:13:09.233] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read 
> returned end-of-stream, our side not yet closed
> [DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1777] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [3843] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2075] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1803] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1794] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2077] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1794] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2077] bytes to channel
> [DEBUG] [06/02/2015 18:13:09.323] [nrt-akka.actor.default-dispatcher-23] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1794] bytes to channel
> (snip)
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1794] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-6] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1793] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1794] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2077] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [950] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-6] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2076] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1791] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-6] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [2079] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-25] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [1793] bytes to channel
> [DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-6] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote 
> [954] bytes to channel
> 2015-06-02 18:13:20,417 -0700 INFO  [hostId=long-nrt-20] [module=nrt] [
> localUserName=nrt] Source complete
> [DEBUG] [06/02/2015 18:13:20.418] [nrt-akka.actor.default-dispatcher-28] [
> akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Got Close 
> command but write is still pending.
> [DEBUG] [06/02/2015 18:13:20.422] [nrt-akka.actor.default-dispatcher-25] [
> akka://nrt/system/IO-TCP/selectors/$a/10] Monitored actor 
> [Actor[akka://nrt/system/IO-TCP-STREAM/server-1-long-nrt-20%2FIP_REDACTED%3A2552/$j#733080811]]
>  
> terminated
>
> Commentary:
> Both nodes start at the same time (ruling out any clock skew), however the 
> failing slave finishes at 18:13:20, whereas the master keeps reading from 
> the connection for 10 more seconds. Before quitting, the slave logs: 
> Got Close command but write is still pending.
>
> As far as I can tell, this seems to be the smoking gun. It only gets 
> logged on streams that lead to this failure mode. There is no other logging 
> about this stream on the slave node after the last log line printed here 
> and no error messages are shown. My best guess is that when entering Close 
> when writes are still pending the socket cannot cleanly exit leading the 
> master to think it's still alive. At this point the master attempts to 
> cleanup the socket but it doesn't exist, leading to the OS sending an RST 
> packet.
>
> Any ideas? We don't have any way to work around this at the moment besides 
> not using Akka streams. There isn't anything obvious in the TcpConnection 
> code but based on the comments, there have been similar issues before.
>
> Russell
>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Reply via email to