Hi All,
  We've been trying to debug a tricky intermittent failure mode we're 
having with akka-streams. We are setting up a large fan in (~32 nodes) and 
periodically one of the nodes will inexplicably lead to an RST packet being 
sent, aborting the entire graph. Below, I have debug logging from the 
fan-in master as well as from the child that sent the failing packet along 
with my commentary:


Fan in master:

[DEBUG] [06/02/2015 18:13:09.228] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Attempting 
connection to [long-nrt-20/IP_REDACTED:2552]
[DEBUG] [06/02/2015 18:13:09.229] [nrt-akka.actor.default-dispatcher-19] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Connection 
established to [long-nrt-20/IP_REDACTED:2552]
[DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] 
[Actor[akka://nrt/system/IO-TCP-STREAM/client-127-long-nrt-20%2IP_REDACTED%3A2552#-368113801]]
 
registered as connection handler
[DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Wrote [8] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Got 
ConfirmedClose command, sending FIN.
[DEBUG] [06/02/2015 18:13:09.324] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [7695] 
bytes.
[DEBUG] [06/02/2015 18:13:09.324] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [9545] 
bytes.
[DEBUG] [06/02/2015 18:13:09.325] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [1794] 
bytes.
[DEBUG] [06/02/2015 18:13:09.325] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [1794] 
bytes.
[DEBUG] [06/02/2015 18:13:09.333] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [12439] 
bytes.
[DEBUG] [06/02/2015 18:13:09.340] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [1566] 
bytes.
[DEBUG] [06/02/2015 18:13:09.346] [nrt-akka.actor.default-dispatcher-2] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [5210] 
bytes.
[DEBUG] [06/02/2015 18:13:09.346] [nrt-akka.actor.default-dispatcher-20] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read [4344] 
bytes.
(snip)
[DEBUG] [06/02/2015 18:13:24.800] [nrt-akka.actor.default-dispatcher-27] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:24.800] [nrt-akka.actor.default-dispatcher-27] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:25.066] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:25.066] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:25.302] [nrt-akka.actor.default-dispatcher-24] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:25.302] [nrt-akka.actor.default-dispatcher-24] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:31.169] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[131072] bytes.
[DEBUG] [06/02/2015 18:13:31.170] [nrt-akka.actor.default-dispatcher-3] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Read 
[105496] bytes.
[DEBUG] [06/02/2015 18:13:31.911] [nrt-akka.actor.default-dispatcher-2] [
akka.tcp://nrt@long-nrt-1:2551/system/IO-TCP/selectors/$a/143] Closing 
connection due to IO error java.io.IOException: Connection reset by peer
(Abort message propagates to other streams)

Fan in failing slave:
[DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] 
[Actor[akka://nrt/system/IO-TCP-STREAM/server-1-long-nrt-20%2FIP_REDACTED%3A2552/$j#733080811]]
 
registered as connection handler
[DEBUG] [06/02/2015 18:13:09.230] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read [8] 
bytes.
[DEBUG] [06/02/2015 18:13:09.233] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read [-1] 
bytes.
[DEBUG] [06/02/2015 18:13:09.233] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Read 
returned end-of-stream, our side not yet closed
[DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1777] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [3843] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2075] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.321] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1803] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1794] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2077] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1794] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.322] [nrt-akka.actor.default-dispatcher-26] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2077] 
bytes to channel
[DEBUG] [06/02/2015 18:13:09.323] [nrt-akka.actor.default-dispatcher-23] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1794] 
bytes to channel
(snip)
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1794] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-6] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1793] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1794] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2077] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [950] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.416] [nrt-akka.actor.default-dispatcher-6] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2076] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1791] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-6] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [2079] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-25] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [1793] 
bytes to channel
[DEBUG] [06/02/2015 18:13:20.417] [nrt-akka.actor.default-dispatcher-6] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Wrote [954] 
bytes to channel
2015-06-02 18:13:20,417 -0700 INFO  [hostId=long-nrt-20] [module=nrt] [
localUserName=nrt] Source complete
[DEBUG] [06/02/2015 18:13:20.418] [nrt-akka.actor.default-dispatcher-28] [
akka.tcp://nrt@long-nrt-20:2551/system/IO-TCP/selectors/$a/10] Got Close 
command but write is still pending.
[DEBUG] [06/02/2015 18:13:20.422] [nrt-akka.actor.default-dispatcher-25] [
akka://nrt/system/IO-TCP/selectors/$a/10] Monitored actor 
[Actor[akka://nrt/system/IO-TCP-STREAM/server-1-long-nrt-20%2FIP_REDACTED%3A2552/$j#733080811]]
 
terminated

Commentary:
Both nodes start at the same time (ruling out any clock skew), however the 
failing slave finishes at 18:13:20, whereas the master keeps reading from 
the connection for 10 more seconds. Before quitting, the slave logs: 
Got Close command but write is still pending.

As far as I can tell, this seems to be the smoking gun. It only gets logged 
on streams that lead to this failure mode. There is no other logging about 
this stream on the slave node after the last log line printed here and no 
error messages are shown. My best guess is that when entering Close when 
writes are still pending the socket cannot cleanly exit leading the master 
to think it's still alive. At this point the master attempts to cleanup the 
socket but it doesn't exist, leading to the OS sending an RST packet.

Any ideas? We don't have any way to work around this at the moment besides 
not using Akka streams. There isn't anything obvious in the TcpConnection 
code but based on the comments, there have been similar issues before.

Russell

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Reply via email to