[ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200610#comment-13200610 ]
Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

Note that simply adding a socket timeout is not a good idea unless both sides are truly expected never to starve (this is why I didn't suggest it for CASSANDRA-3569, and why TCP keep-alive is the "correct" solution: it does not generate spurious timeouts from a lack of in-band data on the channel. But as noted in that ticket, the practical reality is that we don't control keep-alive parameters on a per-socket basis). For example, if one of the ends is waiting a few seconds for a particularly expensive fsync(), or waiting on some kind of lock, you'd get spurious failures (whereas this is not the case for keep-alive, because the transport is alive and kicking at the kernel level). Depending on the surrounding logic, it could be dangerous if it causes the receiver to believe it received the file while the sender believes it didn't (e.g. multiple streaming -> disk space explosion).

I would suggest TCP keep-alive for the reasons mentioned here and discussed in CASSANDRA-3569, and suggest that the TCP keep-alive settings be tweaked to fail more quickly if that is desired. If a socket timeout is added, thought needs to go into what kinds of false-failure cases will be created. If both ends are truly expected not to block on anything like compaction locks or whatever else there might be, it might be okay.

In either case, definitely *don't* use the rpc timeout IMO; the concerns are completely different. A low-timeout cluster with an rpc timeout of 0.5 seconds, for example, would be extremely sensitive to even the slightest hiccup (such as waiting 1 second for an fsync(), or a GC pause), and it would be truly useless and extremely damaging to kill streams for that. In general, as with CASSANDRA-3569, I strongly argue that streaming should not be made to fail spuriously, because the impact of that can be huge, particularly on clusters with large nodes.

As for reads vs. writes: you definitely want timeouts on both sides in order to guarantee that you never hang under any circumstance, regardless of the nature of the TCP connection loss, unless you have some other method of accomplishing the same thing.

If this (socket timeouts) does go in, I argue even more strongly than before that the tear-down of streams by the failure detector, as in CASSANDRA-3569, is truly just negative rather than positive (but as noted in that ticket, not hanging forever on repairs and such remains a concern).

> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters; though there might be multiple reasons
> for this, a simple fix will be to add the socket timeout so the session can
> retry.
> The following is the netstat of the affected node (the output below remains
> this way for a very long period).
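Before the logs: a minimal, self-contained Java sketch (not the attached patch, and not Cassandra code) of the two options the comment weighs against each other — TCP keep-alive versus a socket read timeout. The 500 ms timeout is an illustrative value only; as argued above, a value that low would turn any peer pause into a spurious failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StreamSocketOptions {
    public static void main(String[] args) throws IOException {
        // Loopback "peer" that accepts a connection but never sends data,
        // simulating a stalled stream.
        try (ServerSocket peer = new ServerSocket(0);
             Socket socket = new Socket("127.0.0.1", peer.getLocalPort())) {
            peer.accept();

            // Option 1: TCP keep-alive. The kernel probes an idle connection
            // and fails it only if the transport itself is dead, so a peer
            // blocked on fsync() or a lock is NOT treated as a failure.
            // Note that java.net.Socket only exposes the on/off flag; the
            // probe timing comes from kernel-wide settings (e.g. the Linux
            // sysctl net.ipv4.tcp_keepalive_time) -- the per-socket
            // limitation noted in the comment.
            socket.setKeepAlive(true);

            // Option 2: a socket read timeout. Any quiet period longer than
            // the timeout raises SocketTimeoutException, even if the peer is
            // merely slow -- the spurious-failure risk discussed above.
            socket.setSoTimeout(500);
            InputStream in = socket.getInputStream();
            try {
                in.read(); // no data will ever arrive from the stalled peer
                System.out.println("read data");
            } catch (SocketTimeoutException e) {
                // The connection is still open; a caller could retry the
                // session rather than tear the whole stream down.
                System.out.println("read timed out");
            }
        }
    }
}
```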
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1523325354/2475291786 - 61%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%
>
> "Streaming:1" daemon prio=10 tid=0x00002aaac2060800 nid=0x1676 runnable [0x000000006be85000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
>         at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
>         at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
>         - locked <0x00000006afea1bd8> (a com.sun.net.ssl.internal.ssl.AppOutputStream)
>         at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
>         at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
>         at com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
>         at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> Streaming from: /46.51.141.51
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db sections=7231 progress=0/1548922508 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db sections=4730 progress=0/296474156 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db sections=7650 progress=0/1580417610 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db sections=7682 progress=0/196689250 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db sections=7149 progress=0/478695185 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db sections=443 progress=0/78417320 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2222-Data.db sections=4590 progress=0/1310718798 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2213-Data.db sections=7876 progress=0/3308781588 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2216-Data.db sections=7386 progress=0/2868167170 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2254-Data.db sections=4618 progress=0/215989758 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1542191546/2475291786 - 62%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2210-Data.db sections=6698 progress=0/2304563183 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2229-Data.db sections=7386 progress=0/1324787539 - 0%
>
> "Thread-198896" prio=10 tid=0x00002aaac0e00800 nid=0x4710 runnable [0x000000004251b000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
>         at com.sun.net.ssl.internal.ssl.InputRecord.readV3Record(InputRecord.java:405)
>         at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:360)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798)
>         - locked <0x00000005e220a170> (a java.lang.Object)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
>         at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
>         - locked <0x00000005e220a1b8> (a com.sun.net.ssl.internal.ssl.AppInputStream)
>         at com.ning.compress.lzf.LZFDecoder.readFully(LZFDecoder.java:392)
>         at com.ning.compress.lzf.LZFDecoder.decompressChunk(LZFDecoder.java:190)
>         at com.ning.compress.lzf.LZFInputStream.readyBuffer(LZFInputStream.java:254)
>         at com.ning.compress.lzf.LZFInputStream.read(LZFInputStream.java:129)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:115)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:119)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>         at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:244)
>         at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:148)
>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:90)
>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira