[ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200610#comment-13200610 ]
Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

Note that simply adding a socket timeout is not a good idea unless both sides are truly expected never to starve (this is why I didn't suggest it for CASSANDRA-3569, and why TCP keep-alive is the "correct" solution: it does not generate spurious timeouts from a lack of in-band data on the channel. But as noted in that ticket, the practical reality is that we don't control keep-alive parameters on a per-socket basis). For example, if one of the ends is waiting a few seconds for a particularly expensive fsync(), or waiting on some kind of lock, you'd get spurious failures (whereas this is not the case for keep-alive, because the transport is alive and kicking at the kernel level). Depending on the surrounding logic, it could be dangerous if it causes the receiver to believe it received the file while the sender believes it didn't (e.g. multiple streaming -> disk space explosion).

I would suggest TCP keep-alive for the reasons mentioned here and discussed in CASSANDRA-3569, and suggest that the TCP keep-alive settings be tweaked to fail more quickly if that is desired. If a socket timeout is added, thought needs to go into what kinds of false-failure cases will be created. If both ends are truly expected not to block on anything like compaction locks or whatever else there might be, it might be okay.

In either case, definitely *don't* use the rpc timeout IMO; the concerns are completely different. A low-timeout cluster with an rpc timeout of 0.5 seconds, for example, would be extremely sensitive to even the slightest hiccup (such as waiting 1 second for an fsync(), or a GC pause), and it would be truly useless and extremely damaging to kill streams for that. In general, as with CASSANDRA-3569, I strongly argue that streaming should not be made to fail spuriously, because the impact of that can be huge, particularly on clusters with large nodes.

As for reads vs. writes: you definitely want timeouts on both sides in order to guarantee that you never hang under any circumstance, regardless of the nature of the TCP connection loss, unless you have some other method of accomplishing the same thing.

If this (socket timeouts) does go in, I argue even more strongly than before that the tear-down of streams by the failure detector, as in CASSANDRA-3569, is truly just negative rather than positive (but as noted in that ticket, not hanging forever on repairs and such remains a concern).

> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>
> Streaming hangs between datacenters; though there might be multiple reasons
> for this, a simple fix will be to add the socket timeout so the session can
> retry.
> The following is the netstat of the affected node (the output below remains
> this way for a very long period).
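Before the logs: a minimal, self-contained Java sketch (not the attached patch, and not Cassandra code) of the two options the comment weighs against each other — TCP keep-alive versus a socket read timeout. The 500 ms timeout is an illustrative value only; as argued above, a value that low would turn any peer pause into a spurious failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StreamSocketOptions {
    public static void main(String[] args) throws IOException {
        // Loopback "peer" that accepts a connection but never sends data,
        // simulating a stalled stream.
        try (ServerSocket peer = new ServerSocket(0);
             Socket socket = new Socket("127.0.0.1", peer.getLocalPort())) {
            peer.accept();

            // Option 1: TCP keep-alive. The kernel probes an idle connection
            // and fails it only if the transport itself is dead, so a peer
            // blocked on fsync() or a lock is NOT treated as a failure.
            // Note that java.net.Socket only exposes the on/off flag; the
            // probe timing comes from kernel-wide settings (e.g. the Linux
            // sysctl net.ipv4.tcp_keepalive_time) -- the per-socket
            // limitation noted in the comment.
            socket.setKeepAlive(true);

            // Option 2: a socket read timeout. Any quiet period longer than
            // the timeout raises SocketTimeoutException, even if the peer is
            // merely slow -- the spurious-failure risk discussed above.
            socket.setSoTimeout(500);
            InputStream in = socket.getInputStream();
            try {
                in.read(); // no data will ever arrive from the stalled peer
                System.out.println("read data");
            } catch (SocketTimeoutException e) {
                // The connection is still open; a caller could retry the
                // session rather than tear the whole stream down.
                System.out.println("read timed out");
            }
        }
    }
}
```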
> [test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
> Mode: NORMAL
> Streaming to: /50.17.92.159
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1523325354/2475291786 - 61%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
>    /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%
>
> "Streaming:1" daemon prio=10 tid=0x00002aaac2060800 nid=0x1676 runnable [0x000000006be85000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
>         at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
>         at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
>         - locked <0x00000006afea1bd8> (a com.sun.net.ssl.internal.ssl.AppOutputStream)
>         at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
>         at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
>         at com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
>         at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> Streaming from: /46.51.141.51
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db sections=7231 progress=0/1548922508 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db sections=4730 progress=0/296474156 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db sections=7650 progress=0/1580417610 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db sections=7682 progress=0/196689250 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db sections=7149 progress=0/478695185 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db sections=443 progress=0/78417320 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2222-Data.db sections=4590 progress=0/1310718798 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2213-Data.db sections=7876 progress=0/3308781588 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2216-Data.db sections=7386 progress=0/2868167170 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2254-Data.db sections=4618 progress=0/215989758 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1542191546/2475291786 - 62%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2210-Data.db sections=6698 progress=0/2304563183 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
>    abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2229-Data.db sections=7386 progress=0/1324787539 - 0%
>
> "Thread-198896" prio=10 tid=0x00002aaac0e00800 nid=0x4710 runnable [0x000000004251b000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
>         at com.sun.net.ssl.internal.ssl.InputRecord.readV3Record(InputRecord.java:405)
>         at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:360)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798)
>         - locked <0x00000005e220a170> (a java.lang.Object)
>         at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
>         at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
>         - locked <0x00000005e220a1b8> (a com.sun.net.ssl.internal.ssl.AppInputStream)
>         at com.ning.compress.lzf.LZFDecoder.readFully(LZFDecoder.java:392)
>         at com.ning.compress.lzf.LZFDecoder.decompressChunk(LZFDecoder.java:190)
>         at com.ning.compress.lzf.LZFInputStream.readyBuffer(LZFInputStream.java:254)
>         at com.ning.compress.lzf.LZFInputStream.read(LZFInputStream.java:129)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:115)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:119)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>         at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:244)
>         at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:148)
>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:90)
>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira