[ 
https://issues.apache.org/jira/browse/CASSANDRA-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615442#comment-13615442
 ] 

Ondřej Černoš commented on CASSANDRA-5391:
------------------------------------------

Update:

With SSTable compression switched off, the bug disappears. When I run nodetool 
rebuild us-east on a Rackspace node, it fetches the data correctly, and when I 
compare the md5 of the DB file on an AWS node (after flush and compaction), it 
is exactly the same as on the Rackspace node.
This means the problem occurs only with compressed SSTables, independently of 
the chosen compression algorithm, and only with SSL switched on for 
inter-DC communication.
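
The md5 comparison described above can be reproduced with a short script. This is a minimal sketch, not the exact procedure used in the tests: the function name file_md5 and the chunk size are assumptions chosen for illustration. It reads the file in blocks so that a large Data.db file does not have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at path, read in chunk_size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns an empty bytes object.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Running it against the same DB file on both nodes (after flush and compaction) and comparing the two hex strings gives the same check as comparing md5sum output.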
                
> SSL problems with inter-DC communication
> ----------------------------------------
>
>                 Key: CASSANDRA-5391
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5391
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.3
>         Environment: $ /etc/alternatives/jre_1.6.0/bin/java -version
> java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
> $ uname -a
> Linux hostname 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
> $ cat /etc/redhat-release 
> Scientific Linux release 6.3 (Carbon)
> $ facter | grep ec2
> ...
> ec2_placement => availability_zone=us-east-1d
> ...
> $ rpm -qi cassandra
> cassandra-1.2.3-1.el6.cmp1.noarch
> (custom built rpm from cassandra tarball distribution)
>            Reporter: Ondřej Černoš
>            Priority: Blocker
>
> I get SSL and snappy compression errors in a multiple-datacenter setup.
> The setup is simple: 3 nodes in AWS east, 3 nodes in Rackspace. I use a 
> slightly modified Ec2MultiRegionSnitch in Rackspace (I just added a regex 
> able to parse the Rackspace/Openstack availability zone, which happens to be 
> in an unusual format).
> During {{nodetool rebuild}} tests I managed to (consistently) trigger the 
> following error:
> {noformat}
> 2013-03-19 12:42:16.059+0100 [Thread-13] [DEBUG] IncomingTcpConnection.java(79) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>       at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>       at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>       at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>       at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
>       at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
>       at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
>       at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:337)
>       at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
>       at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>       at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>       at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
>       at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>       at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>       at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> {noformat}
> The exception is raised during DB file download. What is strange is the 
> following:
> * the exception is raised only when rebuilding from AWS into Rackspace
> * the exception is raised only when all nodes are up and running in AWS (all 
> 3). In other words, if I bootstrap from one or two nodes in AWS, the command 
> succeeds.
> Packet-level inspection revealed malformed packets _on both ends of 
> communication_ (the packet is considered malformed on the machine it 
> originates on).
> Further investigation raised two more concerns:
> * We managed to get another stack trace when testing the scenario. The 
> exception was raised only once during the tests, when I throttled the 
> inter-datacenter bandwidth to 1 Mbps.
> {noformat}
> java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
>       at com.google.common.base.Throwables.propagate(Throwables.java:160)
>       at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: javax.net.ssl.SSLException: bad record MAC
>       at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
>       at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
>       at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
>       at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       ... 1 more
> {noformat}
> This is a pure SSL error with no snappy interference.
> * I managed to trigger the exception during {{nodetool repair}} tests when 
> replacing a dead node with a new one _on the AWS side_, which means the 
> problem is not restricted to the one-way scenario.
> {noformat}
> 2013-03-27 14:06:03.033+0100 [Thread-137] [INFO] StreamInSession.java(136) org.apache.cassandra.streaming.StreamInSession: Streaming of file /path/to/cassandra/data/ks/cf/ks-cf-ib-2-Data.db sections=3 progress=0/20513 - 0% for org.apache.cassandra.streaming.StreamInSession@14450ae7 failed: requesting a retry.
> 2013-03-27 14:06:03.033+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-Data.db
> 2013-03-27 14:06:03.033+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-Filter.db
> 2013-03-27 14:06:03.034+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-TOC.txt
> 2013-03-27 14:06:03.034+0100 [Thread-137] [DEBUG] IncomingTcpConnection.java(91) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>       at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>       at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>       at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>       at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
>       at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
>       at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
>       at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
>       at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
>       at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>       at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>       at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
>       at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>       at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
>       at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
>       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
> {noformat}
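
The FAILED_TO_UNCOMPRESS traces quoted above are what a decompressor reports when the bytes it receives are not a valid compressed chunk. The sketch below illustrates that general behaviour only; it does not reproduce the bug or identify its cause. It uses Python's zlib as a stand-in because Snappy is not in the standard library, and the helper names flip_byte and try_decompress are hypothetical.

```python
import zlib


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of the byte at index inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def try_decompress(chunk: bytes):
    """Return the decompressed bytes, or None if zlib rejects the chunk."""
    try:
        return zlib.decompress(chunk)
    except zlib.error:
        # zlib signals a damaged header, body or checksum with zlib.error;
        # this plays the role of Snappy's FAILED_TO_UNCOMPRESS in this sketch.
        return None


if __name__ == "__main__":
    payload = b"row-key:value;" * 100
    chunk = zlib.compress(payload)
    print("intact chunk decompresses:", try_decompress(chunk) == payload)
    print("damaged chunk rejected:", try_decompress(flip_byte(chunk, 0)) is None)
```

A single altered byte anywhere between the sender compressing a chunk and the receiver decompressing it is enough to produce this kind of failure, which fits the observation that the error disappears when SSTable compression is switched off.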
