[jira] [Assigned] (CASSANDRA-5391) SSL problems with inter-DC communication

2013-03-28 Thread Yuki Morishita (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita reassigned CASSANDRA-5391:
-

Assignee: Yuki Morishita  (was: T Jake Luciani)

> SSL problems with inter-DC communication
> 
>
> Key: CASSANDRA-5391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5391
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.2.3
> Environment: $ /etc/alternatives/jre_1.6.0/bin/java -version
> java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
> $ uname -a
> Linux hostname 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
> $ cat /etc/redhat-release 
> Scientific Linux release 6.3 (Carbon)
> $ facter | grep ec2
> ...
> ec2_placement => availability_zone=us-east-1d
> ...
> $ rpm -qi cassandra
> cassandra-1.2.3-1.el6.cmp1.noarch
> (custom built rpm from cassandra tarball distribution)
> Reporter: Ondřej Černoš
> Assignee: Yuki Morishita
> Priority: Blocker
> Attachments: 5391-1.2.txt
>
>
> I get SSL and snappy compression errors in a multi-datacenter setup.
> The setup is simple: 3 nodes in AWS east, 3 nodes in Rackspace. I use a 
> slightly modified Ec2MultiRegionSnitch in Rackspace (I just added a regex 
> able to parse the Rackspace/Openstack availability zone, which happens to be 
> in an unusual format).
> During {{nodetool rebuild}} tests I managed to (consistently) trigger the 
> following error:
> {noformat}
> 2013-03-19 12:42:16.059+0100 [Thread-13] [DEBUG] IncomingTcpConnection.java(79) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
>   at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
>   at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
>   at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:337)
>   at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
>   at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>   at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>   at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
>   at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>   at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>   at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> {noformat}
> The exception is raised during DB file download. What is strange is the 
> following:
> * the exception is raised only when rebuilding from AWS into Rackspace
> * the exception is raised only when all nodes are up and running in AWS (all 
> 3). In other words, if I bootstrap from one or two nodes in AWS, the command 
> succeeds.
> Packet-level inspection revealed malformed packets _on both ends of 
> communication_ (the packet is considered malformed on the machine it 
> originates on).
> Further investigation raised two more concerns:
> * We managed to get another stacktrace when testing the scenario. The 
> exception was raised only once during the tests and was raised when I 
> throttled the inter-datacenter bandwidth to 1Mbps.
> {noformat}
> java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
>   at com.google.common.base.Throwables.propagate(Throwables.java:160)
>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: javax.net.ssl.SSLException: bad record MAC
>   at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
>   at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
>   at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
>   at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
>   at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
>   
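
The quoted report mentions adding a regex to the snitch so it can parse an availability-zone string. The reporter's actual pattern and the Rackspace/Openstack zone format are not given, so the following is only a minimal sketch for the EC2-style value shown in the environment section (us-east-1d). The class name AzParser, the pattern, and the datacenter/rack split are hypothetical and are not taken from Ec2MultiRegionSnitch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra: splits an EC2-style
// availability zone into a datacenter-like prefix and a rack-like suffix.
final class AzParser {
    // Assumed shape: lowercase words joined by '-', then '-', digits, one letter.
    // Group 1 is the region prefix, group 2 the zone suffix.
    private static final Pattern EC2_STYLE =
            Pattern.compile("^([a-z]+(?:-[a-z]+)+)-(\\d+[a-z])$");

    // Returns {prefix, suffix}; rejects anything that does not match the
    // assumed shape instead of guessing.
    static String[] parse(String zone) {
        Matcher m = EC2_STYLE.matcher(zone);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized availability zone: " + zone);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("us-east-1d");
        System.out.println("dc=" + parts[0] + " rack=" + parts[1]);
    }
}
```

A zone in a different format would need its own pattern. Failing loudly on an unmatched zone, rather than falling back to a default, makes a misparsed datacenter name visible at startup.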

[jira] [Assigned] (CASSANDRA-5391) SSL problems with inter-DC communication

2013-03-27 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-5391:
-

Assignee: T Jake Luciani

Can you shed any light, Jake?
