Sean Thornton created CASSANDRA-11974:
-----------------------------------------

Summary: Failed assert causes OutboundTcpConnection to exit
Key: CASSANDRA-11974
URL: https://issues.apache.org/jira/browse/CASSANDRA-11974
Project: Cassandra
Issue Type: Bug
Components: Streaming and Messaging
Reporter: Sean Thornton

I am seeing the following in a client's cluster:

{noformat}
ERROR [MessagingService-Outgoing-/10.0.0.1] 2016-06-06 03:38:19,305 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Outgoing-/10.0.0.1,5,main]
java.lang.AssertionError: 635174
	at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:290) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:392) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:381) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:271) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:259) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:503) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:490) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.SliceFromReadCommandSerializer.serialize(SliceFromReadCommand.java:168) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:143) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:132) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:121) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:330) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:282) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
	at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
{noformat}

They evidently exceeded a 64K limit somewhere (quick and dirty list of suspects: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html), but that is beside the point. {{writeWithShortLength}} prefixes the value with a 2-byte length, so it can encode at most 65,535 bytes; the assertion message (635174) is the length it was asked to write.

The problem is what happens next. {{ByteBufferUtil.writeWithShortLength}} can throw a {{java.lang.AssertionError}}, which is a true {{Error}} rather than an {{Exception}}. It propagates straight past the {{catch (Exception e)}} clause in the message-processing loop in {{OutboundTcpConnection.run()}}, _which causes the thread to exit, and the node no longer sends outgoing messages to other nodes_.

From my perspective, two things should be handled differently:

* In the event of _any_ problem, I would like to see as many details as possible logged about the offending message: partition key, CF data, anything. Right now it can be very difficult to track this down.
* The {{java.lang.Error}} case needs to be handled somehow. If it is an assertion error, it seems we could skip the message and continue the processing loop. But even shutting down the JVM would be better than what happens now.
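To make the failure mode concrete, here is a minimal, self-contained sketch; it is not Cassandra's code. {{Message}}, {{drainCatchingException}} and {{drainCatchingAssertionError}} are hypothetical names, and {{writeWithShortLength}} is a simplified stand-in that throws {{AssertionError}} for any value longer than 0xFFFF bytes, mirroring the 2-byte length prefix. The first loop catches only {{Exception}}, as described above for {{OutboundTcpConnection.run()}}, so one oversized message ends the loop and later messages are never written. The second catches {{AssertionError}} per message, logs the message's details, skips it and keeps going.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class OutboundLoopSketch {
    // Largest length a 2-byte unsigned length prefix can encode.
    static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    // Hypothetical queued message; "description" stands in for the partition
    // key / CF details that would help track down the offending request.
    static final class Message {
        final String description;
        final ByteBuffer payload;

        Message(String description, ByteBuffer payload) {
            this.description = description;
            this.payload = payload;
        }
    }

    // Simplified stand-in for ByteBufferUtil.writeWithShortLength. This sketch
    // throws AssertionError explicitly, so it fails the same way with or
    // without -ea.
    static void writeWithShortLength(ByteBuffer buf, DataOutputStream out) throws IOException {
        int length = buf.remaining();
        if (length > MAX_UNSIGNED_SHORT) {
            throw new AssertionError(length); // message is the offending length
        }
        out.writeShort(length);
        byte[] bytes = new byte[length];
        buf.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
        out.write(bytes);
    }

    // Shape of the loop described in the report: only Exception is caught, so
    // an AssertionError propagates out and every later message is stranded.
    static int drainCatchingException(List<Message> queue, DataOutputStream out, List<String> log) {
        int written = 0;
        for (Message m : queue) {
            try {
                writeWithShortLength(m.payload, out);
                written++;
            } catch (Exception e) {
                log.add("error writing " + m.description + ": " + e);
            }
        }
        return written;
    }

    // Hypothetical alternative: an AssertionError from one message is logged
    // with that message's details, the message is skipped, and the loop goes on.
    // Other Errors (e.g. OutOfMemoryError) are not caught and still propagate.
    static int drainCatchingAssertionError(List<Message> queue, DataOutputStream out, List<String> log) {
        int written = 0;
        for (Message m : queue) {
            try {
                writeWithShortLength(m.payload, out);
                written++;
            } catch (Exception | AssertionError e) {
                log.add("dropped " + m.description + " (payload " + m.payload.remaining() + " bytes): " + e);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<Message> queue = new ArrayList<>();
        queue.add(new Message("ok-1", ByteBuffer.allocate(10)));
        queue.add(new Message("oversized", ByteBuffer.allocate(635174)));
        queue.add(new Message("ok-2", ByteBuffer.allocate(10)));

        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        List<String> log = new ArrayList<>();
        try {
            drainCatchingException(queue, out, log);
        } catch (AssertionError e) {
            System.out.println("loop exited: " + e);
        }
        System.out.println("written: " + drainCatchingAssertionError(queue, out, log));
        log.forEach(System.out::println);
    }
}
```

Running {{main}} prints that the first loop exited with {{java.lang.AssertionError: 635174}}, then that the second loop wrote 2 of the 3 messages, followed by one log line naming the dropped message and its size. Catching {{AssertionError}} rather than all of {{Throwable}} is a deliberate choice in this sketch: it covers the case hit here without also swallowing errors such as {{OutOfMemoryError}}, where the fallback of shutting down the JVM is arguably the safer response.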