[ https://issues.apache.org/jira/browse/CASSANDRA-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323462#comment-15323462 ]
Sean Thornton commented on CASSANDRA-11974:
-------------------------------------------

Agree. I was trying to focus on a single issue here, but there are a number of places in the code where critical threads can exit without proper handling. It would be better, in my opinion, for these events to be recognized and handled (even if that means shutting the JVM down) rather than having the node continue to _appear_ to be up and running normally (compaction thread, I'm looking at you).

I think the use of the Java assert keyword is possibly the root cause of this in a number of places, because a failed assertion raises a true Error rather than an Exception. Most people don't think about handling this, or don't know how to handle it appropriately (and really shouldn't). I would much prefer to see something in the pattern of Spring's Assert or commons-lang's Validate be used.

I do think it's better for the community to provide concrete instances for the developers to address one by one, though. It's difficult to address more general items without a larger effort, and there are already a number of those.
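
To make the assert point concrete, here is a minimal standalone sketch. It is not Cassandra code: the class AssertVsCheck, its methods, and the 0xFFFF length check are hypothetical stand-ins for a short-length writer called from a processing loop that only catches Exception. The first writer throws AssertionError directly, which is what a failed assert raises when assertions are enabled; the second uses an explicit check in the spirit of a Validate-style helper.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class AssertVsCheck {
    // Assumed limit for a length that must fit in an unsigned short.
    static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    interface Writer {
        void write(ByteBuffer buf);
    }

    // Mimics `assert length <= MAX : length` with assertions enabled:
    // the failure surfaces as an Error, not an Exception.
    static void writeWithAssertStyle(ByteBuffer buf) {
        int length = buf.remaining();
        if (length > MAX_UNSIGNED_SHORT)
            throw new AssertionError(length);
    }

    // Validate-style alternative: same check, but an ordinary unchecked exception.
    static void writeWithExplicitCheck(ByteBuffer buf) {
        int length = buf.remaining();
        if (length > MAX_UNSIGNED_SHORT)
            throw new IllegalArgumentException(
                "buffer of " + length + " bytes exceeds " + MAX_UNSIGNED_SHORT);
    }

    // Simplified processing loop with a catch (Exception) clause.
    // Returns how many messages were attempted before the loop ended.
    static int drain(List<ByteBuffer> queue, Writer writer) {
        int attempted = 0;
        try {
            for (ByteBuffer buf : queue) {
                attempted++;
                try {
                    writer.write(buf);
                } catch (Exception e) {
                    // The loop survives: log the bad message and move on.
                    System.err.println("skipping message: " + e.getMessage());
                }
            }
        } catch (Error e) {
            // In a real thread nothing would catch this; run() would simply exit.
            // It is caught here only so the demo can report where the loop stopped.
            System.err.println("loop terminated by " + e);
        }
        return attempted;
    }

    public static void main(String[] args) {
        List<ByteBuffer> queue = Arrays.asList(
            ByteBuffer.allocate(10),
            ByteBuffer.allocate(635174), // oversized, like the value in the reported error
            ByteBuffer.allocate(10));

        System.out.println("assert style attempted:   "
            + drain(queue, AssertVsCheck::writeWithAssertStyle));
        System.out.println("explicit check attempted: "
            + drain(queue, AssertVsCheck::writeWithExplicitCheck));
    }
}
```

With the assert-style writer the loop stops at the oversized message and the third message is never attempted; with the explicit check the bad message is logged and the loop carries on. Whether continuing or shutting the JVM down is the right response is a separate decision, but either is only possible once the failure is visible to the loop.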

> Failed assert causes OutboundTcpConnection to exit
> --------------------------------------------------
>
> Key: CASSANDRA-11974
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11974
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Sean Thornton
>
> I am seeing the following in a client's cluster:
> {noformat}
> ERROR [MessagingService-Outgoing-/10.0.0.1] 2016-06-06 03:38:19,305 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Outgoing-/10.0.0.1,5,main]
> java.lang.AssertionError: 635174
>     at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:290) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:392) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:381) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:271) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:259) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:503) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:490) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.SliceFromReadCommandSerializer.serialize(SliceFromReadCommand.java:168) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:143) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:132) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:121) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:330) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:282) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>     at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> {noformat}
> Obviously they somehow exceeded a 64K limit (quick and dirty suspects - https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html) but that is neither here nor there.
> The problem I see when this happens is {{ByteBufferUtil.writeWithShortLength}} can throw a {{java.lang.AssertionError}} which is a true {{Error}} that bubbles up and totally bypasses the {{catch (Exception e)}} clause in the message processing loop in {{OutboundTcpConnection.run()}} _which causes the thread to exit and that node to no longer communicate outgoing messages to other nodes_.
> At least from my perspective, there are two things I would like to see handled differently -
> * In the event of _any_ problem, I would like to see whatever details possible be logged about the problem Message - partition key, CF data, anything. Right now it can be very difficult to track this down
> * The {{java.lang.Error}} possibility needs to be handled somehow. If it's an assertion error, it seems like we could continue the processing loop. But shutting down the JVM would be better than what I get now.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)