[ 
https://issues.apache.org/jira/browse/CASSANDRA-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323462#comment-15323462
 ] 

Sean Thornton commented on CASSANDRA-11974:
-------------------------------------------

Agree. I was trying to focus on a single issue here, but there are a number of 
places in the code where critical threads can exit without proper handling. It 
would be better, in my opinion, for these events to be recognized and handled 
(even if that means shutting the JVM down) rather than continuing to _appear_ 
to be up and running normally (compaction thread, I'm looking at you). I think 
the use of the Java assert keyword is possibly the root cause of this in a 
number of places, because a failed assert raises a true Error. Most people 
either don't think to handle this or don't know how to handle it appropriately 
(and really shouldn't). I would much prefer to see something in the pattern of 
Spring's Assert or commons-lang's Validate used instead.
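
To make the difference concrete, here is a minimal sketch. The class and method 
names are hypothetical and this is not Cassandra's actual code; the failed 
assert is mimicked by throwing AssertionError directly so the behavior does not 
depend on the -ea flag. It only shows that an Error escapes a 
catch (Exception e) handler, while a Validate-style check does not.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names are not taken from the Cassandra code base.
final class ShortLengthCheck {
    // Largest length that fits in a 2-byte unsigned length prefix.
    static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    // Mimics what a failed `assert length <= MAX : length` does when assertions are enabled.
    static void mimicFailedAssert(ByteBuffer value) {
        int length = value.remaining();
        if (length > MAX_UNSIGNED_SHORT) {
            throw new AssertionError(length); // java.lang.Error, not an Exception
        }
    }

    // Validate-style check: signals the same problem with an ordinary RuntimeException.
    static void checkWithValidation(ByteBuffer value) {
        int length = value.remaining();
        if (length > MAX_UNSIGNED_SHORT) {
            throw new IllegalArgumentException("length " + length + " exceeds " + MAX_UNSIGNED_SHORT);
        }
    }

    // Stands in for a processing loop body that only guards against Exception.
    static String runGuarded(Runnable task) {
        try {
            task.run();
            return "completed";
        } catch (Exception e) {
            return "caught " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ByteBuffer tooLong = ByteBuffer.allocate(635174);

        // The validation failure is an Exception, so the guard sees it.
        System.out.println(runGuarded(() -> checkWithValidation(tooLong)));

        // The AssertionError passes straight through the guard to the caller.
        try {
            runGuarded(() -> mimicFailedAssert(tooLong));
        } catch (AssertionError e) {
            System.out.println("escaped catch (Exception): " + e);
        }
    }
}
```

In a real thread there is no outer catch, so the second case would end the 
thread.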

I do think it's better for the community to provide concrete instances for the 
developers to address one-by-one though. It's difficult to address more general 
items without a larger effort and there are already a number of those.

> Failed assert causes OutboundTcpConnection to exit
> --------------------------------------------------
>
>                 Key: CASSANDRA-11974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11974
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Sean Thornton
>
> I am seeing the following in a client's cluster:
> {noformat}
> ERROR [MessagingService-Outgoing-/10.0.0.1] 2016-06-06 03:38:19,305  CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Outgoing-/10.0.0.1,5,main]
> java.lang.AssertionError: 635174
>         at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:290) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:392) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:381) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:271) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:259) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:503) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:490) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.SliceFromReadCommandSerializer.serialize(SliceFromReadCommand.java:168) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:143) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:132) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:121) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:330) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:282) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> {noformat}
> Obviously they somehow exceeded a 64K limit (quick and dirty suspects - 
> https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html) but 
> that is neither here nor there.
> The problem I see when this happens is 
> {{ByteBufferUtil.writeWithShortLength}} can throw a 
> {{java.lang.AssertionError}} which is a true {{Error}} that bubbles up and 
> totally bypasses the {{catch (Exception e)}} clause in the message processing 
> loop in {{OutboundTcpConnection.run()}} _which causes the thread to exit and 
> that node to no longer communicate outgoing messages to other nodes_.
> At least from my perspective, there are two things I would like to see 
> handled differently -
> * In the event of _any_ problem, I would like to see whatever details 
> possible be logged about the problem Message - partition key, CF data, 
> anything.  Right now it can be very difficult to track this down.
> * The {{java.lang.Error}} possibility needs to be handled somehow.  If it's 
> an assertion error, it seems like we could continue the processing loop.  But 
> shutting down the JVM would be better than what I get now.
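
The two requests in the description could be sketched roughly as follows. This 
is an assumption-laden illustration, not the actual OutboundTcpConnection code: 
ResilientSendLoop, Sender and drain are hypothetical names, messages are plain 
strings, and "logging" is just a list of failure records. The loop records 
details of the offending message and keeps running when it sees an Exception or 
an AssertionError, while other Errors are left to propagate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a send loop that survives a failed assertion.
final class ResilientSendLoop {
    // Stand-in for whatever serializes and writes a message.
    interface Sender {
        void send(String message) throws Exception;
    }

    // Stand-in for a logger: one record per message that could not be sent.
    final List<String> failures = new ArrayList<>();

    // Sends every queued message and returns how many were sent successfully.
    int drain(Queue<String> queue, Sender sender) {
        int sent = 0;
        String message;
        while ((message = queue.poll()) != null) {
            try {
                sender.send(message);
                sent++;
            } catch (Exception | AssertionError e) {
                // Record which message failed and why, then continue with the next one.
                failures.add(message + ": " + e);
            }
            // Any other Error (for example OutOfMemoryError) is deliberately not caught here.
        }
        return sent;
    }

    public static void main(String[] args) {
        ResilientSendLoop loop = new ResilientSendLoop();
        Queue<String> queue = new ArrayDeque<>(Arrays.asList("read-1", "oversized", "read-2"));

        int sent = loop.drain(queue, m -> {
            if (m.equals("oversized")) {
                throw new AssertionError(635174); // mimics the failed assert from the trace
            }
        });

        System.out.println("sent=" + sent + " failures=" + loop.failures);
    }
}
```

Whether continuing is safe after an AssertionError is a policy decision; 
replacing the catch body with a controlled shutdown would match the "shut the 
JVM down" alternative instead.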



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
