[ 
https://issues.apache.org/jira/browse/CASSANDRA-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268714#comment-16268714
 ] 

Vincent White commented on CASSANDRA-14047:
-------------------------------------------

While looking into this the only error I came across was related to joining 
nodes begin sent read messages for column families they didn't know about yet. 
Which look like ...


{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,123 
MessageInHandler.java:273 - Got message from unknown table while reading from 
socket; closing
org.apache.cassandra.exceptions.UnknownTableException: Couldn't find table with 
id 5bc52802-de25-35ed-aeab-188eecebb090. If a table was just created, this is 
likely due to the sch
emanot being fully propagated.  Please wait for schema agreement on table 
creation.
        at 
org.apache.cassandra.schema.Schema.getExistingTableMetadata(Schema.java:474) 
~[main/:na]
        at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:670)
 ~[main/:na]
        at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606)
 ~[main/:na]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) 
~[main/:na]
        at 
org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178)
 ~[main/:na]
        at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]

{noformat}

...followed by 
{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,193 
MessageInHandler.java:277 - Unexpected exception caught in inbound channel 
pipeline from /127.0.0.1:41522
java.lang.ArrayIndexOutOfBoundsException: 90
        at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:657)
 ~[main/:na]
        at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606)
 ~[main/:na]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) 
~[main/:na]
        at 
org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178)
 ~[main/:na]
        at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
{noformat}

...as C* appears to get stuck (at least for a good amount of time) in a loop 
where decode() is called (after the initial exception), ReadCommand() throws an 
exception(due to the read index into the buffer being left in an incorrect 
state following the initial exception) each time which again calls ctx.close() 
on the channel before this loop continues. 

I probably don't have enough netty knowledge to propose a good fix for this but 
[this|https://github.com/apache/cassandra/compare/trunk...vincewhite:debug-1] 
appeared to help, though I didn't get time to look too deeply into the new 
behaviour. Maybe someone like [~jasobrown] would have a more correct fix. 

I suspect this is probably affects most of these consistency tests. 

> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14047
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14047
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Michael Kjellman
>            Assignee: Vincent White
>
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> 15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:
> INFO  [main] 2017-11-15 11:21:32,452 YamlConfigura.....
> See system.log for remainder
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> dtest: DEBUG: Testing single dc, users, each quorum reads
> --------------------- >> end captured logging << ---------------------
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/cassandra/cassandra-dtest/tools/decorators.py", line 48, in 
> wrapped
>     f(obj)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 621, in 
> test_simple_strategy_each_quorum_users
>     
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_users, 
> [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
>     self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in 
> _start_cluster
>     cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 428, in start
>     node.watch_log_for_alive(other_node, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 520, in watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 488, in watch_log_for
>     raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".....\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:\nINFO  [main] 
> 2017-11-15 11:21:32,452 YamlConfigura.....\nSee system.log for 
> remainder\n-------------------- >> begin captured logging << 
> --------------------\ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-v3VgyS\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n    'num_tokens': '32',\n    'phi_convict_threshold': 
> 5,\n    'range_request_timeout_in_ms': 10000,\n    
> 'read_request_timeout_in_ms': 10000,\n    'request_timeout_in_ms': 10000,\n   
>  'truncate_request_timeout_in_ms': 10000,\n    'write_request_timeout_in_ms': 
> 10000}\ndtest: DEBUG: Testing single dc, users, each quorum 
> reads\n--------------------- >> end captured logging << ---------------------"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to