Jeff Ferland created CASSANDRA-11028:
----------------------------------------

             Summary: Streaming errors caused by corrupt tables need more logging
                 Key: CASSANDRA-11028
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11028
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jeff Ferland


Example output:

ERROR [STREAM-IN-/10.0.10.218] 2016-01-17 16:01:38,431  StreamSession.java:505 - [Stream #e6ca4590-bc66-11e5-84be-571ffcecc993] Streaming error occurred
java.lang.IllegalArgumentException: Unknown type 0

In some cases logging shows a message more like:

ERROR [STREAM-IN-/10.0.10.12] 2016-01-05 14:44:38,690  StreamSession.java:505 - [Stream #472d28e0-b347-11e5-8b40-bb4d80df86f4] Streaming error occurred
java.io.IOException: Too many retries for Header (cfId: 6b262d58-8730-36ca-8e3e-f0a40beaf92f, #0, version: ka, estimated keys: 58880, transfer size: 2159040, compressed?: true, repairedAt: 0)

In the majority of cases, however, no information identifying the column family is shown, and the source file being streamed is never identified.

Errors do not stop the streaming process, but they do mark the stream as failed at the end. This usually results in a log message pattern like:

INFO  [StreamReceiveTask:252] 2016-01-18 04:45:01,190  StreamResultFuture.java:180 - [Stream #e6ca4590-bc66-11e5-84be-571ffcecc993] Session with /10.0.10.219 is complete
WARN  [StreamReceiveTask:252] 2016-01-18 04:45:01,215  StreamResultFuture.java:207 - [Stream #e6ca4590-bc66-11e5-84be-571ffcecc993] Stream failed
ERROR [main] 2016-01-18 04:45:01,217  CassandraDaemon.java:579 - Exception encountered during startup

... which is highly confusing given that the error occurred hours earlier.

Request: more detail in stream-failure log messages indicating which column family was being streamed, and if possible a distinction between network issues and corrupt-file issues.
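
For illustration, a minimal self-contained sketch of the kind of message that would help. The stream header already carries the cfId (visible in the "Too many retries" message above), so the error handler could resolve it to a keyspace/table name and name the file being streamed. All class, method, and map names below are hypothetical stand-ins, not actual Cassandra code:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the requested logging change, NOT actual Cassandra
// internals: resolve the cfId carried by the stream header to a table name
// before logging, so the operator knows what to scrub.
public class StreamErrorLoggingSketch
{
    // Stand-in for Cassandra's schema lookup, which maps a cfId to a table.
    static String tableFor(Map<UUID, String> schema, UUID cfId)
    {
        String name = schema.get(cfId);
        return name != null ? name : cfId.toString(); // fall back to the raw id
    }

    static void logStreamError(Map<UUID, String> schema, UUID streamId,
                               UUID cfId, String path, Throwable cause)
    {
        // Unlike the current "Streaming error occurred" message, this names
        // the table and the file that was being streamed.
        System.err.printf("[Stream #%s] Streaming error on %s (file %s): %s%n",
                          streamId, tableFor(schema, cfId), path, cause);
    }

    public static void main(String[] args)
    {
        Map<UUID, String> schema = new HashMap<>();
        UUID cfId = UUID.fromString("6b262d58-8730-36ca-8e3e-f0a40beaf92f");
        schema.put(cfId, "keyspace.cf");
        logStreamError(schema,
                       UUID.fromString("472d28e0-b347-11e5-8b40-bb4d80df86f4"),
                       cfId,
                       "/mnt/cassandra/data/keyspace/cf-888a52f96d1d389790ee586a6100916c/keyspace-cf-ka-133-Data.db",
                       new java.io.IOException("Too many retries for Header"));
    }
}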

The actual cause of the errors was corrupt sstables, and the fix was running nodetool scrub on the offending node. Scrubbing the whole keyspace blindly is rather expensive compared with targeting the affected tables. In our particular case, the out-of-order keys were caused by a bug in a previous version of Cassandra.
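
For reference, nodetool scrub accepts a keyspace and table, so once the offending column family is known the scrub can be targeted rather than run across the whole keyspace (names here taken from the sanitized log below):

    nodetool scrub keyspace cf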

    WARN  [CompactionExecutor:19552] 2016-01-18 16:02:10,155  OutputHandler.java:52 - 378490 out of order rows found while scrubbing SSTableReader(path='/mnt/cassandra/data/keyspace/cf-888a52f96d1d389790ee586a6100916c/keyspace-cf-ka-133-Data.db'); Those have been written (in order) to a new sstable (SSTableReader(path='/mnt/cassandra/data/keyspace/cf-888a52f96d1d389790ee586a6100916c/keyspace-cf-ka-179-Data.db'))


