[ 
https://issues.apache.org/jira/browse/CASSANDRA-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166449#comment-13166449
 ] 

Jonathan Ellis commented on CASSANDRA-3585:
-------------------------------------------

Some interesting things from the trace log:

The request that errors out:
{noformat}
DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,815 StorageProxy.java (line 627) 
Command/ConsistencyLevel is SliceByNamesReadCommand(table='KeyspaceMetadata', 
key=313030303237383033, 
columnParent='QueryPath(columnFamilyName='CF_MailboxSummary', 
superColumnName='null', columnName='null')', 
columns=[666f6c6465725f73756d6d6172795f4c057f31b9ba31138e6c1ff57f552f66,666f6c6465725f73756d6d6172795f5112276526f2330392edfc4b47cf4725,666f6c6465725f73756d6d6172795ff75e02920bc73235abb48ead4be65444,])/QUORUM
{noformat}

The data read, and the digest read that errors out.
{noformat}
DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,836 StorageProxy.java (line 648) 
reading data from /192.168.136.231
TRACE [pool-2-thread-39] 2011-12-08 21:07:21,836 MessagingService.java (line 
412) /192.168.128.229 sending READ version 3 to 3242@/192.168.136.231
DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,869 StorageProxy.java (line 668) 
reading digest from /192.168.128.226
TRACE [pool-2-thread-39] 2011-12-08 21:07:21,870 MessagingService.java (line 
412) /192.168.128.229 sending READ version 3 to 3280@/192.168.128.226
{noformat}

Failing to parse the reply:
{noformat}
DEBUG [Thread-8] 2011-12-08 21:07:21,877 IncomingTcpConnection.java (line 149) 
Message body is 47 bytes
TRACE [Thread-8] 2011-12-08 21:07:21,885 MessagingService.java (line 502) 
/192.168.128.229 received REQUEST_RESPONSE from 3280@/192.168.136.234
DEBUG [RequestResponseStage:18] 2011-12-08 21:07:21,885 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
3280@/192.168.136.234
DEBUG [RequestResponseStage:18] 2011-12-08 21:07:21,891 
AbstractRowResolver.java (line 64) Preprocessing 47 bytes
ERROR [RequestResponseStage:18] 2011-12-08 21:07:21,903 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[RequestResponseStage:18,5,main]
{noformat}

It sure looks like something fishy is going on with the message IDs -- we sent 
3280@ to .226, but here we are processing a response from .234.  It's also 
worth noting that the message size (47 bytes) is neither a digest reply (21 
bytes) or the data for the read here (16460 bytes), BUT we do see other 47 byte 
replies in the log around this time, that are replies to a Mutation sent 
earlier:

{noformat}
DEBUG [Thread-15] 2011-12-08 21:07:21,891 IncomingTcpConnection.java (line 149) 
Message body is 47 bytes
TRACE [Thread-15] 2011-12-08 21:07:21,902 MessagingService.java (line 502) 
/192.168.128.229 received REQUEST_RESPONSE from 3145@/192.168.128.227
{noformat}

I think we have a race in the MessagingService ID/Callback Map somewhere.
                
> Intermittent exceptions seen in cassandra 1.0.5 during Reads.
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3585
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3585
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.5
>         Environment: RHEL 2.6.32-71.el6.x86_64.
> RAM - 28GB
> 11 CPUs of 2.6GHz
>            Reporter: Shantanu
>         Attachments: 3585-v2.txt, 3585.txt, metap_system.log.zip, 
> metap_system.log.zip
>
>
> In my test setup I have cassandra db provisioned with cassandra 0.8.7. The 
> setup is of two data centers. I have upgraded the cassandra to the latest 
> version 1.0.5. I'm seeing following exceptions in cassandra logs -
> ERROR [RequestResponseStage:32] 2011-12-06 14:46:08,150 
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
> Thread[RequestResponseStage:32,5,main]
> java.io.IOError: java.io.EOFException
> at 
> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
> at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:126)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
> at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
> at 
> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
> ... 6 more
> RF is set to DC1:3,DC2:3 and I/m doing the operations with CL=Local_Quorum.
> I have run nodetool scrub on all the nodes in the ring to verify if it solves 
> the issue but it didn't.
> Thanks,
> Shantanu

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to