[ https://issues.apache.org/jira/browse/CASSANDRA-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166449#comment-13166449 ]
Jonathan Ellis commented on CASSANDRA-3585: ------------------------------------------- Some interesting things from the trace log: The request that errors out: {noformat} DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,815 StorageProxy.java (line 627) Command/ConsistencyLevel is SliceByNamesReadCommand(table='KeyspaceMetadata', key=313030303237383033, columnParent='QueryPath(columnFamilyName='CF_MailboxSummary', superColumnName='null', columnName='null')', columns=[666f6c6465725f73756d6d6172795f4c057f31b9ba31138e6c1ff57f552f66,666f6c6465725f73756d6d6172795f5112276526f2330392edfc4b47cf4725,666f6c6465725f73756d6d6172795ff75e02920bc73235abb48ead4be65444,])/QUORUM {noformat} The data read, and the digest read that errors out. {noformat} DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,836 StorageProxy.java (line 648) reading data from /192.168.136.231 TRACE [pool-2-thread-39] 2011-12-08 21:07:21,836 MessagingService.java (line 412) /192.168.128.229 sending READ version 3 to 3242@/192.168.136.231 DEBUG [pool-2-thread-39] 2011-12-08 21:07:21,869 StorageProxy.java (line 668) reading digest from /192.168.128.226 TRACE [pool-2-thread-39] 2011-12-08 21:07:21,870 MessagingService.java (line 412) /192.168.128.229 sending READ version 3 to 3280@/192.168.128.226 {noformat} Failing to parse the reply: {noformat} DEBUG [Thread-8] 2011-12-08 21:07:21,877 IncomingTcpConnection.java (line 149) Message body is 47 bytes TRACE [Thread-8] 2011-12-08 21:07:21,885 MessagingService.java (line 502) /192.168.128.229 received REQUEST_RESPONSE from 3280@/192.168.136.234 DEBUG [RequestResponseStage:18] 2011-12-08 21:07:21,885 ResponseVerbHandler.java (line 44) Processing response on a callback from 3280@/192.168.136.234 DEBUG [RequestResponseStage:18] 2011-12-08 21:07:21,891 AbstractRowResolver.java (line 64) Preprocessing 47 bytes ERROR [RequestResponseStage:18] 2011-12-08 21:07:21,903 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[RequestResponseStage:18,5,main] {noformat} It sure looks like something fishy is going on with the message IDs -- we sent 3280@ to .226, but here we are processing a response from .234. It's also worth noting that the message size (47 bytes) is neither a digest reply (21 bytes) or the data for the read here (16460 bytes), BUT we do see other 47 byte replies in the log around this time, that are replies to a Mutation sent earlier: {noformat} DEBUG [Thread-15] 2011-12-08 21:07:21,891 IncomingTcpConnection.java (line 149) Message body is 47 bytes TRACE [Thread-15] 2011-12-08 21:07:21,902 MessagingService.java (line 502) /192.168.128.229 received REQUEST_RESPONSE from 3145@/192.168.128.227 {noformat} I think we have a race in the MessagingService ID/Callback Map somewhere. > Intermittent exceptions seen in cassandra 1.0.5 during Reads. > ------------------------------------------------------------- > > Key: CASSANDRA-3585 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3585 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.5 > Environment: RHEL 2.6.32-71.el6.x86_64. > RAM - 28GB > 11 CPUs of 2.6GHz > Reporter: Shantanu > Attachments: 3585-v2.txt, 3585.txt, metap_system.log.zip, > metap_system.log.zip > > > In my test setup I have cassandra db provisioned with cassandra 0.8.7. The > setup is of two data centers. I have upgraded the cassandra to the latest > version 1.0.5. I'm seeing following exceptions in cassandra logs - > ERROR [RequestResponseStage:32] 2011-12-06 14:46:08,150 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[RequestResponseStage:32,5,main] > java.io.IOError: java.io.EOFException > at > org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71) > at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:126) > at > org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.EOFException > at java.io.DataInputStream.readFully(DataInputStream.java:180) > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100) > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81) > at > org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64) > ... 6 more > RF is set to DC1:3,DC2:3 and I/m doing the operations with CL=Local_Quorum. > I have run nodetool scrub on all the nodes in the ring to verify if it solves > the issue but it didn't. > Thanks, > Shantanu -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira