[jira] [Commented] (CASSANDRA-3585) Intermittent exceptions seen in cassandra 1.0.5 during Reads.

Jonathan Ellis (Commented) (JIRA) Sat, 10 Dec 2011 07:56:01 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166904#comment-13166904
 ]


Jonathan Ellis commented on CASSANDRA-3585:
-------------------------------------------

Figured it out.  This is actually a second manifestation of CASSANDRA-3577, a 
bug in the multi-DC write optimization.  Pasting from there: 

bq. Node A (DC1) sends a write to node B (DC2), which forwards to node C (DC2). 
Node C replies to node A with the message ID it received from node B.  If the 
message generation on A and B is far enough apart, then A will not have a 
callback for the reply and all you will see happen is the write timeout (at 
CL > ONE).  But if A *does* have a callback (for a different operation) waiting, 
then A will try to apply the mutation response to that callback, which (if the 
callback is for a read) will result in the error seen in CASSANDRA-3585.
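
To make the failure mode concrete, here is a minimal toy model of it. This is 
not Cassandra code: the Node class, its integer ID counter, and the string 
callback kinds are all hypothetical stand-ins, assuming only that each node 
numbers its outgoing messages independently and keeps its own table of 
pending callbacks.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-node callback tables. All names are hypothetical. */
public class MisroutedReplyDemo {
    /** A node with its own message ID counter and pending-callback table. */
    static final class Node {
        private int nextId;
        final Map<Integer, String> callbacks = new HashMap<>();

        Node(int firstId) { this.nextId = firstId; }

        /** Registers a callback of the given kind and returns its message ID. */
        int send(String kind) {
            int id = nextId++;
            callbacks.put(id, kind);
            return id;
        }

        /** Forwards a message under a locally generated ID, with no callback. */
        int forward() { return nextId++; }

        /** Reports what happens when a write reply carrying this ID arrives. */
        String onWriteReply(int id) {
            String kind = callbacks.remove(id);
            if (kind == null) return "dropped";       // no callback: the write times out
            if (kind.equals("write")) return "ok";
            return "error";                           // write reply handed to a read callback
        }
    }

    /** Runs A -> B -> C -> A, with the two ID counters starting where given. */
    static String scenario(int aFirstId, int bFirstId) {
        Node a = new Node(aFirstId);
        Node b = new Node(bFirstId);
        a.send("write");                     // A -> B, the cross-DC write
        a.send("read");                      // an unrelated read also pending on A
        int forwardedId = b.forward();       // B -> C, under B's own ID
        return a.onWriteReply(forwardedId);  // C -> A, still carrying B's ID
    }

    public static void main(String[] args) {
        System.out.println(scenario(100, 5000)); // prints "dropped"
        System.out.println(scenario(100, 101));  // prints "error"
    }
}
```

With the counters far apart the reply matches nothing on A and is dropped; 
with the counters close together it lands on the pending read callback, 
which is the case that corresponds to the exception in this ticket.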

For 1.0.6 we've disabled that optimization; for 1.1 we've fixed it by 
pre-generating extra callback IDs on the coordinator (node A in this example) 
and forwarding those cross-DC as well.
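
A rough sketch of the idea behind the 1.1 fix, again as a toy model rather 
than the actual patch: the Coordinator class and its method names are 
hypothetical, and the only assumption taken from the description above is 
that the coordinator allocates one callback ID per remote replica up front 
and ships those IDs along with the forwarded write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of coordinator-generated callback IDs. All names are hypothetical. */
public class PreGeneratedIdsDemo {
    static final class Coordinator {
        private int nextId = 100;
        final Map<Integer, String> callbacks = new HashMap<>();

        /** Allocates one ID per remote replica and registers a write callback for each. */
        Map<String, Integer> prepareWrite(List<String> remoteReplicas) {
            Map<String, Integer> ids = new LinkedHashMap<>();
            for (String replica : remoteReplicas) {
                int id = nextId++;
                callbacks.put(id, "write");
                ids.put(replica, id);
            }
            return ids;
        }

        /** Registers an unrelated read callback. */
        int sendRead() {
            int id = nextId++;
            callbacks.put(id, "read");
            return id;
        }

        /** Reports what happens when a write reply carrying this ID arrives. */
        String onWriteReply(int id) {
            String kind = callbacks.remove(id);
            if (kind == null) return "dropped";
            return kind.equals("write") ? "ok" : "error";
        }
    }

    static List<String> run() {
        Coordinator a = new Coordinator();
        // A sends the write to B together with the IDs for B and for C.
        Map<String, Integer> ids = a.prepareWrite(Arrays.asList("B", "C"));
        a.sendRead();
        // B replies under its ID and forwards to C under the ID A generated for C,
        // so neither reply depends on B's local counter.
        List<String> outcomes = new ArrayList<>();
        outcomes.add(a.onWriteReply(ids.get("B")));
        outcomes.add(a.onWriteReply(ids.get("C")));
        outcomes.add("pending=" + a.callbacks.values());
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [ok, ok, pending=[read]]
    }
}
```

Because every ID in flight was generated on A, each reply finds the write 
callback it belongs to and the pending read is left untouched.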
                
> Intermittent exceptions seen in cassandra 1.0.5 during Reads.
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3585
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3585
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.5
>         Environment: RHEL 2.6.32-71.el6.x86_64.
> RAM - 28GB
> 11 CPUs of 2.6GHz
>            Reporter: Shantanu
>         Attachments: 3585-v2.txt, 3585.txt, CassandraLogs.tar.bz2, 
> metap_system.log.zip, metap_system.log.zip
>
>
> In my test setup I have a Cassandra database that was provisioned with 
> Cassandra 0.8.7. The setup spans two data centers. I have upgraded Cassandra 
> to the latest version, 1.0.5, and I am seeing the following exceptions in 
> the Cassandra logs:
> ERROR [RequestResponseStage:32] 2011-12-06 14:46:08,150 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[RequestResponseStage:32,5,main]
> java.io.IOError: java.io.EOFException
>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:126)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>     ... 6 more
> RF is set to DC1:3,DC2:3 and I'm doing the operations with CL=Local_Quorum.
> I have run nodetool scrub on all the nodes in the ring to see whether that 
> resolves the issue, but it did not.
> Thanks,
> Shantanu

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
