[ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226141#comment-16226141 ]
ASF GitHub Bot commented on ZOOKEEPER-2684:
-------------------------------------------

Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/411#discussion_r147883484

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java ---
    @@ -255,24 +255,23 @@ public void run() {
                             // If session queue != null, then it is also not empty.
                             Request topPending = sessionQueue.poll();
                             if (request.cxid != topPending.cxid) {
    -                            LOG.error(
    -                                    "Got cxid 0x"
    -                                            + Long.toHexString(request.cxid)
    -                                            + " expected 0x" + Long.toHexString(
    -                                                    topPending.cxid)
    -                                    + " for client session id "
    -                                    + Long.toHexString(request.sessionId));
    -                            throw new IOException("Error: unexpected cxid for"
    -                                    + "client session");
    +                            // we can get commit requests that are not at the queue head after
    +                            // a session moved (see ZOOKEEPER-2684). We will just pass the
    +                            // commit to the next processor and put the pending back with
    +                            // a warning, we should not see this often under normal load
    +                            LOG.warn("Got request " + request +
    +                                    " but we are expecting request " + topPending);
    +                            sessionQueue.addFirst(topPending);
    +                        } else {
    +                            /*
    +                             * We want to send our version of the request. the
    +                             * pointer to the connection in the request
    +                             */
    +                            topPending.setHdr(request.getHdr());
    --- End diff --

    Would you mind explaining why we normally want to send our version of the request and why it is ok not to in this case?

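To make the question concrete, here is a minimal sketch of the branch under review. It is not the actual CommitProcessor code: `SessionQueueSketch`, `Req`, and `selectRequestToForward` are hypothetical names, the header is modeled as a plain String, and the sketch assumes (based only on the comment in the diff) that the matching branch forwards the local pending copy while the mismatch branch forwards the committed request as received.

```java
import java.util.Deque;

// Hypothetical sketch; none of these names come from ZooKeeper itself.
class SessionQueueSketch {

    /** Minimal stand-in for a request: session id, client xid, and a header. */
    static final class Req {
        final long sessionId;
        final int cxid;
        String hdr; // assumption: header modeled as a String for illustration

        Req(long sessionId, int cxid, String hdr) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.hdr = hdr;
        }
    }

    /**
     * Picks the request object to hand to the next processor when a commit
     * arrives for a session that has pending local requests. The caller
     * guarantees the queue is non-empty, as in the diff context.
     */
    static Req selectRequestToForward(Deque<Req> sessionQueue, Req committed) {
        Req topPending = sessionQueue.poll();
        if (committed.cxid != topPending.cxid) {
            // Mismatch (for example after a session moved): restore the
            // queue head and forward the committed request unchanged.
            sessionQueue.addFirst(topPending);
            return committed;
        }
        // Match: copy the committed header onto the local copy, which is
        // the object that is forwarded.
        topPending.hdr = committed.hdr;
        return topPending;
    }
}
```

In this sketch the mismatch path leaves the session queue exactly as it was, so a later commit carrying the expected cxid can still be paired with its pending request.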
> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash with the following error
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> clearly something is not right in the new commit processor per session queue implementation.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)