[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652253#comment-13652253 ]
stack commented on HBASE-7006:
------------------------------

[~jeffreyz] Thanks. I asked about zxid:

"I think you mean the zxid? That's a 64-bit number where the lower 32 bits are the xid and the upper 32 bits are the epoch. The xid increases for each write; the epoch increases when there is a leader change. The zxid should always only increase. There was a bug where the lower 32 bits could roll over; however, that resulted in the epoch number increasing as well (64bits++), so the constraint was maintained (but the cluster would fail/lock up for another issue; I fixed that in recent releases, though. Now when that is about to happen, it forces a new leader election)."

The above is from our Patrick Hunt. He says the fix is in Apache ZK 3.3.5 and 3.4.4. If you look at the tail of the issue below, you will see an HBase favorite user running into the rollover issue:

https://issues.apache.org/jira/browse/ZOOKEEPER-1277

Let me make sure we add to the notes that folks should upgrade to these versions of ZK.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch,
> hbase-7006-combined-v3.patch, hbase-7006-combined-v4.patch,
> hbase-7006-combined-v5.patch, LogSplitting Comparison.pdf,
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had
> 1700 WALs to replay. Replay took almost an hour. It looks like it could run
> faster, and that much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.
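To make the zxid layout in Patrick's note concrete, here is a minimal Python sketch. It assumes only what the quote states: the upper 32 bits are the epoch, the lower 32 bits are the xid, and a write increments the whole value as one 64-bit number. The function names (`make_zxid`, `epoch_of`, `xid_of`, `next_zxid_on_write`, `next_zxid_on_new_leader`) are hypothetical illustrations, not ZooKeeper's API.

```python
# Hypothetical sketch of the zxid layout described in the comment:
# upper 32 bits = leader epoch, lower 32 bits = xid (per-write counter).
# None of these names come from ZooKeeper itself.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def make_zxid(epoch: int, xid: int) -> int:
    """Pack an epoch and an xid into a single 64-bit zxid."""
    return ((epoch & MASK32) << 32) | (xid & MASK32)


def epoch_of(zxid: int) -> int:
    """Upper 32 bits: bumped on each leader change."""
    return (zxid >> 32) & MASK32


def xid_of(zxid: int) -> int:
    """Lower 32 bits: bumped on each write."""
    return zxid & MASK32


def next_zxid_on_write(zxid: int) -> int:
    """Increment the zxid as one 64-bit counter (the "64bits++" case).

    If the xid is already 0xFFFFFFFF, the carry spills into the epoch,
    so the result is still larger than the input.
    """
    return (zxid + 1) & MASK64


def next_zxid_on_new_leader(zxid: int) -> int:
    """A new leader starts a new epoch with the xid reset to 0."""
    return make_zxid(epoch_of(zxid) + 1, 0)


if __name__ == "__main__":
    z = make_zxid(epoch=5, xid=MASK32)  # xid is about to roll over
    z2 = next_zxid_on_write(z)
    print(hex(z), "->", hex(z2))        # 0x5ffffffff -> 0x600000000
    print(epoch_of(z2), xid_of(z2))     # 6 0
    assert z2 > z                       # the zxid still only increases
```

The demo shows the rollover case from ZOOKEEPER-1277: incrementing a zxid whose xid is 0xFFFFFFFF carries into the epoch, so ordering is preserved even though no leader change happened. Per the quote, fixed releases instead force a new leader election just before this point; the sketch does not model that check, and `next_zxid_on_new_leader` only shows what the start of a new epoch looks like.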