[jira] [Commented] (HBASE-7006) [MTTR] Study distributed log splitting to see how we can make it faster

stack (JIRA) Wed, 08 May 2013 12:33:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652253#comment-13652253
 ]


stack commented on HBASE-7006:
------------------------------

[~jeffreyz] Thanks.

I asked about zxid.

"I think you mean the zxid? That's a 64bit number where the lower
32bits are the xid and the upper 32 bits are the epoch. The xid
increases for each write, the epoch increases when there is a leader
change. The zxid should always only increase. There was a bug where
the lower 32bits could roll over, however that resulted in the epoch
number increasing as well (64bits++) - so the constraint was
maintained (but the cluster would fail/lockup for another issue, I
fixed that in recent releases though...... Now
when that is about to happen it forces a new leader election)."

The above is from our Patrick Hunt.  He says the fix is in Apache ZK (3.3.5, 3.4.4).
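
To make the layout in the quote concrete, here is a minimal sketch of splitting a 64-bit zxid into its two halves. It assumes only what the quote states (upper 32 bits are the epoch, lower 32 bits are the per-write counter). The class and method names are made up for illustration and are not ZooKeeper's own code.

```java
public class ZxidParts {
    // Upper 32 bits: the epoch, which goes up on a leader change.
    public static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Lower 32 bits: the counter, which goes up on each write.
    public static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    // Pack an epoch and a counter back into one 64-bit value.
    public static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // A zxid whose counter is at its maximum value.
        long zxid = make(5L, 0xFFFFFFFFL);
        // Plain 64-bit increment: the counter wraps to 0 and the carry
        // bumps the epoch, so the value as a whole still only increases.
        long next = zxid + 1;

        System.out.println(epoch(zxid) + " " + counter(zxid)); // 5 4294967295
        System.out.println(epoch(next) + " " + counter(next)); // 6 0
        System.out.println(next > zxid);                       // true
    }
}
```

This only shows why the "should always only increase" constraint survives a rollover of the lower 32 bits; it does not model the lockup or the forced leader election mentioned in the quote.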

If you look at the tail of the issue below, you will see a favorite hbase user 
running into the rollover issue:

https://issues.apache.org/jira/browse/ZOOKEEPER-1277

Let me make sure we add to the notes that folks should upgrade to these versions 
of zk.

                
> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, 
> hbase-7006-combined-v3.patch, hbase-7006-combined-v4.patch, 
> hbase-7006-combined-v5.patch, LogSplitting Comparison.pdf, 
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 
> 1700 WALs to replay.  Replay took almost an hour.  It looks like it could run 
> faster; much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
