[ https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554540#comment-16554540 ]
stack commented on HBASE-18152:
-------------------------------

[~elserj] thank you for jumping in. I appreciate being able to chat on this. I'm trying to make a test, but the failure is a bit hard to manufacture. Looking at the procedure store, I'm currently trying to figure out why we have a third way of writing WALs, and was thinking of putting in place fshlog or asyncfs.

I appreciate the sort suggestion. I am not sure the edits were part of the same batch, so I am unsure it would help. Might be worth trying though? Any other ideas welcome. Thanks.

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
>                 Key: HBASE-18152
>                 URL: https://issues.apache.org/jira/browse/HBASE-18152
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HBASE-18152.master.001.patch,
> hbase-hbase-master-ctr-e138-1518143905142-221855-01-000002.hwx.site.log.gz,
> pv2-00000000000000000036.log, pv2-00000000000000000047.log,
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often we
> can get over it but sometimes we can't. It took me a while to capture an
> instance of corruption. Turns out we are writing to the WAL out of order, which
> undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write side, there is a lot going on. I'm not clear on how we
> could write out of order. Will try and get more insight. Meantime, parking
> this issue here to fill data into.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)