[ 
https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045790#comment-16045790
 ] 

stack commented on HBASE-18152:
-------------------------------

Looking at the corruption in 36.log, we are indeed missing entries off the end
of the WAL. There is at least a second between the events they record and the
master crash. I presume that during this second a thread was still messing
around trying to persist to the WAL store. We are writing edits out of order in
some circumstances (see the experience w/ the previous log and the attached
workaround patch, which helps...). This makes holes possible if we expect
events in order, as the smart verification check does. Need to dig in on the
way we log.
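
To illustrate why out-of-order writes look like holes to an in-order check,
here is a toy sketch. It is not HBase code: the class WalOrderCheck, its
methods, and the idea of each entry carrying a plain increasing sequence
number are all assumptions made for illustration. It only models a reader
that expects each entry to be exactly one past the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of HBase. Entries are modelled as bare
// sequence numbers listed in the order they appear in the file.
final class WalOrderCheck {
  private WalOrderCheck() {}

  // Sequence numbers a strict in-order reader would flag as holes.
  static List<Long> missingForInOrderReader(long[] seqIdsInFileOrder, long firstExpected) {
    List<Long> missing = new ArrayList<>();
    long expected = firstExpected;
    for (long seqId : seqIdsInFileOrder) {
      if (seqId < expected) {
        // Late entry: its slot was already passed and flagged as a hole.
        continue;
      }
      for (long gap = expected; gap < seqId; gap++) {
        missing.add(gap);
      }
      expected = seqId + 1;
    }
    return missing;
  }

  // Same check after sorting, so only entries truly absent are reported.
  static List<Long> missingAfterSorting(long[] seqIdsInFileOrder, long firstExpected) {
    long[] sorted = seqIdsInFileOrder.clone();
    Arrays.sort(sorted);
    return missingForInOrderReader(sorted, firstExpected);
  }
}
```

With file order 1, 2, 4, 3, 5 the strict check flags 3 as a hole even though
the entry is present; after sorting nothing is flagged. With 1, 2, 5 both
checks flag 3 and 4. Neither check can notice entries lost off the very end
of the file, since nothing later in the file reveals the gap.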

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
>                 Key: HBASE-18152
>                 URL: https://issues.apache.org/jira/browse/HBASE-18152
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17537.master.002.patch, 
> HBASE-18152.master.001.patch, pv2-00000000000000000036.log, 
> pv2-00000000000000000047.log, reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often we 
> can get over it but sometimes we can't. It took me a while to capture an 
> instance of corruption. Turns out we are writing to the WAL out of order, which 
> undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write-side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try and get more insight. Meantime parking 
> this issue here to fill data into.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
