[jira] [Commented] (HBASE-18152) [AMv2] Corrupt Procedure WAL file; procedure data stored out of order

Josh Elser (JIRA) Tue, 24 Jul 2018 10:26:21 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554562#comment-16554562
 ]


Josh Elser commented on HBASE-18152:
------------------------------------

{quote}Looking at this WAL procedure store, I'm currently trying to figure out 
why we have a third way of writing WALs and was thinking of putting in place 
fshlog or asyncfs.
{quote}
Strong +1 on that. I know Enis was frustrated about this way back when.
{quote}I appreciate the sort suggestion. I am not sure the edits were part of 
the same batch so am unsure it would help. Might be worth trying though?
{quote}
Yeah, could be hare-brained. Not sure yet.

I'll give it another look and see what I can come up with. I was trying to come 
up with something that doesn't just funnel our proc executors into a single 
synchronized block. But... maybe I don't need to be worried about that? I don't 
have a good feel for how performance-critical this part is. A simpler fix might 
be to just hoist the synchronization up a level (e.g. out of pushData and into 
insert, update, delete).
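
To make the "hoist the synchronization" idea concrete, here is a rough sketch 
with made-up names. It is not the real WALProcedureStore code: the class, the 
Entry type, the stateVersion counter, and the runConcurrently helper are all 
assumptions for illustration. The only point is that capturing the state and 
appending it happen under one lock taken in insert/update/delete, rather than 
the lock being taken only inside pushData.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a procedure store; names and structure are
// illustrative only and do not mirror the real WALProcedureStore.
class HoistedSyncStoreSketch {
    // One logged edit: the operation, the procedure id, and the state
    // version captured when the edit was "serialized".
    static final class Entry {
        final String op;
        final long procId;
        final long version;

        Entry(String op, long procId, long version) {
            this.op = op;
            this.procId = procId;
            this.version = version;
        }
    }

    private final Object lock = new Object();
    private final List<Entry> wal = new ArrayList<>();
    private long stateVersion = 0; // guarded by lock

    // The lock is taken here, one level above the append, so capturing the
    // state and appending it form a single atomic step.
    void insert(long procId) {
        synchronized (lock) { pushData("insert", procId, ++stateVersion); }
    }

    void update(long procId) {
        synchronized (lock) { pushData("update", procId, ++stateVersion); }
    }

    void delete(long procId) {
        synchronized (lock) { pushData("delete", procId, ++stateVersion); }
    }

    // Assumes the caller already holds the lock.
    private void pushData(String op, long procId, long version) {
        wal.add(new Entry(op, procId, version));
    }

    List<Entry> snapshot() {
        synchronized (lock) { return new ArrayList<>(wal); }
    }

    // True when versions strictly increase, i.e. log order matches the
    // order in which state was captured.
    static boolean isOrdered(List<Entry> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i - 1).version >= entries.get(i).version) return false;
        }
        return true;
    }

    // Drives the store from several threads, standing in for proc executors.
    static HoistedSyncStoreSketch runConcurrently(int threads, int editsPerThread) {
        HoistedSyncStoreSketch store = new HoistedSyncStoreSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final long procId = t;
            Thread worker = new Thread(() -> {
                store.insert(procId);
                for (int i = 0; i < editsPerThread; i++) store.update(procId);
                store.delete(procId);
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        HoistedSyncStoreSketch store = runConcurrently(4, 1000);
        System.out.println("entries=" + store.snapshot().size()
            + " ordered=" + isOrdered(store.snapshot()));
    }
}
```

The obvious cost is the one I was worried about: every executor serializes on 
that one lock for the whole capture-and-append step, so whether this is 
acceptable depends on how hot this path really is.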

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
>                 Key: HBASE-18152
>                 URL: https://issues.apache.org/jira/browse/HBASE-18152
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HBASE-18152.master.001.patch, 
> hbase-hbase-master-ctr-e138-1518143905142-221855-01-000002.hwx.site.log.gz, 
> pv2-00000000000000000036.log, pv2-00000000000000000047.log, 
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often 
> we can get over it but sometimes we can't. It took me a while to capture an 
> instance of corruption. Turns out we are writing to the WAL out of order, 
> which undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write-side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try and get more insight. Meantime parking 
> this issue here to fill data into.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
