[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041321#comment-15041321
 ] 

Duo Zhang commented on HBASE-14790:
-----------------------------------

{quote}
This is clean up of a broken WAL? This is being able to ask each DN what it 
thinks the length is? While this is going on, we would be holding on to the 
hbase handlers not letting response go back to the client? Would we have to do 
some weird accounting where three clients A, B, and C and each written an edit, 
and then the length we get back from existing DNs after a crash say does not 
include the edit written by client C... we'll have to figure out how to fail 
client C's write (though we'd moved on from append and were trying to 
sync/hflush the append)?
{quote}
I think we can implement it this way. When a WAL is broken (a single datanode 
failure means broken):

1. Open a new WAL synchronously.
2. Write all the un-acked WAL entries to the new WAL file (which means we 
should keep all the un-acked WAL entries).
3. Schedule a background task to close the old WAL file.
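The three steps above can be sketched roughly as follows. This is a minimal in-memory sketch, not real HBase/HDFS code: `WalWriter`, `InMemoryWriter`, and all method names here are hypothetical stand-ins, and the point is only the bookkeeping of un-acked entries across a roll.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: keep every entry that is written but not yet acked, and on a
// pipeline failure open a new writer, replay the un-acked entries to it,
// then close the old writer in the background.
class WalRollSketch {

  interface WalWriter {                  // hypothetical writer abstraction
    void append(String entry);
    void close();
  }

  static class InMemoryWriter implements WalWriter {
    final List<String> entries = new ArrayList<>();
    boolean closed;
    public void append(String entry) { entries.add(entry); }
    public void close() { closed = true; }
  }

  final Deque<String> unacked = new ArrayDeque<>();
  final ExecutorService closer = Executors.newSingleThreadExecutor();
  WalWriter current;

  WalRollSketch(WalWriter w) { current = w; }

  void append(String entry) {
    current.append(entry);
    unacked.add(entry);                  // kept until the pipeline acks it
  }

  void acked(String entry) {
    unacked.remove(entry);               // safe to forget once acked
  }

  // One datanode failing marks the WAL broken; roll to a fresh file.
  void onPipelineFailure(WalWriter fresh) {
    WalWriter old = current;
    current = fresh;                     // 1. open a new WAL synchronously
    for (String e : unacked) {
      fresh.append(e);                   // 2. replay all un-acked entries
    }
    closer.execute(old::close);          // 3. close the old file in background
  }
}
```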

We should hold the sync WAL request if some of the WAL entries after the last 
sync have already been written out but not yet acked, until we successfully 
write them to the new WAL file and get the acks back.
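One way to hold the sync this way is to key pending sync futures by sequence id and only complete them once the ack watermark passes that id, regardless of whether the acks came from the original pipeline or from the replay to the new file. A small sketch (all names hypothetical, not the actual HBase sync machinery):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: sync() returns a future for the highest sequence id written so
// far; the future completes only when everything up to that id is acked.
// A caller blocked in sync() therefore simply keeps waiting across a WAL
// roll instead of getting a spurious failure.
class SyncFutures {
  private final ConcurrentSkipListMap<Long, CompletableFuture<Void>> pending =
      new ConcurrentSkipListMap<>();
  private long highestAcked = -1;

  synchronized CompletableFuture<Void> sync(long highestWrittenSeqId) {
    if (highestWrittenSeqId <= highestAcked) {
      // Everything this caller wrote is already durable.
      return CompletableFuture.completedFuture(null);
    }
    return pending.computeIfAbsent(highestWrittenSeqId,
        k -> new CompletableFuture<>());
  }

  // Called on every ack, whether from the original pipeline or after the
  // un-acked entries were re-written to the new WAL file.
  synchronized void acked(long seqId) {
    highestAcked = Math.max(highestAcked, seqId);
    ConcurrentNavigableMap<Long, CompletableFuture<Void>> done =
        pending.headMap(highestAcked, true);
    done.values().forEach(f -> f.complete(null));
    done.clear();
  }
}
```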

And the background task in step 3 does:

1. Do standard pipeline recovery. One small difference is that we can 
truncate the block length to our acked length when negotiating with the 
datanodes.
2. endBlock on each datanode.
3. complete the file on the namenode.
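The close task's three steps could look something like the sketch below. The `DataNode` and `NameNode` interfaces here are hypothetical stand-ins for the real HDFS RPCs, which have different names and signatures; the sketch only shows the ordering and the truncate-to-acked-length idea.

```java
import java.util.List;

// Sketch of the background close task: truncate every replica to the
// acked length, finalize the block on each datanode, then complete the
// file on the namenode.  All interfaces are hypothetical.
class OldWalCloser {
  interface DataNode {
    void truncateBlock(long length);  // recovery, cut to acked length
    void endBlock();                  // finalize the last replica
  }
  interface NameNode {
    void completeFile(String path, long length);
  }

  static void closeOldWal(String path, long ackedLength,
                          List<DataNode> pipeline, NameNode nn) {
    // 1. Pipeline recovery, except we truncate each replica to the acked
    //    length instead of negotiating the minimum visible length.
    for (DataNode dn : pipeline) {
      dn.truncateBlock(ackedLength);
    }
    // 2. endBlock on each datanode.
    for (DataNode dn : pipeline) {
      dn.endBlock();
    }
    // 3. complete the file on the namenode with the acked length.
    nn.completeFile(path, ackedLength);
  }
}
```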

It does not matter if the RS crashes during the recovery, because we can make 
sure that the file length after lease recovery is no shorter than the acked 
length (unless all 3 datanodes crash; we cannot handle that case using 
hflush).

Thanks. [~stack]

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And the most important thing is that it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)