[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041234#comment-15041234
 ] 

Phil Yang commented on HBASE-14790:
-----------------------------------

Consider these two properties of HDFS:
hflush is much faster than hsync, especially in pipeline mode, so we have to 
use hflush for HBase writes.
Data in a DN that has been hflushed but not hsynced may exist only in memory, 
not on disk, yet it can already be read by clients.

So suppose we hflush data to the DNs, ReplicationSource reads it and transfers 
it to the slave cluster, and then all three DNs and the RS in the master 
cluster crash. After replaying WALs, the slave will have data that the master 
has lost...

The only way to prevent any data loss is to hsync every time, but that is too 
slow. I think most users can accept some data loss in exchange for faster 
writes, but cannot accept the slave having more data than the master.

Therefore, I think we can do the following:
hflush every time, not hsync;
hsync periodically, for example every 1000 ms by default? The interval can be 
configured by users, and users can also configure an hsync on every write, so 
there will be no data loss unless the disks of all DNs fail...
The RS reports an "acked length" to ReplicationSource, which is the length of 
the data we have hsynced, not hflushed.
ReplicationSource only transfers data up to the acked length, so the slave 
cluster will never become inconsistent with the master.
WAL reading can handle duplicate entries.
On WAL logging, if we get an error on hflush, we open a new file and rewrite 
the entry, and recover/hsync/close the old file asynchronously.
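
To make the acked-length idea concrete, here is a minimal in-memory sketch in 
Java. It is only a toy model under stated assumptions, not HBase or HDFS code: 
the class AckedLengthWal and all of its methods are hypothetical names, a list 
of strings stands in for the WAL file, and "hflush" and "hsync" are simulated 
by two counters. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of "hflush every time, hsync periodically, replicate only up to
 * the acked length". Hypothetical names throughout; no HDFS calls are made.
 */
class AckedLengthWal {
    private final List<String> entries = new ArrayList<>(); // stands in for the WAL file
    private final long syncIntervalMs; // 0 means "hsync on every append"
    private long lastSyncMs;
    private int flushedLength = 0; // entries visible to readers (simulated hflush)
    private int ackedLength = 0;   // entries assumed durable (simulated hsync)

    AckedLengthWal(long syncIntervalMs, long nowMs) {
        this.syncIntervalMs = syncIntervalMs;
        this.lastSyncMs = nowMs;
    }

    /** Append one entry, always "hflush", and "hsync" only if the interval has elapsed. */
    void append(String entry, long nowMs) {
        entries.add(entry);
        flushedLength = entries.size();
        if (nowMs - lastSyncMs >= syncIntervalMs) {
            sync(nowMs);
        }
    }

    /** Simulated hsync: everything flushed so far becomes acked. */
    void sync(long nowMs) {
        ackedLength = flushedLength;
        lastSyncMs = nowMs;
    }

    /** Entries a replication reader may ship: never anything past the acked length. */
    List<String> readForReplication(int fromIndex) {
        if (fromIndex >= ackedLength) {
            return new ArrayList<>();
        }
        return new ArrayList<>(entries.subList(fromIndex, ackedLength));
    }

    /** Simulate a crash of all DNs: flushed-but-unsynced entries are lost. */
    void crashAndRecover() {
        entries.subList(ackedLength, entries.size()).clear();
        flushedLength = ackedLength;
    }

    int flushedLength() { return flushedLength; }

    int ackedLength() { return ackedLength; }
}
```

With a 1000 ms interval, entries appended between two syncs are readable 
locally but are not handed to replication. If a crash drops them, the slave 
never saw them either, so it cannot end up ahead of the master. Constructing 
the object with an interval of 0 models the "hsync each time" configuration.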

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And the most important thing is that it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
