[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038178#comment-15038178 ]
Zhe Zhang commented on HBASE-14790:
-----------------------------------

[~stack] {{DataStreamer#block}} tracks the "number of bytes acked". It is returned by {{DFSOutputStream#getBlock}}.

[~Apache9] I'm still reading your analysis and will get back to you shortly.

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of its features if we only want to
> log WAL. For example, we do not need pipeline recovery, since we could just
> close the old logger and open a new one. We also do not need to write
> multiple blocks, since we could open a new logger if the old file is too
> large.
> Most importantly, it is hard to handle all the corner cases and avoid data
> loss or data inconsistency (such as HBASE-14004) when using the original
> {{DFSOutputStream}}, due to its complicated logic. The complicated logic
> also forces us to use some magical tricks to increase performance. For
> example, we need to use multiple threads to call {{hflush}} when logging,
> and we currently use 5 threads. But why 5, and not 10 or 100?
> So here, I propose that we implement our own {{DFSOutputStream}} for
> logging WAL, both for correctness and for performance.
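
To make the "close the old logger and open a new one" idea from the issue description concrete, here is a minimal sketch. It is not the proposed implementation and uses no HDFS or HBase API: {{LogSink}}, {{MemorySink}} and {{RollingWriter}} are hypothetical names invented for illustration, and the in-memory sink only simulates a failed write. It shows a writer that rolls to a fresh file on a write failure (instead of recovering the pipeline) and when the current file would grow past a size limit (instead of writing multiple blocks).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class RollingWalSketch {

    /** Hypothetical single-file log sink; stands in for a WAL-only output stream. */
    interface LogSink {
        void append(byte[] entry) throws IOException;
        long length();
        void close();
    }

    /** In-memory sink for the demo; can be told to fail on one specific append. */
    static final class MemorySink implements LogSink {
        final List<byte[]> entries = new ArrayList<>();
        private final int failOnAppend; // 1-based index of the failing append, 0 = never
        private int appends;
        private long length;

        MemorySink(int failOnAppend) { this.failOnAppend = failOnAppend; }

        @Override public void append(byte[] entry) throws IOException {
            appends++;
            if (appends == failOnAppend) {
                throw new IOException("simulated write failure");
            }
            entries.add(entry);
            length += entry.length;
        }
        @Override public long length() { return length; }
        @Override public void close() { /* nothing to release in memory */ }
    }

    /** Rolls to a new sink on failure or when the current file would get too large. */
    static final class RollingWriter {
        private final Supplier<LogSink> factory;
        private final long maxBytes;
        private LogSink current;

        RollingWriter(Supplier<LogSink> factory, long maxBytes) {
            this.factory = factory;
            this.maxBytes = maxBytes;
            this.current = factory.get();
        }

        void append(byte[] entry) throws IOException {
            // Size-based roll: never let a non-empty file exceed maxBytes.
            if (current.length() > 0 && current.length() + entry.length > maxBytes) {
                roll();
            }
            try {
                current.append(entry);
            } catch (IOException e) {
                // No recovery of the old file: abandon it and retry once on a new one.
                roll();
                current.append(entry); // a second failure propagates to the caller
            }
        }

        private void roll() {
            current.close();
            current = factory.get();
        }
    }

    /** Writes four 4-byte entries and returns the entry count of each file created. */
    static List<Integer> demo() {
        List<MemorySink> sinks = new ArrayList<>();
        Supplier<LogSink> factory = () -> {
            // Only the first file fails, and only on its second append.
            MemorySink sink = new MemorySink(sinks.isEmpty() ? 2 : 0);
            sinks.add(sink);
            return sink;
        };
        RollingWriter writer = new RollingWriter(factory, 8);
        try {
            for (int i = 0; i < 4; i++) {
                writer.append(new byte[] {0, 0, 0, (byte) i});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Integer> counts = new ArrayList<>();
        for (MemorySink sink : sinks) {
            counts.add(sink.entries.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 1]
    }
}
```

In the demo, the second append fails on the first file and is retried on a second file; the fourth append would push that file past the 8-byte limit, so it goes to a third. A real writer would also have to re-send any entries that were written but not yet acked before the failure, which this sketch leaves out.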