[ https://issues.apache.org/jira/browse/HBASE-20003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413378#comment-16413378 ]
Chance Li commented on HBASE-20003:
-----------------------------------

[~anoop.hbase] Sir, excellent doc. I have a small question. Consider a more special case where the write to the Tertiary succeeds but the write to the Secondary fails (the Secondary is not down), and then the primary goes down without updating META. This can happen when the RS hosting the Secondary replica hits the global memstore size upper limit (perhaps because of heavy workload from other tables' regions). It is not the same as the case you mentioned. The questions are:
1) Which replica is eligible to become the new primary? (The last batched mutation(s) are present only in the Tertiary, not in the Secondary.)
2) With the replica state lost, these two replicas will remain inconsistent for a while until the new replica comes up (the new primary will start flushing). That means a client can sometimes read the data and sometimes not, which is very bad. How can we avoid it?
Basically, the architecture looks like moving the pipeline from the WAL to region replication, and it improves MTTR, so it is acceptable. The difference is that more things can impact this pipeline, such as the global memstore size upper limit, so the replica pipeline may be more volatile than the WAL pipeline. That can lead to uncontrolled replica failures and, in turn, uncontrolled flushes.
Finally, does HBase consider geographic locality here?

> WALLess HBase on Persistent Memory
> ----------------------------------
>
>                 Key: HBASE-20003
>                 URL: https://issues.apache.org/jira/browse/HBASE-20003
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
>
> This JIRA aims to make use of persistent memory (pmem) technologies in HBase. One such usage is to make the Memstore reside on pmem. A persistent memstore would remove the need for the WAL and pave the way for a WALLess HBase. The existing region replica feature could be used here to ensure that data written to the memstores is synchronously replicated to the replicas, ensuring strong consistency of the data (pipeline model).
> Advantages:
> - Data Availability: Since the data across replicas is consistent (synchronously written), the data is always 100% available.
> - Lower MTTR: It becomes easier/faster to switch over to a replica on a primary region failure, as there is no WAL replay involved. Building the memstore map data is also much faster than reading and replaying the WAL.
> - Possibility of bigger memstores: These pmems are designed to have more capacity than DRAM, so they would also enable bigger memstores, which leads to less flush/compaction IO.
> - Removes the dependency on HDFS in the write path.
> An initial PoC has been designed and developed. Testing is underway and we will publish the PoC results along with the design doc soon. The PoC doc will talk about the design decisions, the libraries considered for working with these pmem devices, the pros and cons of those libraries, and the performance results.
> Note: Next-gen memory technologies using 3D XPoint give the persistent memory feature. Such memory DIMMs will soon appear in the market. The PoC is done around Intel's ApachePass (AEP).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
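
For reference, a minimal client-side sketch of the read behaviour the consistency question above touches on, assuming the existing region replica feature and the standard HBase 2.x client API (the table and row names are hypothetical): with replica reads the client opts into TIMELINE consistency and can check whether a result was served by a possibly stale replica.

{code:java}
// Minimal sketch using the standard HBase 2.x client API.
// Table and row names are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReplicaReadSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t1"))) {
      Get get = new Get(Bytes.toBytes("row1"));
      // TIMELINE allows the read to be answered by a secondary/tertiary replica
      // instead of failing when the primary is unavailable.
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      // isStale() == true means the result came from a non-primary replica that
      // may lag the primary -- the inconsistency window discussed above.
      if (result.isStale()) {
        System.out.println("Served by a secondary replica; data may be stale");
      }
    }
  }
}
{code}

If the proposed pipeline keeps all replicas synchronously consistent, such stale reads should not occur in normal operation; the partial-failure scenario described in the comment is exactly where that guarantee becomes questionable.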