[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764837#comment-15764837
 ] 

Joep Rottinghuis commented on HBASE-17018:
------------------------------------------

Thanks for the feedback Sangjin.
(1) The code to transition states is indeed not complete yet. I agree that the 
ExceptionListener and the information we get back from the BufferedMutator will 
be indicative of error conditions. I was thinking that upon the first successful 
submission while in the bad state, we'd immediately transition to the 
transitioning state, but we might want to hedge there before going back to a 
good state to avoid flapping.
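
Something along these lines, just to illustrate the hedging (hypothetical class 
and method names, not the actual patch code): drop to BAD immediately on any 
failure, but require a streak of successful submissions before declaring GOOD 
again.

{code:java}
public class SpoolStateTracker {
  public enum State { GOOD, TRANSITIONING, BAD }

  private final int successesNeeded;   // how much to hedge before going GOOD
  private int successStreak = 0;       // guarded by "this"
  private volatile State state = State.GOOD;

  public SpoolStateTracker(int successesNeeded) {
    this.successesNeeded = successesNeeded;
  }

  /** Called when a submission fails or the ExceptionListener fires. */
  public synchronized void onFailure() {
    successStreak = 0;
    state = State.BAD;
  }

  /** Called when a submission succeeds. */
  public synchronized void onSuccess() {
    if (state == State.GOOD) {
      return;
    }
    // First success while BAD: move to TRANSITIONING right away, but only
    // declare GOOD after successesNeeded successes in a row.
    state = State.TRANSITIONING;
    if (++successStreak >= successesNeeded) {
      state = State.GOOD;
    }
  }

  public State getState() {
    return state;
  }
}
{code}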

(2) You're right that the two-tiered timeout is more complex than a single one. 
I was planning to add a JMX bean to be able to poke the coordinator into a 
"BAD" state in advance of HBase maintenance and/or to force it to spool for 
other reasons. I was also thinking of making the timeout dynamic (perhaps 
through a bean, perhaps through other mechanisms). If the timeout can be 
reduced, we'd need some reasonable response time for the new timeout to take 
effect. The final reason why I chose the double timeout is to allow an 
exception handler to call back into the coordinator on a different thread. We 
could assume that the exception handler will be called only when the last 
submission also times out, but I think that might be making too strong an 
assumption about the current implementation of BufferedMutatorImpl. That said, 
I'll have to work out a little more of the details of state transitioning to 
know for sure whether two tiers are indeed needed.
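
For reference, the kind of JMX hook I have in mind looks roughly like this 
(again hypothetical names, not what's in the patch): an MBean that lets an 
operator force the coordinator into a BAD/spooling state ahead of maintenance 
and adjust the submission timeout at runtime.

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// SpoolingCoordinatorMXBean.java
public interface SpoolingCoordinatorMXBean {
  void forceBad();                              // spool regardless of HBase health
  void clearForcedState();                      // resume normal state detection
  long getSubmissionTimeoutMs();
  void setSubmissionTimeoutMs(long timeoutMs);  // dynamic timeout
}

// SpoolingCoordinator.java
public class SpoolingCoordinator implements SpoolingCoordinatorMXBean {
  private volatile boolean forcedBad = false;
  private volatile long submissionTimeoutMs = 10_000L;

  @Override public void forceBad() { forcedBad = true; }
  @Override public void clearForcedState() { forcedBad = false; }
  @Override public long getSubmissionTimeoutMs() { return submissionTimeoutMs; }
  @Override public void setSubmissionTimeoutMs(long timeoutMs) {
    this.submissionTimeoutMs = timeoutMs;
  }

  boolean shouldSpool() {
    return forcedBad; // ...or a detected BAD state, once that logic is in place
  }

  void register() throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    server.registerMBean(this,
        new ObjectName("org.apache.hadoop.hbase:type=SpoolingCoordinator"));
  }
}
{code}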

(3) I agree that the timeout of 0 is somewhat explicit. The design aimed to 
separate the concerns of the coordinator and the processor (for design and 
testing purposes). Similarly, I didn't want to bleed the state into the 
flusher, but simply have it ask whether it should flush or not. If the state 
itself is exposed, then the logic will be spread across the coordinator, the 
processor, and the flusher. One could argue that this is already the case, so 
I'll have to re-examine this. As state transitioning evolves, we might need to 
give up on the strict separation.
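
In other words, the contract between the two is just a yes/no question, 
something like this (hypothetical names, sketch only):

{code:java}
public interface FlushAdvisor {
  /** Coordinator-side decision; the flusher doesn't need to know why. */
  boolean shouldFlush();
}

class PeriodicFlusher implements Runnable {
  private final FlushAdvisor advisor;

  PeriodicFlusher(FlushAdvisor advisor) {
    this.advisor = advisor;
  }

  @Override
  public void run() {
    if (advisor.shouldFlush()) {
      // trigger the flush on the underlying mutator; details omitted
    }
    // else: skip this cycle, e.g. while we are spooling
  }
}
{code}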

(4) Indeed, if I were to wrap the put calls, it would alleviate the 
out-of-order submissions between puts and flushes. It would mean I have to 
change the flushCount in the SpoolingBufferedMutatorSubmission from a final 
field set in the constructor to a volatile (or otherwise synchronized) setter. 
Were you thinking I'd add a new interface around the BlockingQueue, or simply 
have a local method and hope that the current code and later maintainers keep 
the discipline of always calling a local method to enqueue? Alternatively, the 
entire construction of the submission could be pushed into a separate class 
that keeps the flushCount and does object creation and queue submission under 
a single lock (perhaps simply a synchronized method). I'll see if the 
coordinator could do this work so we can avoid yet another class.
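
That separate-class alternative could look roughly like the sketch below (the 
real SpoolingBufferedMutatorSubmission has a different shape; the names here 
are placeholders). Construction and enqueue share one lock, so flushCount can 
stay a final field on the submission:

{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;

import org.apache.hadoop.hbase.client.Mutation;

class SubmissionFactory {
  private final BlockingQueue<Submission> queue;
  private long flushCount = 0;   // guarded by "this"

  SubmissionFactory(BlockingQueue<Submission> queue) {
    this.queue = queue;
  }

  /** Create and enqueue a mutation submission atomically with the current flushCount. */
  synchronized void submitMutations(List<? extends Mutation> mutations)
      throws InterruptedException {
    queue.put(new Submission(mutations, flushCount));
  }

  /** Create and enqueue a flush marker, bumping flushCount under the same lock. */
  synchronized void submitFlush() throws InterruptedException {
    flushCount++;
    queue.put(new Submission(null, flushCount));
  }

  static final class Submission {
    final List<? extends Mutation> mutations;  // null means "flush marker"
    final long flushCount;                     // stays final, no volatile setter needed

    Submission(List<? extends Mutation> mutations, long flushCount) {
      this.mutations = mutations;
      this.flushCount = flushCount;
    }
  }
}
{code}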

One open item with respect to timeouts is that there currently isn't a way to 
give the SpoolingBufferedMutator a different timeout than the wrapped 
BufferedMutatorImpl; there is only one set of params passed in. This is a 
challenge in terms of coordinating the timeouts described above, but also in 
terms of closing. If the flush in close times out, we might not close. If the 
close times out, we may not have time to shut the queues down. We may have to 
divvy up the timeout between the various operations needed. TBD how that will 
work.
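
One possible way to divvy it up would be a single deadline set when close() 
starts, with each step getting whatever time remains; a rough sketch 
(hypothetical, not in the patch):

{code:java}
import java.util.concurrent.TimeUnit;

final class TimeoutBudget {
  private final long deadlineNanos;

  TimeoutBudget(long totalMs) {
    this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalMs);
  }

  /** Remaining milliseconds, never negative. */
  long remainingMs() {
    long left = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
    return Math.max(0L, left);
  }
}

// Usage inside a hypothetical close():
//   TimeoutBudget budget = new TimeoutBudget(totalCloseTimeoutMs);
//   flushWithTimeout(budget.remainingMs());               // flush pending submissions
//   closeWrappedMutatorWithTimeout(budget.remainingMs());
//   shutDownQueues(budget.remainingMs());                 // whatever time is left
{code}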

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes, we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, GCS, S3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
> https://reviews.apache.org/r/54882/


