[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765011#comment-15765011 ]

Sangjin Lee commented on HBASE-17018:
-------------------------------------

{quote}
Indeed if I were to wrap the put calls, it would alleviate the out of order 
submissions between puts and flushes. It would mean I have to change the 
flushCount in the SpoolingBufferedMutatorSubmission from a final field set in 
the constructor to a volatile (or otherwise synchronized) setter. Were you 
thinking I'd add a new interface around the BlockingQueue, or simply have a 
local method and hope the current code and later maintainers keep the 
discipline to always call a local method to enqueue? Alternatively the entire 
construction of the submission could be pushed into a separate class that keeps 
the flushCount and does object creation and queue submission under a single 
lock (perhaps simply a synchronized method). I can see if the coordinator could 
do this work so we can avoid yet another class.
{quote}
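
As a side note, the "separate class" variant described in the quote might look 
roughly like the following. This is only a sketch; the class name, the 
{{submit()}} signature, and the submission constructor arguments are 
placeholders rather than the real API:
{code}
  // purely illustrative: creation and enqueue happen under a single lock,
  // so flushCount can stay a final field on the submission
  private static final class SubmissionGate {
    private final BlockingQueue<SpoolingBufferedMutatorSubmission> inbound;
    private long flushCount; // guarded by "this"

    SubmissionGate(BlockingQueue<SpoolingBufferedMutatorSubmission> inbound) {
      this.inbound = inbound;
    }

    synchronized void submit(Mutation mutation, boolean isFlush)
        throws InterruptedException {
      // handle the flush count based on the submission type
      // (exact bookkeeping elided; constructor args are placeholders)
      long count = isFlush ? ++flushCount : flushCount;
      inbound.put(new SpoolingBufferedMutatorSubmission(mutation, count));
    }
  }
{code}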

I suppose there are a couple of options each of which has pros and cons. First 
was extending {{LinkedBlockingQueue}} within this class to add the logic of 
handling the flush count strongly. Something like:
{code}
  final BlockingQueue<SpoolingBufferedMutatorSubmission> inbound =
      new LinkedBlockingQueue<SpoolingBufferedMutatorSubmission>() {
        @Override
        public synchronized void put(SpoolingBufferedMutatorSubmission s)
            throws InterruptedException {
          // handle and set the flush count on the submission based on the type
          super.put(s);
        }
      };
{code}

That way, any new code within this class that would put items in the inbound 
queue would benefit from the same treatment. One drawback is that {{put()}} is 
not the only way to add items to the queue (there is {{add()}}), and to be 
complete we'd need to either override {{add()}} too or ensure no one calls it 
(by overriding it to throw an exception or something). This subclass can be kept 
totally private to the SBMI.
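
For illustration, closing off the {{add()}} escape hatch inside the same 
anonymous subclass could look roughly like this (a sketch only, not part of the 
patch):
{code}
        @Override
        public boolean add(SpoolingBufferedMutatorSubmission s) {
          // funnel every producer through put() so the flush-count handling
          // above cannot be bypassed
          throw new UnsupportedOperationException("use put() instead of add()");
        }
{code}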

I also thought of simply doing this as a private method within SBMI so we don't 
need to play with extending the {{LinkedBlockingQueue}}. But the downside of 
that is that we need to ensure all new code that puts items in the inbound 
queue goes through that method.
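
For comparison, a minimal sketch of that private-method variant (the method 
name {{enqueue}} is just illustrative):
{code}
  // every producer in SBMI must call this instead of touching inbound directly
  private synchronized void enqueue(SpoolingBufferedMutatorSubmission s)
      throws InterruptedException {
    // handle and set the flush count on the submission based on the type
    inbound.put(s);
  }
{code}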

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high volume writes will be mostly on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase be able to spool the mutations to a 
> filesystems in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
> https://reviews.apache.org/r/54882/


