[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730082#comment-15730082
 ] 

stack commented on HBASE-17018:
-------------------------------

Looking at the attached PDF (in future, attach a google doc with comment permissions 
for those w/ the URL... then we can write comments on the doc itself):

You are setting the timestamp on the Puts yourself. You are not relying on HBase 
to set it?

How long would you allow spooling to go on? What are you thinking?

Agree the focus should be on spooling to distributed storage. If written in hbase 
WAL format, you could use WALPlayer to replay. Just a suggestion; might be too 
heavyweight for what you need. FYI, there is lots of support in hbase for 
serializing pojos as protobufs and back again.
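The spool-and-replay idea can be sketched with a simple length-prefixed record file. Note this record format (row key + value pairs) is a hypothetical stand-in; a real implementation would more likely use HBase's protobuf serialization of Mutations, as suggested above:

```java
import java.io.*;
import java.util.*;

// Sketch of spooling mutations to a file and reading them back for replay.
// The record format here (length-prefixed row/value pairs) is hypothetical;
// HBase can serialize Mutations as protobufs instead.
public class SpoolFile {

    // Append one mutation (row key + value) to the spool.
    static void spool(DataOutputStream out, byte[] row, byte[] value)
            throws IOException {
        out.writeInt(row.length);
        out.write(row);
        out.writeInt(value.length);
        out.write(value);
    }

    // Read all spooled records back; a replay job would turn these into Puts.
    static List<byte[][]> replay(DataInputStream in) throws IOException {
        List<byte[][]> records = new ArrayList<>();
        while (true) {
            int rowLen;
            try {
                rowLen = in.readInt();
            } catch (EOFException eof) {
                break; // clean end of spool
            }
            byte[] row = new byte[rowLen];
            in.readFully(row);
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            records.add(new byte[][] { row, value });
        }
        return records;
    }
}
```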

The 'Replaying spooled puts' section, especially #2, is awkward. Needs working 
through.

The Connection needs to be able to ride over vagaries such as the coming and 
going of the cluster (application servers can't be expected to check/reset 
Connections): i.e. most scenarios in this doc are general hbase connection 
issues that we should have test coverage for and deal with; if a scenario is 
missing from the general client, it's a bug. The client is not currently 
interruptible, which is wrong. I like where you finger ClusterStatusListener as 
a source of connection-state signal. TODO.
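The "ride over vagaries" point can be sketched as a generic retry-with-backoff loop. This is a hypothetical helper, not the actual client internals; it also honors interruption, which the comment notes the current client does not:

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper sketching how a client operation could ride over
// a cluster restart: retry with backoff, and stay interruptible throughout.
public class Retry {
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException ie) {
                throw ie; // stay interruptible: never swallow interrupts
            } catch (Exception e) {
                last = e; // cluster may be down; back off and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // linear backoff
                }
            }
        }
        throw last; // exhausted attempts; surface the last failure
    }
}
```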

See the subtask on this issue for a first cut at making it possible to supply 
your own BM.

AP is convoluted. Better would be an interface on AP that exposes the API you'd 
need to run your spooling BM. It's YAI (yet-another-interface), but AP is 
inscrutable and subject to change; it is not to be depended on.
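The suggestion reads as: rather than depending on AsyncProcess directly, define a narrow interface carrying only what a spooling BufferedMutator needs. A hypothetical sketch (all names here are illustrative, not HBase API), with a trivial in-memory implementation for shape:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interface over AsyncProcess: only the operations a
// spooling BufferedMutator would need, so callers never touch AP internals.
interface MutationSubmitter {
    void submit(byte[] row);     // queue one mutation for async submission
    void flush();                // block until queued mutations are submitted
    boolean hasPendingErrors();  // did any submission fail?
}

// Trivial in-memory implementation, for illustration only.
class InMemorySubmitter implements MutationSubmitter {
    final List<byte[]> queued = new ArrayList<>();
    final List<byte[]> submitted = new ArrayList<>();

    public void submit(byte[] row) { queued.add(row); }
    public void flush() { submitted.addAll(queued); queued.clear(); }
    public boolean hasPendingErrors() { return false; }
}
```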

Thanks for posting the doc.

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: YARN-4061 HBase requirements for fault tolerant 
> writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
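The behavior the issue describes — write through to HBase when healthy, spool to a filesystem on error, replay later — can be sketched with a hypothetical wrapper. The delegate below stands in for a real BufferedMutator and the in-memory spool for an HDFS/gcs/s3 file; none of these names are HBase API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical spooling wrapper: delegate writes to the primary sink
// (standing in for a BufferedMutator); on failure, spool the mutation
// so it can be replayed later, e.g. by a MapReduce job.
class SpoolingMutator {
    private final Consumer<byte[]> primary;       // normal HBase write path
    final List<byte[]> spool = new ArrayList<>(); // stand-in for the FS spool

    SpoolingMutator(Consumer<byte[]> primary) { this.primary = primary; }

    void mutate(byte[] row) {
        try {
            primary.accept(row);
        } catch (RuntimeException hbaseDown) {
            spool.add(row); // keep the mutation instead of losing it
        }
    }

    // Replay spooled mutations once the cluster is back.
    void replay() {
        List<byte[]> pending = new ArrayList<>(spool);
        spool.clear();
        for (byte[] row : pending) {
            mutate(row); // re-spools again if the cluster is still down
        }
    }
}
```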



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
