[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

Himanshu Vashishtha (JIRA) Thu, 09 Aug 2012 14:39:21 -0700

    [ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432185#comment-13432185 ]


Himanshu Vashishtha commented on HBASE-6550:
--------------------------------------------

I see :)

I will be glad to make it simpler. But it's not that difficult...  :P
It basically adds two things: a bailout mechanism and, to achieve it, a 
Callable that is submitted to a RepSink#threadpool.
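
To make the idea concrete, here is a minimal sketch of that pattern using only java.util.concurrent. It is not the code from the attached patch: the class name BailoutSketch, the method applyWithBailout, the Result enum and the timeout values are all made up for illustration. The batch is wrapped in a Callable and submitted to a pool, and the handler thread waits on the Future only up to a deadline, so it can bail out and clean up instead of riding out the client retries.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class BailoutSketch {
    /** Outcome of one batch attempt. */
    enum Result { APPLIED, TIMED_OUT, FAILED }

    /**
     * Runs the batch on the pool and waits at most timeoutMs for it.
     * On timeout the task is cancelled (its thread is interrupted), so the
     * calling handler thread is released instead of staying blocked.
     */
    static Result applyWithBailout(ExecutorService pool, Callable<Void> batch, long timeoutMs)
            throws InterruptedException {
        Future<Void> future = pool.submit(batch);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Result.APPLIED;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the task can clean up in a finally block
            return Result.TIMED_OUT;
        } catch (ExecutionException e) {
            return Result.FAILED; // the batch itself threw, so fail fast
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Void> fast = () -> null;
            Callable<Void> slow = () -> { Thread.sleep(5_000); return null; };
            System.out.println(applyWithBailout(pool, fast, 1_000)); // APPLIED
            System.out.println(applyWithBailout(pool, slow, 100));   // TIMED_OUT
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In a real sink the Callable would wrap the client batch call, and the timeout would presumably come from the sink's own configuration rather than a constant.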

I wanted to have the bailout functionality for the regionserver handler as part 
of the patch. It gives the handler the opportunity to do cleanup etc. in case 
the client goes away. Decorating the config solves only half of the problem. 
Another way is to make similar changes on the master cluster regionserver side 
(decorating its config with a lower rpc timeout etc.), but that's not desirable, 
as it's not intra-cluster and we want to give it a full try before resending the 
shipment.


bq. Create an unmanaged HConnectionImplementation and an Executor
You mean at the class level? In that case, if another master cluster regionserver 
calls the method via another handler, will it have to wait?
Or at the method level? 

bq. For each batch create new HTable(connection, executor)
apply batch
close create HTable.

Yes, that also happens in the current patch. It closes the connection and the 
HTable's pool after the batch op.
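
For reference, the per-batch lifecycle suggested in the quote (long-lived connection and executor, short-lived table handle) could look roughly like the sketch below. SharedConnection and TableHandle are hypothetical stand-ins and not HBase classes; they only record what happens, so that the create / apply / close order per batch is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-ins; none of these types are HBase classes.
class SinkLifecycleSketch {
    /** Stand-in for a long-lived connection shared by all batches. */
    static final class SharedConnection {
        final List<String> log = new ArrayList<>();
    }

    /** Stand-in for a lightweight table handle that lives for one batch. */
    static final class TableHandle implements AutoCloseable {
        private final SharedConnection conn;
        private final ExecutorService pool;

        TableHandle(SharedConnection conn, ExecutorService pool) {
            this.conn = conn;
            this.pool = pool;
            conn.log.add("open");
        }

        /** Applies the mutations on the shared pool and waits for completion. */
        void batch(final List<String> mutations) throws Exception {
            Callable<Boolean> work = () -> conn.log.add("batch:" + mutations.size());
            pool.submit(work).get();
        }

        @Override
        public void close() {
            conn.log.add("close"); // only the handle is closed, not the connection or the pool
        }
    }

    /** One batch: create the handle, apply, close. */
    static void applyBatch(SharedConnection conn, ExecutorService pool, List<String> mutations)
            throws Exception {
        try (TableHandle table = new TableHandle(conn, pool)) {
            table.batch(mutations);
        }
    }

    public static void main(String[] args) throws Exception {
        SharedConnection conn = new SharedConnection();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            applyBatch(conn, pool, Arrays.asList("put-1", "put-2"));
            applyBatch(conn, pool, Arrays.asList("delete-1"));
            System.out.println(conn.log); // [open, batch:2, close, open, batch:1, close]
        } finally {
            pool.shutdown();
        }
    }
}
```

The difference from what the patch does, as described above, is that here the connection and the pool outlive the batch and only the handle is closed each time.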

                
> Refactoring ReplicationSink to make it more responsive of cluster health
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6550
>                 URL: https://issues.apache.org/jira/browse/HBASE-6550
>             Project: HBase
>          Issue Type: New Feature
>          Components: replication
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-6550-v1.patch
>
>
> ReplicationSink replicates the WALEdits in the local cluster. It uses the native 
> HBase client to insert the mutations. Sometimes it takes a while to process 
> them (maybe due to region splitting, a GC pause, etc.) and it goes through the 
> retry phase. 
> This has two repercussions:
> a) The regionserver handler which is serving the request (till now, a 
> priority handler) is blocked for this period.
> b) The caller may time out and will retry anyway, while the handler 
> serving the ReplicationSink request is still working.
> Refactor ReplicationSink to have the following features:
> a) Make it more configurable (its own retry limit, 
> connection timeout, etc.)
> b) Add fail-fast behavior so that it bails out if the caller has timed out 
> or any exception occurs while processing the mutation batch.

