[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942104#comment-13942104
 ] 

Himanshu Vashishtha commented on HBASE-10278:
---------------------------------------------

Yes, I agree. Adding even one level of indirection increases the number of
context switches, as seen in the last set of experiments.

I changed the model so the SRs do the syncing directly, as they do now
(no extra pool). So there is no extra context switch in the 'normal' case.
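
Roughly, the normal path looks like this (a minimal sketch; Writer,
SyncFuture and offer() are simplified stand-ins for illustration, not the
actual patch classes). The SR thread itself issues the hdfs-sync, so an edit
only crosses one thread boundary (handler -> SR) before it is acked:

{code:java}
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified stand-ins for the real FSHLog types, for illustration only.
interface Writer { void sync() throws IOException; }

class SyncFuture {
  private volatile boolean done;
  private volatile Throwable error;
  void done(Throwable t) { error = t; done = true; }
  boolean isDone() { return done; }
}

// Normal case: the SyncRunner thread calls sync() directly on the current
// writer; there is no intermediate pool, hence no extra context switch.
class SyncRunner extends Thread {
  private final BlockingQueue<SyncFuture> pending = new LinkedBlockingQueue<>();
  private final Writer writer;

  SyncRunner(Writer writer) { this.writer = writer; }

  void offer(SyncFuture sf) { pending.add(sf); }

  @Override
  public void run() {
    while (!isInterrupted()) {
      SyncFuture sf;
      try {
        sf = pending.take();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // interrupted by SwitchMonitor on a switch
        return;
      }
      try {
        writer.sync();   // direct hdfs-sync on the current pipeline
        sf.done(null);   // ack the waiting rs-handler
      } catch (IOException ioe) {
        sf.done(ioe);    // a slow/failed sync here is what triggers a switch
      }
    }
  }
}
{code}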

When things go bad for the current pipeline, the SwitchMonitor does the
syncing, interrupts the existing SRs, and replaces them with new SRs.

We switch only when things go bad, and it is OK to add a little cost while
switching because 1) it is a rare case compared to regular hdfs-sync calls,
and 2) it occurs when things are already bad from the HDFS pipeline's point
of view, so the cost of creating new SRs is very small compared to the cost
of the bad pipeline.
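
To make the switch path concrete, here is a rough sketch of the shape I have
in mind (reusing the Writer/SyncRunner/SyncFuture stand-ins from the sketch
above; names and details are illustrative, not the actual patch code, and it
assumes the pending edits have already been appended to the new writer):

{code:java}
import java.io.IOException;
import java.util.List;

// Switch path: the SwitchMonitor itself finishes the in-flight syncs on the
// new writer, tears down the SRs stuck on the bad pipeline, and brings up
// fresh SRs bound to the new writer.
class SwitchMonitor {

  void switchWriter(List<SyncRunner> oldRunners,
                    List<SyncFuture> inFlight,
                    Writer newWriter) {
    // 1. Interrupt the runners blocked on the bad pipeline.
    for (SyncRunner sr : oldRunners) {
      sr.interrupt();
    }

    // 2. The monitor does the syncing during the switch: one sync on the new
    //    writer, then release every handler still waiting on an old edit.
    try {
      newWriter.sync();
      for (SyncFuture sf : inFlight) {
        sf.done(null);
      }
    } catch (IOException ioe) {
      for (SyncFuture sf : inFlight) {
        sf.done(ioe);
      }
    }

    // 3. Start new SRs against the new writer; from here the normal,
    //    direct-sync path applies again.
    new SyncRunner(newWriter).start();
  }
}
{code}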

On the other hand, things become interesting because SFs are reusable
objects (each is tied to an rs-handler), AND both the SwitchMonitor and the
SyncRunner race to mark an SF as done. As soon as it is done, the handler
puts it back in the RB. I am now working on handling this exact case; the
rest looks OK.
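
One way to handle that race (just a sketch of a possible approach, not
necessarily what the patch will end up doing) is to make done() single-shot:
whichever of the SR or the SwitchMonitor gets there second becomes a no-op,
and the handler only resets and recycles the SF into the RB after the
winning done() has released it:

{code:java}
import java.util.concurrent.ExecutionException;

// Reusable, single-completion SyncFuture sketch. The first done() wins; the
// losing thread (SyncRunner or SwitchMonitor) sees 'false' and must not touch
// the future again. The handler calls reset() only after its get() returned,
// so a late second done() can never leak into the next use of the object.
class RecyclableSyncFuture {
  private boolean completed;   // guarded by 'this'
  private Throwable error;     // guarded by 'this'

  /** Returns true only for the single caller that actually completes the future. */
  synchronized boolean done(Throwable t) {
    if (completed) {
      return false;            // lost the race; the other thread got here first
    }
    error = t;
    completed = true;
    notifyAll();               // wake the rs-handler blocked in get()
    return true;
  }

  /** The rs-handler blocks here; once it returns, the handler owns the future again. */
  synchronized void get() throws InterruptedException, ExecutionException {
    while (!completed) {
      wait();
    }
    if (error != null) {
      throw new ExecutionException(error);
    }
  }

  /** Called only by the owning handler, before putting the future back in the RB. */
  synchronized void reset() {
    completed = false;
    error = null;
  }
}
{code}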





> Provide better write predictability
> -----------------------------------
>
>                 Key: HBASE-10278
>                 URL: https://issues.apache.org/jira/browse/HBASE-10278
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>         Attachments: 10278-wip-1.1.patch, Multiwaldesigndoc.pdf, 
> SwitchWriterFlow.pptx
>
>
> Currently, HBase has one WAL per region server. 
> Whenever there is any latency in the write pipeline (for whatever reason, 
> such as an n/w blip, a node in the pipeline having a bad disk, etc.), the 
> overall write latency suffers. 
> Jonathan Hsieh and I analyzed various approaches to tackle this issue. We 
> also looked at HBASE-5699, which talks about adding concurrent multi WALs. 
> Along with performance numbers, we also focussed on design simplicity, 
> minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. 
> Considering all these parameters, we propose a new HLog implementation with 
> WAL Switching functionality.
> Please find attached the design doc for the same. It introduces the WAL 
> Switching feature, and experiments/results of a prototype implementation, 
> showing the benefits of this feature.
> The second goal of this work is to serve as a building block for the 
> concurrent multiple WALs feature.
> Please review the doc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
