[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333224#comment-15333224
 ] 

Tianying Chang commented on HBASE-16030:
----------------------------------------

[~enis] thanks for reviewing the patch. Yes, 5 minutes is not enough; we would 
like to see the flushes uniformly distributed across the one-hour range in our 
online-facing production cluster. I am fine with making this value 
configurable so that it can be set larger than 5 minutes. Would it be a 
problem if a flush request is queued and delayed for up to 1 hour? 

BTW, I attached a new graph showing the impact of the hourly spike on 
network/disk/CPU on our new 1.2RC test cluster.

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16030
>                 URL: https://issues.apache.org/jira/browse/HBASE-16030
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.1
>            Reporter: Tianying Chang
>            Assignee: Tianying Chang
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
>         Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, 
> hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush time of 1 
> hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1 hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> Under these two conditions, all the regions are flushed at around the same 
> time, at startTime+1hour-delay, again and again.
> We added a flush jitter time to randomize the flush time of each region, 
> so that they don't get flushed at around the same time. We have had this 
> feature running in our 94.7 and 94.26 clusters. Recently we upgraded to 1.2 
> and found that this issue is still there in 1.2, so we are porting this to 
> the 1.2 branch. 
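
For illustration only, the jitter idea described above could be sketched 
roughly as follows. This is not the code from hbase-16030.patch: the class 
name FlushJitterSketch, both method names, and the choice to add a uniform 
random offset on top of the 1 hour interval are all assumptions made for 
this example.

```java
import java.util.Random;

/** Hypothetical sketch of per-region flush jitter; not taken from the patch. */
public class FlushJitterSketch {

    /** Periodic flush interval; 1 hour is the default mentioned in the issue. */
    public static final long FLUSH_INTERVAL_MS = 60L * 60L * 1000L;

    /**
     * Returns a random offset in [0, jitterRangeMs). Assumption: each region
     * draws its own offset, so regions opened together stop sharing one
     * flush deadline.
     */
    public static long flushJitterMs(Random rng, long jitterRangeMs) {
        if (jitterRangeMs <= 0) {
            return 0L; // jitter disabled
        }
        return (long) (rng.nextDouble() * jitterRangeMs);
    }

    /** Time at which a region becomes due for its next periodic flush. */
    public static long nextFlushTimeMs(long lastFlushTimeMs, long jitterMs) {
        return lastFlushTimeMs + FLUSH_INTERVAL_MS + jitterMs;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long openTimeMs = 0L;                 // all regions opened together
        long jitterRangeMs = FLUSH_INTERVAL_MS; // spread over a full hour

        for (int region = 0; region < 5; region++) {
            long jitterMs = flushJitterMs(rng, jitterRangeMs);
            long dueMs = nextFlushTimeMs(openTimeMs, jitterMs);
            System.out.println("region " + region + " due at +" + (dueMs / 1000L) + "s");
        }
    }
}
```

With a jitter range of 5 minutes, all regions would still flush within the 
same 5 minute window each hour; a range close to the full interval spreads 
them across the hour, which is the distribution asked for in the comment.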



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
