[ 
https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088938#comment-14088938
 ] 

Lars Hofhansl edited comment on HBASE-11695 at 8/7/14 7:21 AM:
---------------------------------------------------------------

We can do what the CompactionChecker does and add a multiplier.
(As an aside, the default multiplier for the CompactionChecker is 1000, so it 
would only check every 10000s = 2h 46m; isn't that too infrequent?)
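To make the aside concrete, here is a quick check with `java.time.Duration`. It assumes, as the arithmetic in the comment implies, that the CompactionChecker period is simply wakeFrequency (10s by default) multiplied by the multiplier. The class and method names are illustrative and are not HBase code.

```java
import java.time.Duration;

public class CompactionCheckPeriod {
    // Assumption: check period = wakeFrequency * multiplier (illustrative, not HBase code).
    static Duration checkPeriod(Duration wakeFrequency, long multiplier) {
        return wakeFrequency.multipliedBy(multiplier);
    }

    public static void main(String[] args) {
        Duration period = checkPeriod(Duration.ofSeconds(10), 1000);
        // Prints the period in seconds and in ISO-8601 form: 10000s = PT2H46M40S
        System.out.println(period.getSeconds() + "s = " + period);
    }
}
```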

Another option is to set the period like this: max(wakeFrequency, 2*jitter, 
flushInterval/10). I.e. we
# do not wake up more often than wakeFrequency
# do not wake up so often that we would request a flush of the same region multiple times (2*jitter)
# only wake up often enough to satisfy the flushInterval with an accuracy of 10% (flushInterval/10)

The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not 
actually a frequency, btw), and flushInterval defaults to 1h. So with these 
defaults we'd wake up to check every 360s, which seems more like it.
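A minimal sketch of the proposed three-term period, using the defaults from the comment (wakeFrequency 10s, jitter 20s, flushInterval 1h). The class and method names are hypothetical, and this is only an illustration of the formula, not the committed HBase change.

```java
import java.time.Duration;

public class PeriodicFlusherPeriod {
    // Hypothetical helper: period = max(wakeFrequency, 2 * jitter, flushInterval / 10).
    static Duration proposedPeriod(Duration wakeFrequency, Duration jitter, Duration flushInterval) {
        Duration period = wakeFrequency;                         // never wake more often than wakeFrequency
        Duration twiceJitter = jitter.multipliedBy(2);           // avoid re-requesting a flush for the same region
        Duration tenthOfInterval = flushInterval.dividedBy(10);  // ~10% accuracy on flushInterval
        if (twiceJitter.compareTo(period) > 0) {
            period = twiceJitter;
        }
        if (tenthOfInterval.compareTo(period) > 0) {
            period = tenthOfInterval;
        }
        return period;
    }

    public static void main(String[] args) {
        Duration p = proposedPeriod(Duration.ofSeconds(10), Duration.ofSeconds(20), Duration.ofHours(1));
        System.out.println(p.getSeconds() + "s"); // 360s
    }
}
```

With a shorter flushInterval the jitter term takes over: for example, a 5 minute flushInterval gives max(10s, 40s, 30s) = 40s.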

Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default 
settings.
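The simpler variant drops the flushInterval term entirely. A small sketch of it, with a hypothetical helper name, assuming the same defaults as in the comment:

```java
import java.time.Duration;

public class SimplePeriod {
    // Hypothetical helper: period = max(wakeFrequency, 2 * jitter); flushInterval plays no part.
    static Duration simplePeriod(Duration wakeFrequency, Duration jitter) {
        Duration twiceJitter = jitter.multipliedBy(2);
        return twiceJitter.compareTo(wakeFrequency) > 0 ? twiceJitter : wakeFrequency;
    }

    public static void main(String[] args) {
        System.out.println(simplePeriod(Duration.ofSeconds(10), Duration.ofSeconds(20)).getSeconds() + "s"); // 40s
        System.out.println(simplePeriod(Duration.ofSeconds(60), Duration.ofSeconds(20)).getSeconds() + "s"); // 60s
    }
}
```

With the default 1h flushInterval, a 40s period means 3600/40 = 90 wake-ups per interval, versus 3600/360 = 10 with the three-term formula.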

But maybe that's too complicated, and we should just define another multiplier or a 
completely new setting (which means another config option, though).




> PeriodicFlusher and WakeFrequency issues
> ----------------------------------------
>
>                 Key: HBASE-11695
>                 URL: https://issues.apache.org/jira/browse/HBASE-11695
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.21
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Critical
>
> We just ran into a flush storm caused by the PeriodicFlusher.
> Many memstores became eligible for flushing at exactly the same time. The 
> effect we saw is that the exact same region was flushed multiple times, 
> because the flusher wakes up too often (every 10s). The jitter of 20s is 
> larger than that, and it takes some time to actually flush the memstore.
> Here's one example. We've seen hundreds of these, monopolizing the flush queue 
> and preventing "important" flushes from happening.
> {code}
> 06-Aug-2014 20:11:56  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
> 06-Aug-2014 20:12:06  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
> {code}
> So we need to increase the period of the PeriodicFlusher to at least the 
> random jitter, and also increase the default random jitter (20s does not help 
> with many regions).
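A rough model of why a 10s period combined with a 20s jitter yields repeated requests for the same region. It rests on assumptions, not on the actual HBase code: the region stays eligible until its first delayed flush completes, that delay takes its worst-case value (the full jitter), and a flush takes a hypothetical 5s. The flusher then issues a new request at every wake-up inside that window. Class and method names are made up for illustration.

```java
import java.time.Duration;

public class DuplicateFlushRequests {
    // Simplified model (assumption): the flusher wakes at t = 0, period, 2*period, ...
    // and requests a flush each time the region is still eligible, i.e. while
    // t < firstDelay + flushTime. Returns ceil(window / period) requests.
    static long requestsPerEligibleRegion(Duration period, Duration firstDelay, Duration flushTime) {
        long windowMs = firstDelay.plus(flushTime).toMillis();
        long periodMs = period.toMillis();
        return (windowMs + periodMs - 1) / periodMs; // integer ceiling division
    }

    public static void main(String[] args) {
        Duration jitter = Duration.ofSeconds(20);   // worst case: delay equals the full jitter
        Duration flushTime = Duration.ofSeconds(5); // hypothetical flush duration
        for (long seconds : new long[] {10, 40}) {
            long n = requestsPerEligibleRegion(Duration.ofSeconds(seconds), jitter, flushTime);
            System.out.println("period " + seconds + "s -> " + n + " request(s)");
        }
        // period 10s -> 3 request(s)
        // period 40s -> 1 request(s)
    }
}
```

Under this model, a period of 2*jitter tolerates a flush taking up to one jitter's worth of time before a duplicate request is issued, which is the intent of the 2*jitter term in the comment.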



--
This message was sent by Atlassian JIRA
(v6.2#6252)
