[ https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561495#comment-13561495 ]

nkeywal commented on HBASE-5930:
--------------------------------

bq. With this, maybe we will no longer need skipWAL if we can prove that 
deferred flush is as fast as skip WAL. 
In a standard database, skipping the WAL is common when you're doing a 
functional upgrade that requires some downtime, i.e.:
- dump
- run batch scripts to update your data
- if anything goes wrong reload the dump

For hundreds of reasons it makes much less sense with HBase, but it could 
happen (some companies don't need 24x7 availability). So we should not remove 
skipWAL imho, unless removing it really simplifies something internally.


On the patch itself, I have a question about adding some randomness. The 
scenario I'm thinking of is a massive but periodic update on a table: all the 
regions will be written simultaneously, and hence flushed simultaneously. 
That's the main use case for this JIRA, and it could hammer the namenode, imho. 
That is, unless we think there is already enough randomness from having a 
separate flusher per regionserver (which may not be the case if all region 
servers are started simultaneously).

As a side note, I would personally like a flush interval of 10 minutes:
- it would help with .META. recovery, especially with the separate WAL for .META.
- it allows more regions: today, on average and in theory, each region uses 
50% of an HDFS block size of memory. The more regions we flush early, the more 
empty memstores we have...
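The memory argument in the second point can be made concrete with a rough estimate. It assumes each memstore fills roughly linearly from empty up to a flush size F of about one HDFS block before being flushed, and that the R regions sit at uniformly random points in that cycle:

```latex
\bar{M}_{\mathrm{region}} \approx \frac{1}{F}\int_{0}^{F} m \, dm = \frac{F}{2},
\qquad
M_{\mathrm{total}} \approx R \cdot \frac{F}{2}
```

For example, with an assumed F = 128 MB and R = 200 regions, that is about 200 x 64 MB = 12,800 MB (12.5 GB). A region flushed early because it went idle contributes close to nothing to this total instead of F/2.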
                
> Periodically flush the Memstore?
> --------------------------------
>
>                 Key: HBASE-5930
>                 URL: https://issues.apache.org/jira/browse/HBASE-5930
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Devaraj Das
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5930-1.patch, 5930-wip.patch
>
>
> A colleague of mine ran into an interesting issue.
> He inserted some data with the WAL disabled, which happened to fit in the 
> aggregate Memstore memory.
> Two weeks later he had a problem with the HDFS cluster, which caused the 
> region servers to abort. He found that his data was lost. Looking at the log 
> we found that the Memstores were not flushed at all during these two weeks.
> Should we have an option to flush memstores periodically? There are obvious 
> downsides to this, like many small storefiles, etc.

--
This message is automatically generated by JIRA.
