[ https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-9873:
-------------------------
    Component/s: wal
                 MTTR
       Priority: Critical  (was: Major)

Made this critical since it's about MTTR

> Some improvements in hlog and hlog split
> ----------------------------------------
>
>                 Key: HBASE-9873
>                 URL: https://issues.apache.org/jira/browse/HBASE-9873
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, wal
>            Reporter: Liu Shaohui
>            Priority: Critical
>              Labels: failover, hlog
>
> Some improvements in hlog and hlog split:
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog splits during failover. Currently, hlog cleaning runs only when the hlog writer rolls.
> 2) Add a background hlog compaction thread that removes hlog entries whose data have already been flushed to hfiles. The scenario: in a shared cluster, write requests for a table may be very few and periodic, so many hlogs cannot be cleaned because they still contain entries for that table.
> 3) Rely on the smallest of the largest hfile seqIds of the previously served regions to skip entries. Facebook implemented this in HBASE-6508, and we backported it to HBase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the master (the latter can boost split efficiency on a tiny cluster).
> 5) Enable multiple splitters on a 'big' hlog file by logically splitting the hlog into slices of a configurable size (e.g. the HDFS chunk size, 64M), and support concurrent split tasks on the slices of a single hlog file.
> 6) Do not cancel a timed-out split task until another task reports success (avoiding the scenario where the split of an hlog file fails because no single task can succeed within the timeout period), and reschedule an identical split task to reduce split time (avoiding stragglers in hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks: schedule each hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers and switch to another hlog writer when write latency to the current hlog becomes high, possibly due to a temporary network spike?
> This is a draft listing the hlog improvements we plan to implement in the near future. Comments and discussion are welcome.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
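The seqId-based skipping in item 3 of the quoted proposal can be sketched roughly as below. This is a minimal illustration, not the actual HBASE-6508/HBASE-9568 code: the class and method names are hypothetical, and the real implementation tracks seqIds inside the region server rather than in a standalone map. The idea is that any WAL (hlog) entry whose seqId is at or below its region's largest flushed seqId is already persisted in an hfile and can be skipped during split; the smallest such bound across all regions gives a global cutoff below which whole hlog files need no splitting at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-region flushed-seqId tracking for WAL split
// filtering (illustrative only; not the real HBase API).
public class SeqIdFilter {

    // regionName -> largest seqId whose data has been flushed to an hfile
    private final Map<String, Long> maxFlushedSeqIdPerRegion = new HashMap<>();

    // Record that a memstore flush persisted all entries up to seqId.
    public void recordFlush(String region, long seqId) {
        maxFlushedSeqIdPerRegion.merge(region, seqId, Math::max);
    }

    // True if this WAL entry is already covered by a flush and may be skipped.
    public boolean canSkip(String region, long entrySeqId) {
        Long flushed = maxFlushedSeqIdPerRegion.get(region);
        return flushed != null && entrySeqId <= flushed;
    }

    // Smallest of the per-region largest flushed seqIds: every WAL entry at
    // or below this bound is safe to skip regardless of region, so entire
    // older hlog files below it can be cleaned without splitting.
    public long globalSkipBound() {
        return maxFlushedSeqIdPerRegion.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }
}
```

A splitter would consult `canSkip` per entry while replaying an hlog, and the hlog cleaner in item 1 could drop any file whose highest seqId falls below `globalSkipBound()`.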