[ https://issues.apache.org/jira/browse/HBASE-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang reassigned HBASE-24619:
--------------------------------------

    Assignee: Guanghao Zhang

> Try compact the recovered hfiles firstly after region online
> ------------------------------------------------------------
>
>                 Key: HBASE-24619
>                 URL: https://issues.apache.org/jira/browse/HBASE-24619
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>
> As discussed in HBASE-23739 and in HBASE-24632, there may be many recovered
> hfiles. We should find a better way to compact them first once the region
> is online.
>
> For instance (quoting our [~anoop.hbase]):
> "Assume there were some small files because of flush but never got compacted
> before the RS down happened. We will look for the possible candidate from
> oldest files and in all chance the very old files would get excluded because
> of the size math. But It is possible that new flushed files would get
> selected. And we have the max files to compact config also which is 10 by
> default. Even these small files count alone might be >10. If there are say 15
> WAL files to split, for sure we will have at least 15 small HFiles.
> My thinking was this. After the region open, we have to make sure these small
> files are compacted in one go and we should not even consider the max files
> limit for this compaction. Also to note that this files might not even have
> the DBE/compression etc being applied. Ya coding wise am not sure how clean
> it might come."
>
> And from our [~pankaj2461]:
>
> "...concern is the compaction after region open, which impact MTTR due to
> heavy IO in large cluster with many outstanding WALs"
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
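
The arithmetic in the quoted scenario (15 split WAL files, a default cap of 10 files per compaction) can be made concrete with a small sketch. This is only an illustration under stated assumptions: `StoreFile`, `selectWithCap`, `selectAllRecovered` and `recoveredFilesFor` are hypothetical names, not HBase classes or APIs, and the capped selection here ignores the size-ratio math that the real compaction policy applies.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. Nothing here is an HBase class or API.
class RecoveredHFileSelection {

    // Hypothetical stand-in for a store file.
    static final class StoreFile {
        final String name;
        final long sizeBytes;
        final boolean recovered; // true if produced by WAL splitting during recovery

        StoreFile(String name, long sizeBytes, boolean recovered) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.recovered = recovered;
        }
    }

    // Simplified capped selection: take files in list order, at most maxFiles of them.
    static List<StoreFile> selectWithCap(List<StoreFile> files, int maxFiles) {
        int end = Math.min(maxFiles, files.size());
        return new ArrayList<>(files.subList(0, end));
    }

    // Behaviour suggested in the quote: take every recovered file and ignore the cap.
    static List<StoreFile> selectAllRecovered(List<StoreFile> files) {
        List<StoreFile> selected = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.recovered) {
                selected.add(f);
            }
        }
        return selected;
    }

    // Assumed scenario: one small recovered hfile per split WAL file (the quoted minimum).
    static List<StoreFile> recoveredFilesFor(int walCount) {
        List<StoreFile> files = new ArrayList<>();
        for (int i = 0; i < walCount; i++) {
            files.add(new StoreFile("recovered-" + i, 1024L * (i + 1), true));
        }
        return files;
    }

    public static void main(String[] args) {
        List<StoreFile> files = recoveredFilesFor(15);
        System.out.println("capped: " + selectWithCap(files, 10).size());         // capped: 10
        System.out.println("all recovered: " + selectAllRecovered(files).size()); // all recovered: 15
    }
}
```

With the cap applied, 5 of the 15 recovered files are left for a later compaction. Selecting all recovered files handles them in one go, at the IO cost that the second quote raises as an MTTR concern.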