[ 
https://issues.apache.org/jira/browse/HBASE-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24619:
--------------------------------------

    Assignee: Guanghao Zhang

> Try compact the recovered hfiles firstly after region online
> ------------------------------------------------------------
>
>                 Key: HBASE-24619
>                 URL: https://issues.apache.org/jira/browse/HBASE-24619
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>
> As discussed in HBASE-23739 and HBASE-24632, there may be many recovered 
> hfiles. We should find a better way to compact them first, once the region 
> is online.
>  
> For instance (quoting our [~anoop.hbase]):
> "Assume there were some small files because of flush but never got compacted 
> before the RS down happened. We will look for the possible candidate from 
> oldest files and in all chance the very old files would get excluded because 
> of the size math. But it is possible that newly flushed files would get 
> selected. And we have the max files to compact config also which is 10 by 
> default. Even these small files count alone might be >10. If there are say 15 
> WAL files to split, for sure we will have at least 15 small HFiles.
> My thinking was this. After the region open, we have to make sure these small 
> files are compacted in one go and we should not even consider the max files 
> limit for this compaction. Also note that these files might not even have 
> the DBE/compression etc being applied. Ya coding wise am not sure how clean 
> it might come."
>  
> And from our [~pankaj2461]:
>  
> "...concern is the compaction after region open, which impacts MTTR due to 
> heavy IO in a large cluster with many outstanding WALs"
>  
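
To make the concern in the quoted discussion concrete, here is a minimal Java sketch of why a capped, oldest-first selection can leave recovered files behind. It is not HBase code: the class and method names are hypothetical, the ratio check is a simplified stand-in for the "size math" mentioned above, the file sizes are illustrative, and it assumes the recovered files are the newest files in the store. Only the default cap of 10 and the count of 15 recovered files come from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the HBase implementation.
public class RecoveredFileCompactionSketch {

    // Simplified "size math": walking from the oldest file, skip a file
    // while it is larger than ratio * (total size of all newer files).
    static int firstCandidateIndex(List<Long> sizesOldestFirst, double ratio) {
        long newerTotal = 0;
        for (long s : sizesOldestFirst) {
            newerTotal += s;
        }
        int start = 0;
        while (start < sizesOldestFirst.size()) {
            long size = sizesOldestFirst.get(start);
            newerTotal -= size; // now the total size of files newer than 'start'
            if (size <= ratio * newerTotal) {
                break;
            }
            start++;
        }
        return start;
    }

    // Selection with a max-files cap: takes at most maxFiles files
    // starting at the first file that passes the ratio check.
    static List<Long> selectCapped(List<Long> sizesOldestFirst, double ratio, int maxFiles) {
        int start = firstCandidateIndex(sizesOldestFirst, ratio);
        int end = Math.min(sizesOldestFirst.size(), start + maxFiles);
        return new ArrayList<>(sizesOldestFirst.subList(start, end));
    }

    // The idea from the quoted text: take every recovered file in one go
    // and ignore the cap. Assumes the recovered files are the newest ones.
    static List<Long> selectAllRecovered(List<Long> sizesOldestFirst, int recoveredCount) {
        int start = Math.max(0, sizesOldestFirst.size() - recoveredCount);
        return new ArrayList<>(sizesOldestFirst.subList(start, sizesOldestFirst.size()));
    }

    public static void main(String[] args) {
        // Two large old files plus 15 small recovered files (illustrative sizes in MB).
        List<Long> sizes = new ArrayList<>(Arrays.asList(1000L, 500L));
        sizes.addAll(Collections.nCopies(15, 1L));

        List<Long> capped = selectCapped(sizes, 1.2, 10);
        List<Long> all = selectAllRecovered(sizes, 15);

        System.out.println("capped selection: " + capped.size() + " files");
        System.out.println("uncapped recovered selection: " + all.size() + " files");
    }
}
```

With these illustrative inputs, the ratio check skips the two large old files and the capped selection takes 10 of the 15 small files, so 5 recovered files wait for a later compaction. The uncapped variant picks up all 15 at once, at the cost of the heavier IO after region open that the second quoted comment raises.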



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
