[ https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240817#comment-14240817 ]
Jingcheng Du commented on HBASE-11861:
--------------------------------------

Hi Jon [~jmhsieh], some further comments.

bq. I don't think we want to scan all the mob files to do a compaction on a single store. Also, because of splits and merges, there could be other del mob files that are relevant that have a start key earlier or later that cover the range in a particular store. I think we'll have to do some start key and end key tracking in the delmob files and the mob files to reduce the candidate list.

Actually, we could scan all the mob files in the region server (reading the start/stop keys from the metadata of each file), and each region could then read this information from the region server. For del files that span regions, we need to create ref files, just as we do for normal mob files. In each store, all the files whose start keys fall between the start/stop keys of the current region are owned by that region, and they are the candidates for the mob compaction. From these candidates we will pick the small files, then scan the del files to find the invalid files. After that, we'll rewrite them into a new mob file.

Jon, in your design, where should the mob compaction run: in each region, or in a single place such as the HMaster? Please advise. Thanks.

> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old
> mob data (the ttl cleaner), and to compact away deleted or overwritten data
> (the sweep tool).
> From an operational point of view, having two external tools, especially one
> that relies on MapReduce, is undesirable.
> In this issue we'll tackle integrating these into hbase without requiring external processes.
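A rough sketch of the per-region candidate selection proposed in the comment, written in Python for brevity. Everything in it is hypothetical: `MobFile`, `owned_by_region`, `compact_region`, `small_threshold` and `deleted_keys` are illustrative names, not HBase APIs. It assumes each file's start key has already been read from its metadata, that the del file contents are visible to the region (for example through the ref files mentioned in the comment), that region ranges are half-open [start, end) with an empty end key meaning "unbounded", and that "invalid" data means cells whose keys appear in a del file. Timestamps and versions are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MobFile:
    """Hypothetical stand-in for a mob file: its start key (as read from
    the file metadata), its size in bytes and the cells it stores."""
    name: str
    start_key: bytes
    size: int
    cells: Dict[bytes, bytes] = field(default_factory=dict)


def owned_by_region(f: MobFile, region_start: bytes, region_end: bytes) -> bool:
    """A file is owned by a region when its start key lies in
    [region_start, region_end); an empty end key means "no upper bound"."""
    if f.start_key < region_start:
        return False
    return region_end == b"" or f.start_key < region_end


def compact_region(
    files: List[MobFile],
    region_start: bytes,
    region_end: bytes,
    small_threshold: int,
    deleted_keys: Set[bytes],
    new_name: str,
) -> Tuple[Optional[MobFile], List[MobFile]]:
    """Select this region's candidates, keep the small ones, drop cells whose
    keys appear in the del file, and merge the rest into one new file.
    Returns (new_file, rewritten_files); new_file is None if nothing is small."""
    # Step 1: candidates are the files whose start key falls in this region.
    candidates = [f for f in files if owned_by_region(f, region_start, region_end)]
    # Step 2: only small files are worth rewriting.
    small = [f for f in candidates if f.size < small_threshold]
    if not small:
        return None, []
    # Step 3: copy live cells, skipping keys recorded in the del file.
    # Timestamps are ignored: a key seen in a later file overwrites earlier ones.
    merged: Dict[bytes, bytes] = {}
    for f in small:
        for key, value in f.cells.items():
            if key not in deleted_keys:
                merged[key] = value
    # Step 4: write one new file; here "size" is just the total value length.
    new_file = MobFile(
        name=new_name,
        start_key=min(f.start_key for f in small),
        size=sum(len(v) for v in merged.values()),
        cells=merged,
    )
    return new_file, small


# Example: one region covering [b"row-000", b"row-100").
files = [
    MobFile("a", b"row-010", 40, {b"row-010": b"v1", b"row-011": b"v2"}),
    MobFile("b", b"row-020", 30, {b"row-020": b"v3"}),
    MobFile("c", b"row-030", 900, {b"row-030": b"big"}),  # too large to rewrite
    MobFile("d", b"row-500", 10, {b"row-500": b"v4"}),  # owned by another region
]
new_file, rewritten = compact_region(
    files, b"row-000", b"row-100", small_threshold=100,
    deleted_keys={b"row-011"}, new_name="mob-compacted",
)
print([f.name for f in rewritten], new_file.cells)
# prints: ['a', 'b'] {b'row-010': b'v1', b'row-020': b'v3'}
```

The sketch runs once per region; if the compaction were instead driven from a single place such as the HMaster, the same selection would be repeated for each region's key range.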