[ https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240817#comment-14240817 ]

Jingcheng Du commented on HBASE-11861:
--------------------------------------

Hi Jon [~jmhsieh], here are some further comments.
bq.  I don't think we want to scan all the mob files to do a compaction on a 
single store. Also, because of splits and merges, there could be other del mob 
files that are relevant that have a start key earlier or later that cover the 
range in a particular store. I think we'll have to do some start key and end 
key tracking in the delmob files and the mob files to reduce the candidate list.
Actually, we could scan all the mob files in the region server (reading the
start/stop keys from the metadata of each file), and each region could then read
this information from the region server.
For del files that span regions, we need to create ref files for them, just as we
do for normal mob files.
In each store, all the files whose start key falls between the start and stop
keys of the current region are owned by the current region. They will be the
candidates for the mob compaction. We then pick the small files among these
candidates and scan the del files to find the invalid files. After that, we
rewrite them into a new mob file.
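The per-region candidate selection described above can be sketched roughly as follows. This is a hypothetical illustration, not HBase code: `MobFileInfo`, `selectOwnedFiles` and `selectSmallFiles` are made-up names, each file's start key and size are assumed to have already been read from its metadata, the region range is assumed to be start-inclusive and end-exclusive with an empty end key meaning "last region" (as for HBase regions), and the small-file threshold is an arbitrary parameter. The del-file scan and the rewrite step are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of per-region mob compaction candidate selection (not an HBase API). */
public class MobCandidateSelector {

    /** Assumed per-file metadata: file name, first row key, and file size. */
    public record MobFileInfo(String name, byte[] startKey, long sizeBytes) {}

    /** True if key is in [regionStart, regionEnd); an empty regionEnd means no upper bound. */
    static boolean inRegion(byte[] key, byte[] regionStart, byte[] regionEnd) {
        boolean afterStart = Arrays.compareUnsigned(key, regionStart) >= 0;
        boolean beforeEnd = regionEnd.length == 0 || Arrays.compareUnsigned(key, regionEnd) < 0;
        return afterStart && beforeEnd;
    }

    /** Step 1: the region owns every mob file whose start key falls inside its key range. */
    static List<MobFileInfo> selectOwnedFiles(List<MobFileInfo> all, byte[] regionStart, byte[] regionEnd) {
        List<MobFileInfo> owned = new ArrayList<>();
        for (MobFileInfo f : all) {
            if (inRegion(f.startKey(), regionStart, regionEnd)) {
                owned.add(f);
            }
        }
        return owned;
    }

    /** Step 2: among the owned candidates, keep only files smaller than the threshold. */
    static List<MobFileInfo> selectSmallFiles(List<MobFileInfo> candidates, long thresholdBytes) {
        List<MobFileInfo> small = new ArrayList<>();
        for (MobFileInfo f : candidates) {
            if (f.sizeBytes() < thresholdBytes) {
                small.add(f);
            }
        }
        return small;
    }

    /** Helper to build row keys from strings for the example. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<MobFileInfo> files = List.of(
                new MobFileInfo("a", key("aaa"), 1_000L),
                new MobFileInfo("b", key("mmm"), 500L),
                new MobFileInfo("c", key("mzz"), 50_000_000L),
                new MobFileInfo("d", key("zzz"), 10L));
        // A region covering [m, n) owns files b and c; only b is below the 1 MB threshold.
        List<MobFileInfo> owned = selectOwnedFiles(files, key("m"), key("n"));
        List<MobFileInfo> small = selectSmallFiles(owned, 1_000_000L);
        // Prints "2 owned, 1 small: b"
        System.out.println(owned.size() + " owned, " + small.size() + " small: " + small.get(0).name());
    }
}
```

Assigning each file to the region that contains its start key means every mob file has exactly one owner, so two regions never try to compact the same file.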

Jon, in your design, where should the mob compaction run: in each region, or in
a single place such as the HMaster? Please advise. Thanks.

> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old 
> mob data (the ttl cleaner), and to compact away deleted or over written data 
> (the sweep tool).  
> From an operational point of view, having two external tools, especially one
> that relies on MapReduce, is undesirable.  In this issue we'll tackle
> integrating these into HBase without requiring external processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
