Jianwei Cui created HBASE-15469:
-----------------------------------

             Summary: Take snapshot by family
                 Key: HBASE-15469
                 URL: https://issues.apache.org/jira/browse/HBASE-15469
             Project: HBase
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 2.0.0
            Reporter: Jianwei Cui


In our production environment, the offline cluster hosts some 'wide' tables. 
Each 'wide' table has many families, and different applications access 
different families of the table through MapReduce. When an application 
starts to provide online service, we need to copy the families it needs from 
the offline cluster to the online cluster. For future writes, inter-cluster 
replication supports setting families per table, so we can use it to copy 
future edits for the needed families. For existing data, we can take a 
snapshot of the table on the offline cluster, then use {{ExportSnapshot}} to 
copy the snapshot to the online cluster and clone it there. However, we can 
only take a snapshot of the whole table, in which many families are not needed 
by the application; this leads to unnecessary data copying. I think it would be 
useful to support taking a snapshot by family, so that we copy only the needed 
data.
A possible way to support this:
1. Add a family names field to the protobuf definition of {{SnapshotDescription}}.
2. Allow setting families when taking a snapshot in the hbase shell, for example:
{code}
   snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
{code}
3. Add the family names to {{SnapshotDescription}} on the client side.
4. Read the family names from {{SnapshotDescription}} in the Master/RegionServer, 
and keep only the requested families when taking the snapshot of each region.
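The steps above could be sketched roughly as follows. This is only an illustration in plain Ruby, not HBase code: {{parse_snapshot_args}}, {{build_snapshot_description}} and {{families_to_snapshot}} are hypothetical names, the hash stands in for the {{SnapshotDescription}} protobuf message, and treating an empty family list as "whole table" is an assumption made to keep the current behaviour unchanged.

```ruby
# Hypothetical sketch only; none of these names are existing HBase APIs.
SKIP_FLUSH = 'SKIP_FLUSH'

# Step 2: split the trailing shell arguments into family names and options.
# A trailing Hash (e.g. {SKIP_FLUSH => true}) is treated as the options.
def parse_snapshot_args(args)
  args = args.dup
  options = args.last.is_a?(Hash) ? args.pop : {}
  { families: args.map(&:to_s), skip_flush: options[SKIP_FLUSH] == true }
end

# Steps 1 and 3: a plain Hash standing in for SnapshotDescription,
# extended with the proposed family names field.
def build_snapshot_description(table, name, *args)
  parsed = parse_snapshot_args(args)
  { table: table, name: name,
    families: parsed[:families], skip_flush: parsed[:skip_flush] }
end

# Step 4: keep only the requested families of a region.
# Assumption: an empty list means "snapshot every family", as today.
def families_to_snapshot(region_families, description)
  requested = description[:families]
  return region_families if requested.empty?

  unknown = requested - region_families
  unless unknown.empty?
    raise ArgumentError, "unknown families: #{unknown.join(', ')}"
  end

  region_families & requested
end

desc = build_snapshot_description('tableName', 'snapshotName',
                                  'FamilyA', 'FamilyB', { SKIP_FLUSH => true })
puts families_to_snapshot(%w[FamilyA FamilyB FamilyC], desc).inspect
```

Rejecting unknown family names up front is one possible design choice; silently ignoring them would also work, but could hide typos in the shell command.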
Discussion and suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)