[ 
https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203878#comment-15203878
 ] 

Jianwei Cui commented on HBASE-15469:
-------------------------------------

For our case, the goal is to copy existing data for the given families and 
clone the snapshot, so creating a new table with only the subset of families 
is the better choice. For the restore case, the goal is to roll the table back 
to some historical state. A snapshot with only a subset of families may not 
represent any historical state of the table, so it should not be used for 
restore.
{quote}
We may block the restore of snapshots with only a subset of families, and that 
will solve the strange situation of restore. 
And when we clone, we just create a new table with only the subset. In theory 
this is clearer for the end user. 
{quote}
I agree with your analysis, [~mbertozzi], and would also welcome other 
opinions and use cases. Thanks!
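
To make the clone/restore distinction concrete, here is a minimal sketch of the rule discussed above: a snapshot that covers only a subset of families can be cloned into a new table containing just those families, but is rejected for restore. All names here ({{FamilySnapshotPolicy}}, {{SnapshotInfo}}, {{checkRestoreAllowed}}, {{familiesForClone}}) are hypothetical and exist only for illustration; they are not HBase APIs and not the code in the attached patches. The sketch also assumes that an empty family set means "all families".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; not HBase's SnapshotDescription or the patch. */
final class FamilySnapshotPolicy {

    /** Minimal stand-in for snapshot metadata. Assumption: empty set = all families. */
    static final class SnapshotInfo {
        final String name;
        final Set<String> families;

        SnapshotInfo(String name, Set<String> families) {
            this.name = name;
            this.families = Collections.unmodifiableSet(new LinkedHashSet<>(families));
        }

        /** True when the snapshot was taken for an explicit subset of families. */
        boolean isPartial() {
            return !families.isEmpty();
        }
    }

    /** Restore must return the table to a real past state, so partial snapshots are refused. */
    static void checkRestoreAllowed(SnapshotInfo snapshot) {
        if (snapshot.isPartial()) {
            throw new UnsupportedOperationException(
                "Snapshot '" + snapshot.name + "' covers only families " + snapshot.families
                    + " and cannot be restored; clone it instead");
        }
    }

    /** Clone creates a new table, so it simply keeps the families the snapshot contains. */
    static Set<String> familiesForClone(SnapshotInfo snapshot, Set<String> tableFamilies) {
        Set<String> result = new LinkedHashSet<>(tableFamilies);
        if (snapshot.isPartial()) {
            result.retainAll(snapshot.families);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> tableFamilies = new LinkedHashSet<>(Arrays.asList("FamilyA", "FamilyB", "FamilyC"));
        SnapshotInfo partial = new SnapshotInfo("snapshotName",
            new LinkedHashSet<>(Arrays.asList("FamilyA", "FamilyB")));

        System.out.println("Clone would create families: " + familiesForClone(partial, tableFamilies));
        try {
            checkRestoreAllowed(partial);
        } catch (UnsupportedOperationException e) {
            System.out.println("Restore rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the restore outright, rather than restoring only the listed families, avoids leaving the table in a state that never existed.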

> Take snapshot by family
> -----------------------
>
>                 Key: HBASE-15469
>                 URL: https://issues.apache.org/jira/browse/HBASE-15469
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>         Attachments: HBASE-15469-v1.patch, HBASE-15469-v2.patch
>
>
> In our production environment, there are some 'wide' tables in the offline 
> cluster. A 'wide' table has many families, and different applications access 
> different families of the table through MapReduce. When an application starts 
> to provide online service, we need to copy the families it needs from the 
> offline cluster to the online cluster. For future writes, inter-cluster 
> replication supports setting families per table, so we can use it to copy 
> future edits for the needed families. For existing data, we can take a 
> snapshot of the table on the offline cluster, then use {{ExportSnapshot}} to 
> copy the snapshot to the online cluster and clone it. However, we can only 
> take a snapshot of the whole table, in which many families are not needed by 
> the application, and this leads to unnecessary data copying. I think it would 
> be useful to support taking a snapshot by family, so that we copy only the 
> needed data.
> A possible way to support this:
> 1. Add a family names field to the protobuf definition of 
> {{SnapshotDescription}}
> 2. Allow families to be set when taking a snapshot in the hbase shell, for 
> example:
> {code}
>    snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
> {code}
> 3. Add family names to {{SnapshotDescription}} on the client side
> 4. Read family names from {{SnapshotDescription}} in the Master/RegionServer 
> and keep only the requested families when taking the snapshot of a region.
> Discussions and suggestions are welcome.
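
As a rough illustration of step 4 in the proposal above, the per-region filtering could look something like the sketch below. It models a region as a map from family name to its store files and keeps only the requested families. The class and method names ({{FamilySnapshotSketch}}, {{selectStoreFiles}}) are hypothetical, the map-based region model is a simplification, and treating an empty request as "all families" and rejecting unknown family names are both assumptions; none of this is taken from the attached patches or from HBase's actual snapshot code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of keeping only requested families when snapshotting a region. */
final class FamilySnapshotSketch {

    /**
     * Returns the store files to reference in the snapshot, grouped by family.
     * Assumption: an empty requestedFamilies set means "snapshot every family".
     */
    static Map<String, List<String>> selectStoreFiles(
            Map<String, List<String>> storeFilesByFamily, Set<String> requestedFamilies) {
        if (requestedFamilies.isEmpty()) {
            return new LinkedHashMap<>(storeFilesByFamily);
        }
        // Fail early on a family the region does not have (an assumed validation step).
        for (String family : requestedFamilies) {
            if (!storeFilesByFamily.containsKey(family)) {
                throw new IllegalArgumentException("Unknown family: " + family);
            }
        }
        // Keep the region's own family order and drop everything not requested.
        Map<String, List<String>> selected = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : storeFilesByFamily.entrySet()) {
            if (requestedFamilies.contains(entry.getKey())) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> region = new LinkedHashMap<>();
        region.put("FamilyA", Arrays.asList("hfile-a1", "hfile-a2"));
        region.put("FamilyB", Arrays.asList("hfile-b1"));
        region.put("FamilyC", Arrays.asList("hfile-c1"));

        Set<String> requested = new LinkedHashSet<>(Arrays.asList("FamilyA", "FamilyB"));
        System.out.println(selectStoreFiles(region, requested));
    }
}
```

Only the selected store files would then need to be exported, which is where the saving in copied data comes from.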



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
