Charles Mason wrote:
As far as I can tell there are two options:
...
2, Image the data stored on the HDFS cluster. Aren't there some big
issues with it not grabbing a consistent image as some updates won't
be flushed? Is there any way to force that, or to make it be
consistent in some way, perhaps via snapshotting?
Yes (as others have said on this thread). We need to add a means of snapshotting an hbase cluster: send a signal to all members, who on receipt flush their in-memory content to the filesystem, and write across the cluster some sort of snapshot label or a manifest of all the files that comprise the snapshot. Thereafter, I'd imagine an administrator would start up a big MR job to distcp the snapshot from one filesystem to another on some other cluster. HBASE-50 is the pertinent issue.
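
To make the flow above concrete, here is a rough sketch of the signal, flush, and manifest steps as a toy in-process simulation. Nothing here is an HBase API: RegionServer, flush(), snapshot(), and the /hbase/... paths are all made-up names for illustration, and the sketch ignores the hard part, which is what to do about writes that arrive while the flush is in progress.

```python
from dataclasses import dataclass, field


@dataclass
class RegionServer:
    """Hypothetical stand-in for one cluster member (not an HBase class)."""
    name: str
    memstore: dict = field(default_factory=dict)  # edits not yet on the filesystem
    files: list = field(default_factory=list)     # paths of files already flushed
    _flushes: int = 0

    def put(self, key, value):
        # Writes land in memory first, which is why a raw image can miss them.
        self.memstore[key] = value

    def flush(self):
        """Record in-memory content as a new file and return all file paths."""
        if self.memstore:
            self._flushes += 1
            # Illustrative path only; the real layout is not assumed here.
            self.files.append(f"/hbase/{self.name}/flush-{self._flushes}")
            self.memstore.clear()
        return list(self.files)


def snapshot(servers, label):
    """Signal every member to flush, then list the files that make up the snapshot."""
    manifest = {"label": label, "files": []}
    for server in servers:
        manifest["files"].extend(server.flush())
    manifest["files"].sort()
    return manifest


if __name__ == "__main__":
    rs1, rs2 = RegionServer("rs1"), RegionServer("rs2")
    rs1.put("row1", "x")
    rs2.put("row2", "y")
    print(snapshot([rs1, rs2], "snap-001"))
```

The manifest is the piece the later copy job would work from: it names exactly the files to transfer, so anything flushed after the snapshot is left out of the copy.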

Related, this proposed feature in HDFS looks like it would make snapshotting HDFS a breeze: https://issues.apache.org/jira/browse/HADOOP-3637.

St.Ack
