[ https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593774#comment-13593774 ]
Matteo Bertozzi commented on HBASE-7987:
----------------------------------------

[~yuzhih...@gmail.com] The "file-tracking table" is described briefly in the PDF attached to HBASE-7806 ("future" section). It is a completely different idea from the manifest and is not just snapshot related.

Anyway, for the near term (.94/.96) I still haven't decided. I don't consider this one super high priority, since we have tested the multi-file approach for months and even on a large cluster it was good enough. I'll probably make a patch for this by next week.

But I prefer working on making everything work (e.g. merge, rename table, hbck) instead of saying "you can't merge a region if you use snapshots, you can't rename a table & co..."

It would also be nice to have more metrics to know the state: how long it takes and how many times it fails (the current FlushSnapshot fails every time there's a split or a region move).

> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>
> Currently taking a snapshot means creating one empty file for each file in
> the source table directory, plus copying the .regioninfo file for each
> region, the table descriptor file and a snapshotInfo file.
> During the restore or snapshot verification we traverse the filesystem
> (fs.listStatus()) to find the snapshot files, and we open the .regioninfo
> files to get the information.
> To avoid hammering the NameNode and having lots of empty files, we can use a
> manifest file that contains the list of files and information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> message SnapshotDescriptor {
>   required string name;
>   optional string table;
>   optional int64 creationTime;
>   optional Type type;
>   optional int32 version;
> }
>
> message SnapshotRegionManifest {
>   optional int32 version;
>   required RegionInfo regionInfo;
>   repeated FamilyFiles familyFiles;
>
>   message StoreFile {
>     required string name;
>     optional Reference reference;
>   }
>
>   message FamilyFiles {
>     required bytes familyName;
>     repeated StoreFile storeFiles;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
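
For illustration only, here is a rough Java sketch of the manifest idea from the issue description: instead of creating one empty marker file per store file and later rediscovering them with fs.listStatus(), a region's family/store-file list is written into a single file and read back in one pass. This is not the HBase implementation. The class name `RegionManifestSketch`, the tab-separated text encoding, and the sample family and file names are all assumptions made for the example. The proposal above uses protobuf messages, and real code would go through the Hadoop FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionManifestSketch {

    // Hypothetical encoding: one line per store file, "<family>\t<storeFileName>".
    // A family with no store files produces no line in this simple format.
    static List<String> encode(Map<String, List<String>> familyFiles) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> family : familyFiles.entrySet()) {
            for (String storeFile : family.getValue()) {
                lines.add(family.getKey() + "\t" + storeFile);
            }
        }
        return lines;
    }

    // Rebuilds the family -> store files mapping from the manifest lines,
    // preserving the order in which families and files were written.
    static Map<String, List<String>> decode(List<String> lines) {
        Map<String, List<String>> familyFiles = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            String family = line.substring(0, tab);
            String storeFile = line.substring(tab + 1);
            familyFiles.computeIfAbsent(family, k -> new ArrayList<>()).add(storeFile);
        }
        return familyFiles;
    }

    public static void main(String[] args) throws IOException {
        // Sample data; the family and file names are made up.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("cf1", Arrays.asList("hfile-a", "hfile-b"));
        files.put("cf2", Arrays.asList("hfile-c"));

        // One write and one read, instead of one empty file per store file
        // plus a directory listing per family.
        Path manifest = Files.createTempFile("regionManifest", ".txt");
        Files.write(manifest, encode(files));
        Map<String, List<String>> restored = decode(Files.readAllLines(manifest));
        System.out.println(restored);
        Files.delete(manifest);
    }
}
```

With this layout, a restore or verification step needs one file read per region manifest, so the number of NameNode operations no longer grows with the number of store files.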