[ https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Rodionov resolved HBASE-15227.
---------------------------------------
    Resolution: Fixed

Done.

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
>                 Key: HBASE-15227
>                 URL: https://issues.apache.org/jira/browse/HBASE-15227
>             Project: HBase
>          Issue Type: Task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>              Labels: backup
>         Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant of faults:
> # Backup operations MUST be atomic (no partial completion state in the
> backup system table)
> # The process must detect any type of failure that can result in data loss
> (partial backup or partial restore)
> # Proper system table state restore and cleanup must be done in case of a
> failure
> # An additional utility to repair the backup system table and clean up the
> corresponding file system data must be implemented
> h3. Backup
> h4. General FT framework implementation
> Before the actual backup operation starts, a snapshot of the backup system
> table is taken and the system table is updated with the *ACTIVE_SNAPSHOT*
> flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches the
> errors/exceptions and handles them:
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot the tables)
> # Restores the backup system table from its snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from
> the backup system table beforehand)
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system
> table for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation
> aborts with a message that the backup repair tool (see below) must be run.
> h4. Backup repair tool
> The command line tool *backup repair* executes the following steps:
> # Reads the info of the last failed backup session
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot the tables)
> # Restores the backup system table from its snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from
> the backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup
> Export snapshot operation (?).
> We count files and check sizes before and after the DistCp run.
> h5. Incremental backup
> Conversion of WAL to HFiles, when a WAL file is moved from the active to
> the archive directory. The code is in place to handle this situation.
> During the DistCp run (same as above).
> h3. Restore
> This operation does not modify the backup system table and is idempotent.
> No special FT is required.
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
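
The flow described under "General FT framework implementation" can be illustrated with a small in-memory sketch. It is not the code from the attached patches: the class BackupFtSketch, the SystemTable stand-in, the BackupWork callback and all method names are hypothetical, and plain Java lists stand in for the backup system table and for this session's backup destination. It only shows the ordering of the steps: snapshot and mark, run, then either unmark on success or clean up and restore on failure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are taken from the HBase code base. */
public class BackupFtSketch {

    /** Assumed in-memory stand-in for the backup system table and its snapshot. */
    public static class SystemTable {
        public final List<String> rows = new ArrayList<>();
        public boolean activeSnapshotFlag;   // models the ACTIVE_SNAPSHOT flag
        private List<String> snapshot;       // copy taken before the backup starts

        public void snapshotAndMark() {
            snapshot = new ArrayList<>(rows);
            activeSnapshotFlag = true;
        }

        public void restoreFromSnapshot() {
            rows.clear();
            rows.addAll(snapshot);
        }

        public void dropSnapshotAndUnmark() {
            snapshot = null;
            activeSnapshotFlag = false;
        }
    }

    /** Assumed callback for the real backup work; throws on a server-side failure. */
    public interface BackupWork {
        void run(SystemTable table, List<String> destination) throws Exception;
    }

    /** Runs one backup session; returns false if it failed and was rolled back. */
    public static boolean runBackup(SystemTable table, List<String> destination, BackupWork work) {
        // A leftover flag means an earlier client died mid-session.
        if (table.activeSnapshotFlag) {
            throw new IllegalStateException("ACTIVE_SNAPSHOT is set: run backup repair first");
        }
        table.snapshotAndMark();
        try {
            work.run(table, destination);
            table.dropSnapshotAndUnmark();   // success: the flag is removed
            return true;
        } catch (Exception e) {
            repair(table, destination);      // failure: clean up and roll back
            return false;
        }
    }

    /** The same steps a repair tool would run after a client-side crash. */
    public static void repair(SystemTable table, List<String> destination) {
        if (!table.activeSnapshotFlag) {
            return;                          // no failed session to repair
        }
        destination.clear();                 // remove this session's partial backup data
        table.restoreFromSnapshot();         // restore the system table state
        table.dropSnapshotAndUnmark();       // delete the snapshot and clear the flag
    }

    public static void main(String[] args) {
        SystemTable table = new SystemTable();
        table.rows.add("session-1");
        List<String> destination = new ArrayList<>();
        boolean ok = runBackup(table, destination, (t, d) -> {
            d.add("partial-file");
            t.rows.add("session-2");
            throw new IllegalStateException("simulated server-side failure");
        });
        // prints: ok=false rows=[session-1] destination=[]
        System.out.println("ok=" + ok + " rows=" + table.rows + " destination=" + destination);
    }
}
```

Because repair() does nothing when the flag is absent, it can be called unconditionally, which matches the idea of a standalone repair tool that is safe to run at any time.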