ndimiduk commented on code in PR #6370:
URL: https://github.com/apache/hbase/pull/6370#discussion_r1888442542
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/mapreduce/MapReduceRestoreToOriginalSplitsJob.java:
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.mapreduce;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.backup.RestoreJob;
+import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.hbase.util.FSVisitor;
+import org.apache.yetus.audience.InterfaceAudience;
+
+import org.apache.hbase.thirdparty.com.google.common.collect.Lists;
+
+@InterfaceAudience.Private

Review Comment:
   Is this job strictly internal? We never expect an operator to call it manually or involve it in some process of their own? Maybe `LimitedPrivate(HBaseInterfaceAudience.TOOLS)`?
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java:
##########
@@ -197,55 +202,56 @@ protected List<byte[]> handleBulkLoad(List<TableName> sTableList) throws IOExcep
         }
       }
     }
+      mergeSplitBulkloads(activeFiles, archiveFiles, srcTable);
+      incrementalCopyBulkloadHFiles(tgtFs, srcTable);
     }
-
-    copyBulkLoadedFiles(activeFiles, archiveFiles);
-
     return pair.getSecond();
   }
-
-  private void copyBulkLoadedFiles(List<String> activeFiles, List<String> archiveFiles)
-    throws IOException {
-    try {
-      // Enable special mode of BackupDistCp
-      conf.setInt(MapReduceBackupCopyJob.NUMBER_OF_LEVELS_TO_PRESERVE_KEY, 5);
-      // Copy active files
-      String tgtDest = backupInfo.getBackupRootDir() + Path.SEPARATOR + backupInfo.getBackupId();
-      int attempt = 1;
-      while (activeFiles.size() > 0) {
-        LOG.info("Copy " + activeFiles.size() + " active bulk loaded files. Attempt =" + attempt++);
-        String[] toCopy = new String[activeFiles.size()];
-        activeFiles.toArray(toCopy);
-        // Active file can be archived during copy operation,
-        // we need to handle this properly
-        try {
-          incrementalCopyHFiles(toCopy, tgtDest);
-          break;
-        } catch (IOException e) {
-          // Check if some files got archived
-          // Update active and archived lists
-          // When file is being moved from active to archive
-          // directory, the number of active files decreases
-          int numOfActive = activeFiles.size();
-          updateFileLists(activeFiles, archiveFiles);
-          if (activeFiles.size() < numOfActive) {
-            continue;
-          }
-          // if not - throw exception
-          throw e;
+  private void mergeSplitBulkloads(List<String> activeFiles, List<String> archiveFiles,

Review Comment:
   👍

##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/BackupRestoreFactory.java:
##########
@@ -43,8 +44,13 @@ private BackupRestoreFactory() {
    * @return backup restore job instance
    */
   public static RestoreJob getRestoreJob(Configuration conf) {
+    Class<? extends RestoreJob> defaultCls =
+      conf.getBoolean(RestoreJob.KEEP_ORIGINAL_SPLITS_OPT, false)

Review Comment:
   So then we'll need a deprecation plan for the old behavior. I believe that we can take it as an accelerated process due to this being beta/experimental. I propose:
   - default `KEEP_ORIGINAL_SPLITS_OPT = false` for branch-2.6
   - default `KEEP_ORIGINAL_SPLITS_OPT = true` for branch-2+

   We'll need a release note on the JIRA that calls out this behavior change. Include advice for operators who have any existing backups as they perform the upgrade.

##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/RestoreJob.java:
##########
@@ -30,6 +30,9 @@
 @InterfaceAudience.Private
 public interface RestoreJob extends Configurable {
+
+  String KEEP_ORIGINAL_SPLITS_OPT = "hbase.backup.restorejob.keep_original_splits";

Review Comment:
   Please place a constant with the default value for this config next to the key constant. That way there's one definition of default behavior that is easy to identify. As it is now, you have at least two `false` values that need to be kept in sync.

##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java:
##########
@@ -116,6 +121,7 @@ protected static int getIndex(TableName tbl, List<TableName> sTableList) {
    * the backup is marked as complete.
    * @param tablesToBackup list of tables to be backed up
    */
+  @SuppressWarnings("unchecked")

Review Comment:
   nit: can this warning be fixed instead of masked?

##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/RestoreJob.java:
##########
@@ -30,6 +30,9 @@
 @InterfaceAudience.Private
 public interface RestoreJob extends Configurable {
+
+  String KEEP_ORIGINAL_SPLITS_OPT = "hbase.backup.restorejob.keep_original_splits";

Review Comment:
   nit: for configuration points that we interact with primarily through the `Configuration` objects, we usually call these variables `FOO_KEY`, not `FOO_OPT`.
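For illustration, the two suggestions about the `RestoreJob` constant (rename `FOO_OPT` to `FOO_KEY`, and define the default next to the key so there is a single definition of the default behavior) might look like the following sketch. This is not the PR's code: the names `KEEP_ORIGINAL_SPLITS_KEY` and `KEEP_ORIGINAL_SPLITS_DEFAULT` are hypothetical renames, and the tiny `Conf` class is a stand-in for Hadoop's `Configuration` so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: key and default defined side by side, per the review comment,
// so callers never repeat a bare `false` literal that could drift out of sync.
public class RestoreJobConfigSketch {

  // Hypothetical renames of the PR's KEEP_ORIGINAL_SPLITS_OPT constant.
  static final String KEEP_ORIGINAL_SPLITS_KEY = "hbase.backup.restorejob.keep_original_splits";
  static final boolean KEEP_ORIGINAL_SPLITS_DEFAULT = false; // proposed branch-2.6 default

  // Minimal stand-in for org.apache.hadoop.conf.Configuration.
  static final class Conf {
    private final Map<String, String> values = new HashMap<>();

    void setBoolean(String key, boolean value) {
      values.put(key, Boolean.toString(value));
    }

    boolean getBoolean(String key, boolean defaultValue) {
      String v = values.get(key);
      return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
  }

  // Callers consult the shared default instead of hard-coding a second `false`.
  static boolean keepOriginalSplits(Conf conf) {
    return conf.getBoolean(KEEP_ORIGINAL_SPLITS_KEY, KEEP_ORIGINAL_SPLITS_DEFAULT);
  }

  public static void main(String[] args) {
    Conf conf = new Conf();
    System.out.println(keepOriginalSplits(conf)); // default behavior: false
    conf.setBoolean(KEEP_ORIGINAL_SPLITS_KEY, true);
    System.out.println(keepOriginalSplits(conf)); // operator opted in: true
  }
}
```

With this shape, flipping the default for branch-2+ (as proposed above) would be a one-line change to `KEEP_ORIGINAL_SPLITS_DEFAULT` rather than a hunt for every `false` literal.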
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
