[ https://issues.apache.org/jira/browse/HBASE-28686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867407#comment-17867407 ]
Hudson commented on HBASE-28686:
--------------------------------

Results for branch branch-3
	[build #254 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/254/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/254/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/254/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/254//console].

> MapReduceBackupCopyJob should support custom DistCp options
> -----------------------------------------------------------
>
>                 Key: HBASE-28686
>                 URL: https://issues.apache.org/jira/browse/HBASE-28686
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 3.0.0-beta-2
>
>
> h4. Problem
> The MapReduceBackupCopyJob class provides no means of updating DistCp job
> options. This means that you're stuck with the defaults, which isn't always
> desirable. For example, my workplace would like the freedom to deviate from
> at least two DistCp defaults:
> # distcp.direct.write: we would like to set this to true, because writing
> and renaming tmp files is expensive in S3 (where we store our backups).
> # The number of mappers that DistCp will run: we would like control over
> this as well.
> h4. Proposed Solution
> It is not the prettiest solution, but I'm proposing that we support DistCp
> customizations via the given backup client configuration, like
> [this.|https://github.com/HubSpot/hbase/compare/hubspot-2.6...HubSpot:hbase:backup-distcp-options]
> It's necessary to do this conf -> arg conversion because we still want to
> use [DistCp's run
> method|https://github.com/HubSpot/hadoop/blob/c4c25b0ea2be1c8bca31d86962597060b2630f62/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L134-L171],
> which expects args, so as to not change any error codes. Hadoop actually
> does something similar, but in the opposite direction: the DistCp job has
> logic to convert the args back to configurations (lol).
> Further, the DistCp API is unfortunately not well designed for programmatic
> use, so it doesn't leave us great alternatives. For example, if you use the
> run method, it doesn't matter what you pass in as DistCpOptions to the
> constructor: your options will be overwritten based on the args that you
> pass in. Alternatively, if you pass in the DistCpOptions in the constructor
> and use DistCp#execute or DistCp#createAndSubmitJob, then you get none of
> the error specificity!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
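
For readers skimming this thread, here is a rough sketch of the conf -> arg conversion described in the proposed solution. It is not the code from the linked branch. A plain Map stands in for the Hadoop Configuration so the example runs without Hadoop on the classpath, and the class name DistCpArgSketch, the method toArgs, and the two lookup tables are hypothetical. The -direct and -m switches and the distcp.direct.write and distcp.max.maps labels are assumed to match what DistCp recognizes; check them against the Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase or Hadoop.
final class DistCpArgSketch {
    // Assumed mapping from DistCp conf labels to command-line switches.
    private static final Map<String, String> FLAG_SWITCHES = new LinkedHashMap<>();
    private static final Map<String, String> VALUE_SWITCHES = new LinkedHashMap<>();

    static {
        FLAG_SWITCHES.put("distcp.direct.write", "-direct"); // boolean flag, no value
        VALUE_SWITCHES.put("distcp.max.maps", "-m");         // switch followed by a value
    }

    // Builds an args list of the kind DistCp's run method parses:
    // options first, then the source and target paths.
    static List<String> toArgs(Map<String, String> conf, String source, String target) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : FLAG_SWITCHES.entrySet()) {
            // parseBoolean(null) is false, so unset keys add nothing.
            if (Boolean.parseBoolean(conf.get(e.getKey()))) {
                args.add(e.getValue());
            }
        }
        for (Map.Entry<String, String> e : VALUE_SWITCHES.entrySet()) {
            String value = conf.get(e.getKey());
            if (value != null) {
                args.add(e.getValue());
                args.add(value);
            }
        }
        args.add(source);
        args.add(target);
        return args;
    }

    public static void main(String[] unused) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("distcp.direct.write", "true");
        conf.put("distcp.max.maps", "8");
        // prints [-direct, -m, 8, hdfs://src/backup, s3a://bucket/backup]
        System.out.println(toArgs(conf, "hdfs://src/backup", "s3a://bucket/backup"));
    }
}
```

Keys that are absent from the configuration produce no args, so DistCp's own defaults still apply, and the resulting list can be handed to the run method so its error codes are preserved.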