Authur Wang created HADOOP-19307:
------------------------------------
Summary: Add option to add parent directory of source directories
to target directories
Key: HADOOP-19307
URL: https://issues.apache.org/jira/browse/HADOOP-19307
Project: Hadoop Common
Issue Type: New Feature
Components: tools/distcp
Affects Versions: 3.0.0
Environment: hadoop 3.3.1
Reporter: Authur Wang
Currently, we run the Hadoop distcp command with -update -delete src1/*
src2/* dest to keep the source and target directories exactly the same.
When either -update or -overwrite is specified, the *contents* of the
source-directories are copied to target, and not the source directories
themselves.
Consider a copy from /source/first/ and /source/second/ to /target/, where the
source paths have the following contents:
hdfs://nn1:8020/source/first/1
hdfs://nn1:8020/source/first/2
hdfs://nn1:8020/source/second/10
hdfs://nn1:8020/source/second/20
distcp2 -update hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second
hdfs://nn2:8020/target
would yield the following contents in /target:
hdfs://nn2:8020/target/1
hdfs://nn2:8020/target/2
hdfs://nn2:8020/target/10
hdfs://nn2:8020/target/20
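
The mapping above can be modelled with a short Python sketch. This is not DistCp code: map_update is a hypothetical helper that only mimics the path mapping described above, assuming every file is given as a full URI under one of the source directories.

```python
def map_update(source_dirs, source_files, target):
    """Mimic the -update/-overwrite mapping: the contents of each
    source directory land directly under the target directory."""
    target = target.rstrip("/")
    mapping = {}
    for src_dir in source_dirs:
        prefix = src_dir.rstrip("/") + "/"
        for path in source_files:
            if path.startswith(prefix):
                # Keep only the part of the path below the source directory.
                relative = path[len(prefix):]
                mapping[path] = target + "/" + relative
    return mapping


sources = ["hdfs://nn1:8020/source/first", "hdfs://nn1:8020/source/second"]
files = [
    "hdfs://nn1:8020/source/first/1",
    "hdfs://nn1:8020/source/first/2",
    "hdfs://nn1:8020/source/second/10",
    "hdfs://nn1:8020/source/second/20",
]
result = map_update(sources, files, "hdfs://nn2:8020/target")
for src, dst in result.items():
    print(src, "->", dst)
```

Note that the names "first" and "second" do not appear anywhere in the resulting target paths, which is the behaviour this issue is about.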
Sometimes, however, we need to preserve the parent directories, like this:
hdfs://nn2:8020/target/first/1
hdfs://nn2:8020/target/first/2
hdfs://nn2:8020/target/second/10
hdfs://nn2:8020/target/second/20
So, should we introduce an option, -preserveParentDir, so that the parent
directories themselves are also copied when -update or -overwrite is used?
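
To make the requested behaviour concrete, here is a rough Python sketch of the path mapping with and without the proposed option. It is only an illustration: map_paths and its preserve_parent_dir parameter are hypothetical names standing in for the proposed -preserveParentDir flag, which does not exist in DistCp today.

```python
def map_paths(source_dirs, source_files, target, preserve_parent_dir=False):
    """Map source files to target paths.

    preserve_parent_dir is a hypothetical stand-in for the proposed
    -preserveParentDir option; it is not an existing DistCp setting.
    """
    target = target.rstrip("/")
    mapping = {}
    for src_dir in source_dirs:
        src_dir = src_dir.rstrip("/")
        # Last path component of the source directory, e.g. "first".
        parent = src_dir.rsplit("/", 1)[-1]
        base = target + "/" + parent if preserve_parent_dir else target
        for path in source_files:
            if path.startswith(src_dir + "/"):
                mapping[path] = base + "/" + path[len(src_dir) + 1:]
    return mapping


sources = ["hdfs://nn1:8020/source/first", "hdfs://nn1:8020/source/second"]
files = [
    "hdfs://nn1:8020/source/first/1",
    "hdfs://nn1:8020/source/second/10",
]
flat = map_paths(sources, files, "hdfs://nn2:8020/target")
nested = map_paths(sources, files, "hdfs://nn2:8020/target",
                   preserve_parent_dir=True)
```

With the flag off, the sketch reproduces the current flattening; with it on, each file keeps its source directory name under the target.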
--
This message was sent by Atlassian Jira
(v8.20.10#820010)