Steven Rand created MAPREDUCE-7226:
--------------------------------------
Summary: option to not fail distcp when source file is being written to
Key: MAPREDUCE-7226
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7226
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distcp
Affects Versions: 3.2.0
Reporter: Steven Rand
If a file is being written to during a distcp, then we'll throw an IOException
because its size at the target will be less than its size at the source:
{code:java}
private void compareFileLengths(CopyListingFileStatus source, Path target,
    Configuration configuration, long targetLen)
    throws IOException {
  final Path sourcePath = source.getPath();
  FileSystem fs = sourcePath.getFileSystem(configuration);
  long srcLen = fs.getFileStatus(sourcePath).getLen();
  if (srcLen != targetLen)
    throw new IOException("Mismatch in length of source:" + sourcePath + " ("
        + srcLen + ") and target:" + target + " (" + targetLen + ")");
}
{code}
This happens even when the {{-i}} flag is given to distcp to ignore failures,
since the only exceptions distcp will ignore are instances of
{{CopyReadException}}, and {{compareFileLengths}} just throws an IOE.
This means that distcp can't be used for certain workflows. The particular one
that I have in mind is incrementally copying data from a production cluster to
a DR cluster. This can be handled nicely using distcp with the {{-update}} and
{{-delete}} flags, but the problem is that clients might be modifying the
production cluster while the distcp runs, thereby causing it to fail even when
{{-i}} is given.
One idea is to:
1. Have {{compareFileLengths}} throw a custom exception. It doesn't make sense
to use {{CopyReadException}}, since this isn't a failure on read, but we could
make another subclass of IOE for writes.
2. Have the CopyMapper check for our new custom exception in the same way that
it does for {{CopyReadException}} when the {{-i}} flag is given. Or, if we
don't want to change the behavior of the existing flag, we could add a new flag.
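
A rough sketch of the two steps above, written without any Hadoop dependencies so that it can be compiled and run on its own. The names {{SourceModifiedException}}, {{LengthCheckSketch}}, and {{copyOne}} are hypothetical and do not exist in distcp, and the {{ignoreFailures}} boolean is only a stand-in for the {{-i}} flag (or a new flag). The real change would go in {{compareFileLengths}} and in the CopyMapper's error handling, which may look quite different:

```java
import java.io.IOException;
import java.util.List;

// Hypothetical IOException subclass for "source changed while we copied it".
// Not part of distcp today; the name is an assumption for illustration.
class SourceModifiedException extends IOException {
    SourceModifiedException(String message) {
        super(message);
    }
}

class LengthCheckSketch {

    // Same check as compareFileLengths, but taking plain values instead of
    // Hadoop types, and throwing the more specific exception on a mismatch.
    static void compareFileLengths(String sourcePath, long srcLen,
                                   String targetPath, long targetLen)
            throws IOException {
        if (srcLen != targetLen) {
            throw new SourceModifiedException(
                "Mismatch in length of source:" + sourcePath + " (" + srcLen
                + ") and target:" + targetPath + " (" + targetLen + ")");
        }
    }

    // Stand-in for the mapper-side handling. Returns true if the lengths
    // match. On a mismatch, records the source path and returns false when
    // ignoreFailures is set; otherwise the exception propagates as before.
    static boolean copyOne(String sourcePath, long srcLen,
                           String targetPath, long targetLen,
                           boolean ignoreFailures, List<String> skipped)
            throws IOException {
        try {
            compareFileLengths(sourcePath, srcLen, targetPath, targetLen);
            return true;
        } catch (SourceModifiedException e) {
            if (!ignoreFailures) {
                throw e;
            }
            skipped.add(sourcePath);
            return false;
        }
    }
}
```

With {{ignoreFailures}} set, a file whose source length changed is recorded in {{skipped}} and the run continues; without it, the exception still fails the copy. Because the new exception is a subclass of IOE, existing callers that catch IOE would be unaffected.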