Aman Poonia created HBASE-28836:
-----------------------------------
Summary: Parallelize the archival of compacted files
Key: HBASE-28836
URL: https://issues.apache.org/jira/browse/HBASE-28836
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 2.5.10
Reporter: Aman Poonia
Assignee: Aman Poonia
While splitting a region, HBase has to clean up the compacted files of that
region for bookkeeping.
Currently we do this sequentially, which is good enough on HDFS because
archiving a file there is a fast operation. On S3, however, each archival call
is much slower, so the sequential loop becomes a bottleneck. We need to
parallelize this step to make it faster; the current loop is shown below,
followed by a sketch of one possible approach.
{code:java}
for (File file : toArchive) {
  // if it's a file, archive it
  try {
    LOG.trace("Archiving {}", file);
    if (file.isFile()) {
      // attempt to archive the file
      if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
        LOG.warn("Couldn't archive " + file + " into backup directory: " + baseArchiveDir);
        failures.add(file);
      }
    } else {
      // otherwise it's a directory and we need to archive all its files
      LOG.trace("{} is a directory, archiving children files", file);
      // so we add the directory name to the base archive path
      Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
      // and then get all the files from that directory and attempt to
      // archive those too
      Collection<File> children = file.getChildren();
      failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
    }
  } catch (IOException e) {
    LOG.warn("Failed to archive {}", file, e);
    failures.add(file);
  }
} {code}
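One possible shape for the change (a sketch under assumptions, not an actual
patch): submit each top-level entry to a bounded thread pool so the S3 round
trips overlap, and collect the per-task failures on the calling thread.
ARCHIVE_POOL_SIZE and the pool lifecycle are illustrative assumptions here;
resolveAndArchiveFile and resolveAndArchive are the helpers from the loop
above.
{code:java}
// Sketch only: parallelize the per-file archival with a bounded pool.
// Needs java.util.concurrent.{ExecutorService, Executors, Future,
// ExecutionException} and java.util.Collections.
ExecutorService pool = Executors.newFixedThreadPool(ARCHIVE_POOL_SIZE);
List<Future<Collection<File>>> pending = new ArrayList<>();
try {
  for (File file : toArchive) {
    pending.add(pool.submit(() -> {
      try {
        LOG.trace("Archiving {}", file);
        if (file.isFile()) {
          // a single file: archive it, reporting the file itself on failure
          return resolveAndArchiveFile(baseArchiveDir, file, startTime)
            ? Collections.<File> emptyList()
            : Collections.singletonList(file);
        }
        // a directory: recurse as before, on this task's thread
        Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
        return resolveAndArchive(fs, parentArchiveDir, file.getChildren(), start);
      } catch (IOException e) {
        LOG.warn("Failed to archive {}", file, e);
        return Collections.singletonList(file);
      }
    }));
  }
  // gather failures on the calling thread so `failures` is never
  // mutated concurrently
  for (Future<Collection<File>> result : pending) {
    failures.addAll(result.get());
  }
} catch (InterruptedException | ExecutionException e) {
  throw new IOException("Parallel archival of compacted files failed", e);
} finally {
  pool.shutdown();
} {code}
Having each task return its own failure list keeps the shared `failures`
collection single-threaded; a concurrent collection would be an alternative.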
--
This message was sent by Atlassian Jira
(v8.20.10#820010)