[
https://issues.apache.org/jira/browse/HBASE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani resolved HBASE-28836.
----------------------------------
Fix Version/s: 2.7.0
               3.0.0-beta-2
               2.5.11
               2.6.2
Hadoop Flags: Reviewed
Resolution: Fixed
> Parallelize the archival of compacted files
> --------------------------------------------
>
> Key: HBASE-28836
> URL: https://issues.apache.org/jira/browse/HBASE-28836
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 2.5.10
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> While splitting a region, HBase has to clean up compacted files for
> bookkeeping.
>
> Currently we do this sequentially, which is good enough on HDFS because
> archiving a file there is a fast operation. When the same work runs against
> S3, it becomes an issue. We need to parallelize the archival to make it
> faster; a sketch of one possible approach follows the snippet below.
> {code:java}
> for (File file : toArchive) {
>   // if it's a file, archive it
>   try {
>     LOG.trace("Archiving {}", file);
>     if (file.isFile()) {
>       // attempt to archive the file
>       if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
>         LOG.warn("Couldn't archive " + file + " into backup directory: "
>           + baseArchiveDir);
>         failures.add(file);
>       }
>     } else {
>       // otherwise it's a directory, so we need to archive all of its files
>       LOG.trace("{} is a directory, archiving children files", file);
>       // add the directory name to the base archive path
>       Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
>       // and then get all the files from that directory and attempt to
>       // archive those too
>       Collection<File> children = file.getChildren();
>       failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
>     }
>   } catch (IOException e) {
>     LOG.warn("Failed to archive {}", file, e);
>     failures.add(file);
>   }
> } {code}
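> As a sketch of what the parallelized loop could look like (this is not the
> committed patch; the archiveOne(...) helper and the pool size are
> assumptions for illustration), the per-file work can be fanned out to a
> bounded executor so that slow S3 operations overlap instead of running back
> to back:
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.List;
> import java.util.concurrent.ExecutionException;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> // Sketch only. "File" is the HFileArchiver inner class used in the loop
> // above, and archiveOne(file) is a hypothetical helper standing in for the
> // resolveAndArchiveFile / recursive-directory logic of that loop.
> static List<File> resolveAndArchiveParallel(Collection<File> toArchive)
>   throws IOException {
>   // bounded pool: size 8 is an assumed tunable, not a value from the patch
>   ExecutorService pool = Executors.newFixedThreadPool(8);
>   List<File> failures = new ArrayList<>();
>   try {
>     // submit one task per file so the archivals run concurrently
>     List<Future<List<File>>> pending = new ArrayList<>();
>     for (File file : toArchive) {
>       pending.add(pool.submit(() -> archiveOne(file)));
>     }
>     // wait for every task and collect its failures, preserving the
>     // sequential version's contract of returning the files that failed
>     for (Future<List<File>> f : pending) {
>       try {
>         failures.addAll(f.get());
>       } catch (ExecutionException e) {
>         throw new IOException("Archival task failed", e.getCause());
>       } catch (InterruptedException e) {
>         Thread.currentThread().interrupt();
>         throw new IOException("Interrupted while archiving", e);
>       }
>     }
>   } finally {
>     pool.shutdown();
>   }
>   return failures;
> }
> {code}
> Bounding the pool keeps a region server from opening an unbounded number of
> S3 connections when a split has many compacted files to archive.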
--
This message was sent by Atlassian Jira
(v8.20.10#820010)