[ https://issues.apache.org/jira/browse/HBASE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Poonia updated HBASE-28836:
--------------------------------
    Description: 
While splitting a region, HBase has to clean up compacted files for bookkeeping.

Currently we do this sequentially, which is good enough on HDFS because it is a fast operation. When we do the same on S3 it becomes an issue. We need to parallelize this to make it faster.
{code:java}
for (File file : toArchive) {
      // if it's a file, archive it
      try {
        LOG.trace("Archiving {}", file);
        if (file.isFile()) {
          // attempt to archive the file
          if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
            LOG.warn("Couldn't archive " + file + " into backup directory: " + 
baseArchiveDir);
            failures.add(file);
          }
        } else {
          // otherwise it's a directory and we need to archive all of its files
          LOG.trace("{} is a directory, archiving children files", file);
          // so we add the directory name to the one base archive
          Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
          // and then get all the files from that directory and attempt to
          // archive those too
          Collection<File> children = file.getChildren();
          failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
        }
      } catch (IOException e) {
        LOG.warn("Failed to archive {}", file, e);
        failures.add(file);
      }
    }
{code}
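A minimal sketch of one way to parallelize this loop, assuming the surrounding resolveAndArchive context (fs, baseArchiveDir, startTime, start, failures, and the File helper class) plus java.util.concurrent imports. The dedicated pool and poolSize here are illustrative assumptions, not the actual patch; a real change would more likely reuse an existing region server executor:
{code:java}
// Hypothetical sketch: fan the per-file work out to a thread pool so slow
// S3 operations overlap instead of running back to back. The pool and its
// size are assumptions for illustration only.
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
List<Future<List<File>>> pending = new ArrayList<>();
for (File file : toArchive) {
  pending.add(pool.submit(() -> {
    List<File> taskFailures = new ArrayList<>();
    try {
      if (file.isFile()) {
        // attempt to archive the file, recording it on failure
        if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
          LOG.warn("Couldn't archive {} into backup directory: {}", file, baseArchiveDir);
          taskFailures.add(file);
        }
      } else {
        // directories recurse as before; only this level is parallelized
        Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
        taskFailures.addAll(resolveAndArchive(fs, parentArchiveDir, file.getChildren(), start));
      }
    } catch (IOException e) {
      LOG.warn("Failed to archive {}", file, e);
      taskFailures.add(file);
    }
    return taskFailures;
  }));
}
try {
  // wait for every task and merge the per-task failure lists
  for (Future<List<File>> f : pending) {
    failures.addAll(f.get());
  }
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
  throw new IOException("Interrupted while archiving compacted files", e);
} catch (ExecutionException e) {
  throw new IOException("Archival task failed", e);
} finally {
  pool.shutdown();
}
{code}
Overlapping the per-file work matters on S3 because each archival ends in a rename, and on S3-backed filesystems a rename is typically a copy plus a delete rather than a cheap metadata operation, so the sequential loop's latency grows with the number of compacted files.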

> Parallelize the archival of compacted files 
> --------------------------------------------
>
>                 Key: HBASE-28836
>                 URL: https://issues.apache.org/jira/browse/HBASE-28836
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 2.5.10
>            Reporter: Aman Poonia
>            Assignee: Aman Poonia
>            Priority: Major
>              Labels: pull-request-available
>


