I've attached a proposed patch to this email for the bug described here.

# Backstory

We use Tar at Meta to perform incremental nightly backups of development
servers. In the process of migrating our development environments from CentOS 8
(Tar 1.30) to CentOS 9 (Tar 1.34), a small set of our developers reported that
restoring from a backup now took multiple days. We expect that restoring a
development environment from backup to take no more than a few hours, and only a
few minutes in most cases. (Credit to our Developer Support Specialist, Akeem
Seymens!)

Upon investigation, my coworker Rob Schmidt discovered that the developers
reporting these problems had backups that contained source code repositories
with lots of small files and directories inside. When running Tar with `strace`,
we noticed that there were unexpectedly large gaps in time between each `mkdir`
syscall. We generated a flamegraph of Tar's function calls, and discovered a
disproportionate amount of time was being spent in `delay_set_stat` and
`strcmp`.



# The Bug

Looking at the code with this knowledge in hand, it quickly became obvious why:
`delay_set_stat` contains an O(n) scan of the underlying linked list to detect
and avoid inserting duplicate entries. This code was added in
<https://git.savannah.gnu.org/cgit/tar.git/commit/src?id=d70b8b3b3978df2ba204f3afe60b18ded6164b07>,
which was included in the Tar 1.33 release.

Normally, this O(n) scan isn't a problem, because the `delayed_set_stat` data
structure isn't usually very long. This is because when tar finishes extracting
a directory, it will call `apply_nonancestor_delayed_set_stat`, which will
remove the elements from that linked list.

However, because we use an incremental backup, Tar predicts we'll have an
"unusual" member order, so it sets a global `delay_directory_restore_option`
flag. This flag means that Tar no longer knows when we're actually done
processing a directory, and so it only calls
`apply_nonancestor_delayed_set_stat` during `extract_finish`.

This means that `delayed_set_stat` will eventually contain an entry for every
directory in the archive. The total extraction time becomes O(n^2), where `n`
is the number of directories.

This is similar to the bug in delayed link creation that was fixed by
<https://git.savannah.gnu.org/cgit/tar.git/commit/src?id=258d1c44e5ee7c58b28bf0000e9d737df6081885>.



# Test Case

The patch includes an automated test case, but the issue can also be manually
reproduced.

Here's a one-liner for creating a large directory tree with 90300 directories
(n=300 can be adjusted):

```
cd /tmp && mkdir z && cd z && perl -le '$n=300; for $i (1..$n) { for $j (1..$n) 
{ print "$i/$j" } }'|xargs mkdir -p
```

(credit to Jim Meyering for the one-liner code golf)

Tar can then be used to create and then extract this directory:

```
cd /tmp && tar cf z.tar z
mkdir 1 && cd 1 && time /bin/tar xf ../z.tar 
```

Without my patch, this takes about 42 seconds, most of which is spent in
userspace:

```
/bin/tar xf ../z.tar --delay-directory-restore  41.73s user 0.63s system 99% 
cpu 42.594 total
```


With my patch, this takes about 0.3 seconds, most of which is spent performing
syscalls:

```
~/tar/src/tar xf ../z.tar --delay-directory-restore  0.09s user 0.22s system 
99% cpu 0.314 total
```



# The Patch

My patch mitigates this issue by introducing a hash table of the entries in
`delayed_set_stat_head`. Upon insertion, this table can be checked for
duplicates, instead of using a linear scan.

The linked list is kept because insertion order is significant and must still be
tracked.



# Remaining Linear Scans


This only fixes the common case of an O(n) scan during insertion. There are
other places that perform linear scans of the table that are not fixed by this
patch. I do not have plans to fix these, in part because some of the fixes would
be more complicated, but also because they seem harder to trigger, or because I
don't believe they're problematic.

I do not have test-cases for any of these situations.

### `remove_delayed_set_stat`

This is called by `remove_any_file`, which is called in some overwrite modes.
This could be fixed by making `delayed_set_stat` data structure into a doubly
linked list to allow an O(1) removal, following an O(1) lookup in the hash
table.

### `find_direct_ancestor`

This logic is used when handling symlinks. This could cause similar issues to
what we've observed for directories, but for extremely symlink-heavy archives.

This function performs a prefix match between the current `file_name` and
previously inserted directories. This could be rewritten to split the current
`file_name` by slash characters, and then to perform lookups in the hash table
for each parent directory.

### `repair_delayed_set_stat`

This function appears to be used for edge cases where paths with `..` or
symlinks form a cyclic directory reference.

This traverses starting with the most-recently-inserted entry, and returns when
it finds a match. This scan is probably not problematic because we expect the
match to be near the beginning of the insertion list.

### `apply_nonancestor_delayed_set_stat`

This function is not problematic because it removes entries during traversal.

When it is called frequently (when `--delay-directory-restore` is not used), the
linked list will be kept small.

When `--delay-directory-restore` is used, it is only called twice in
`extract_finish`, so the overall runtime is O(n), not O(n^2).

>From 80c5729348d1802e121534ea6e2c4b0e644d1800 Mon Sep 17 00:00:00 2001
From: Benjamin Woodruff <b...@meta.com>
Date: Thu, 27 Jul 2023 10:54:34 -0700
Subject: [PATCH] Fix O(n^2) time bug in --delay-directory-restore

delayed_set_stat avoids inserting duplicate entries into
delayed_set_stat_head. It was doing this by scanning the entire
list.

Normally this list is small, but if --delay-directory-restore is
used (including automatically for incremental archives), this list
grows with the total number of directories in the archive.

The entire scan takes O(n) time. Extracting an archive with `n`
directories could therefore take O(n^2) time.

The included test uses AT_SKIP_LARGE_FILES, allowing it to optionally be
skipped. It may execute slowly on certain filesystems or disks, as it
creates thousands of directories.

There are still potentially problematic O(n) scans in
find_direct_ancestor and remove_delayed_set_stat, which this patch does
not attempt to fix.

* NEWS: Update.
* src/extract.c (delayed_set_stat_table): Create a table for O(1)
lookups of entries in the delayed_set_stat_head list. The list
remains, as tracking insertion order is important.
(dl_hash, dl_compare): New hash table helper functions.
(delay_set_stat): Create the hash table, replace the O(n) list scan
with a hash_lookup, insert new entries into the hash table.
(remove_delayed_set_stat): Also remove entry from hash table.
(apply_nonancestor_delayed_set_stat): Also remove entry from hash
table.
(extract_finish): Free the (empty) hash table.
* tests/extrac26.at: New file.
* tests/Makefile.am (TESTSUITE_AT): Include extrac26.at.
* tests/testsuite.at: Include extrac26.at.
---
 NEWS               |  5 +++++
 src/extract.c      | 39 +++++++++++++++++++++++++++++++++++----
 tests/Makefile.am  |  1 +
 tests/extrac26.at  | 43 +++++++++++++++++++++++++++++++++++++++++++
 tests/testsuite.at |  1 +
 5 files changed, 85 insertions(+), 4 deletions(-)
 create mode 100644 tests/extrac26.at

diff --git a/NEWS b/NEWS
index 5cf09a8a..dfa1e756 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,11 @@ version TBD
 
 * New manual section "Reproducibility", for reproducible tarballs.
 
+* Bug fixes
+
+** Fixed O(n^2) time complexity bug for large numbers of directories when
+   extracting with --delay-directory-restore or reading incremental archives.
+
 
 version 1.35 - Sergey Poznyakoff, 2023-07-18
 
diff --git a/src/extract.c b/src/extract.c
index 314d8bc0..a0263bb5 100644
--- a/src/extract.c
+++ b/src/extract.c
@@ -130,6 +130,9 @@ struct delayed_set_stat
 
 static struct delayed_set_stat *delayed_set_stat_head;
 
+/* Table of delayed stat updates hashed by path; null if none.  */
+static Hash_table *delayed_set_stat_table;
+
 /* A link whose creation we have delayed.  */
 struct delayed_link
   {
@@ -214,6 +217,20 @@ dl_compare (void const *a, void const *b)
   return (da->dev == db->dev) & (da->ino == db->ino);
 }
 
+static size_t
+ds_hash (void const *entry, size_t table_size)
+{
+  struct delayed_set_stat const *ds = entry;
+  return hash_string (ds->file_name, table_size);
+}
+
+static bool
+ds_compare (void const *a, void const *b)
+{
+  struct delayed_set_stat const *dsa = a, *dsb = b;
+  return strcmp (dsa->file_name, dsb->file_name) == 0;
+}
+
 /*  Set up to extract files.  */
 void
 extr_init (void)
@@ -513,11 +530,14 @@ delay_set_stat (char const *file_name, struct 
tar_stat_info const *st,
   size_t file_name_len = strlen (file_name);
   struct delayed_set_stat *data;
 
-  for (data = delayed_set_stat_head; data; data = data->next)
-    if (strcmp (data->file_name, file_name) == 0)
-      break;
+  if (! (delayed_set_stat_table
+        || (delayed_set_stat_table = hash_initialize (0, 0, ds_hash,
+                                                      ds_compare, NULL))))
+    xalloc_die ();
+
+  const struct delayed_set_stat key = { .file_name = (char*) file_name };
 
-  if (data)
+  if ((data = hash_lookup (delayed_set_stat_table, &key)) != NULL)
     {
       if (data->interdir)
        {
@@ -541,6 +561,8 @@ delay_set_stat (char const *file_name, struct tar_stat_info 
const *st,
       delayed_set_stat_head = data;
       data->file_name_len = file_name_len;
       data->file_name = xstrdup (file_name);
+      if (! hash_insert (delayed_set_stat_table, data))
+       xalloc_die ();
       data->after_links = false;
       if (st)
        {
@@ -652,6 +674,7 @@ remove_delayed_set_stat (const char *fname)
       if (chdir_current == data->change_dir
          && strcmp (data->file_name, fname) == 0)
        {
+         hash_remove (delayed_set_stat_table, data);
          free_delayed_set_stat (data);
          if (prev)
            prev->next = next;
@@ -1000,6 +1023,7 @@ apply_nonancestor_delayed_set_stat (char const 
*file_name, bool after_links)
        }
 
       delayed_set_stat_head = data->next;
+      hash_remove (delayed_set_stat_table, data);
       free_delayed_set_stat (data);
     }
 }
@@ -1962,6 +1986,13 @@ extract_finish (void)
   /* Finally, fix the status of directories that are ancestors
      of delayed links.  */
   apply_nonancestor_delayed_set_stat ("", 1);
+
+  /* This table should be empty after apply_nonancestor_delayed_set_stat  */
+  if (delayed_set_stat_table != NULL)
+    {
+      hash_free (delayed_set_stat_table);
+      delayed_set_stat_table = NULL;
+    }
 }
 
 bool
diff --git a/tests/Makefile.am b/tests/Makefile.am
index b696f016..57c9696f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -133,6 +133,7 @@ TESTSUITE_AT = \
  extrac23.at\
  extrac24.at\
  extrac25.at\
+ extrac26.at\
  filerem01.at\
  filerem02.at\
  grow.at\
diff --git a/tests/extrac26.at b/tests/extrac26.at
new file mode 100644
index 00000000..48fe468b
--- /dev/null
+++ b/tests/extrac26.at
@@ -0,0 +1,43 @@
+# Test suite for GNU tar.                             -*- autotest -*-
+# Copyright 2022-2023 Free Software Foundation, Inc.
+#
+# This file is part of GNU tar.
+#
+# GNU tar is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# GNU tar is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+AT_SETUP([extract a large directory tree with --delay-directory-restore])
+AT_KEYWORDS([extract extrac26])
+
+AT_TAR_CHECK([
+AT_SKIP_LARGE_FILES
+AT_TIMEOUT_PREREQ
+
+echo Creating dirtree
+awk 'BEGIN { for (j = 0; j < 300; j++) for (k = 0; k < 300; k++) print 
"dirtree/" j "/" k }' | \
+  xargs mkdir -p
+
+echo Creating archive
+tar -cf archive.tar dirtree
+
+echo Extracting archive
+mkdir output
+timeout 15 tar -xf archive.tar --delay-directory-restore -C output
+],
+[0],
+[Creating dirtree
+Creating archive
+Extracting archive
+],
+[],[],[],[gnu])
+
+AT_CLEANUP
diff --git a/tests/testsuite.at b/tests/testsuite.at
index 3a98f865..7cfa636f 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -349,6 +349,7 @@ m4_include([extrac22.at])
 m4_include([extrac23.at])
 m4_include([extrac24.at])
 m4_include([extrac25.at])
+m4_include([extrac26.at])
 
 m4_include([backup01.at])
 
-- 
2.41.0

Reply via email to