Solar Designer <[email protected]> writes:
>> === BUG #03 (HIGH) ===
>> Signed integer overflow in PAX v0.0/v0.1 sparse offset+numbytes
>>
>> Files: src/xheader.c:1426 (sparse_numbytes_decoder)
>> src/xheader.c:1480 (sparse_map_decoder)
>> Impact: Signed 64-bit integer overflow in offset+numbytes arithmetic.
>> Undefined behavior that can corrupt internal state.
>>
>> Description:
>>
>> The PAX v1.0 decoder in sparse.c already checks:
>>
>> if (INT_ADD_OVERFLOW (sp.offset, u))
>> ...
>>
>> But the PAX v0.0 and v0.1 decoders in xheader.c do not. A crafted
>> extended header with offset=9223372036854775800 and numbytes=100
>> passes individual range checks but overflows signed off_t when summed.
>>
>> UBSan output:
>>
>> xheader.c:1473: runtime error: signed integer overflow:
>> 9223372036854775800 + 100 cannot be represented in type 'long int'
>>
>> Proposed fix:
>>
>> Add the same INT_ADD_OVERFLOW check after assigning numbytes in both
>> sparse_numbytes_decoder() and sparse_map_decoder(), and return early
>> with an error if the check fires.
Both of the functions they reference check that the value fits in off_t.
A commit from 16 years ago shows the relevant code [1], but even before
that overflows where checked.
>> === BUG #04 (LOW) ===
>> FLEXNSIZEOF overflow in create_placeholder_file
>>
>> File: src/extract.c:1444
>> Impact: Theoretical heap buffer overflow if link_name is near SIZE_MAX.
>>
>> Description:
>>
>> create_placeholder_file() computes:
>>
>> xmalloc (FLEXNSIZEOF (struct delayed_link, target,
>> strlen (current_stat_info.link_name) + 1));
>>
>> If strlen(link_name) is near SIZE_MAX, adding 1 overflows size_t, and
>> FLEXNSIZEOF computes a small allocation size. The subsequent strcpy
>> writes past the buffer. PAX extended headers allow arbitrarily long
>> linkpath values, so the attacker controls the length. In practice
>> this requires the system to have nearly SIZE_MAX bytes of memory
>> available, so it is mostly theoretical.
>>
>> Proposed fix:
>>
>> size_t link_len = strlen (current_stat_info.link_name);
>> if (SIZE_MAX - sizeof (struct delayed_link) <= link_len)
>> xalloc_die ();
I think it is safe to assume that no one has SIZE_MAX bytes of memory.
>> === BUG #09 (MEDIUM) ===
>> strcmp overread past the non-NUL-terminated magic field
>>
>> File: src/list.c:565, 632 (read_header, decode_header)
>> Impact: Buffer overread past the 6-byte magic field boundary.
>> Format misdetection with crafted headers.
>>
>> Description:
>>
>> strcmp (h->magic, TMAGIC) // line 565
>> strcmp (header->header.magic, TMAGIC) // line 632
>>
>> TMAGIC is "ustar\0" (6 bytes including NUL). The magic field in the
>> header struct is exactly 6 bytes (char magic[6]). If the field
>> contains "ustar\x01" (no NUL terminator), strcmp reads past byte 5
>> into the adjacent version[2] and uname[32] fields until it finds a NUL.
>>
>> ASan (with tight heap allocator) reports:
>>
>> ERROR: AddressSanitizer: heap-buffer-overflow
>> READ of size 1 ... in __strcmp_sse42
>>
>> Proposed fix:
>>
>> memcmp (h->magic, TMAGIC, sizeof TMAGIC)
This is just copied from Paul's commit in 2025 [2].
>> === BUG #10 (MEDIUM) ===
>> Incomplete destructor for delayed_link_table hash entries
>>
>> File: src/extract.c:1478 (create_placeholder_file)
>> Impact: Memory leak on hash collision/replacement.
>>
>> Description:
>>
>> hash_initialize (0, 0, dl_hash, dl_compare, free)
>>
>> The destructor callback is just free(), which frees the top-level
>> struct delayed_link but not its sub-allocations: the sources linked
>> list, cntx_name, acls_a_ptr, acls_d_ptr, and xattr_map. When a hash
>> collision causes an old entry to be evicted, these are leaked.
>>
>> Proposed fix:
>>
>> Write a proper destructor:
>>
>> static void
>> free_delayed_link (void *entry)
>> {
>> struct delayed_link *p = entry;
>> struct string_list *s, *next;
>> for (s = p->sources; s; s = next)
>> {
>> next = s->next;
>> free (s);
>> }
>> free (p->cntx_name);
>> free (p->acls_a_ptr);
>> free (p->acls_d_ptr);
>> xattr_map_free (&p->xattr_map);
>> free (p);
>> }
>>
>> And pass free_delayed_link instead of free to hash_initialize.
GNU tar never calls hash_free, and therefore will never call the free
function given to hash_initialize. It is explained in a comment and the
commit message [3]:
if (false)
{
/* There is little point to freeing, as we are about to exit,
and freeing is more likely to cause than cure trouble. */
hash_free (delayed_link_table);
delayed_link_table = NULL;
}
>> === BUG #13 (LOW) ===
>> Integer truncation in decode_record
>>
>> File: src/xheader.c:630 (decode_record)
>> Impact: Implicit narrowing from ptrdiff_t to int; UBSan
>> -fsanitize=implicit-conversion fires.
>>
>> Description:
>>
>> int len_len = len_lim - p; // line 630
>>
>> len_lim - p is ptrdiff_t. If the difference exceeds INT_MAX, the
>> implicit conversion to int wraps to a negative value. The value is
>> only used to format an error message, so the practical impact is
>> limited to a garbled diagnostic, but UBSan rightfully flags it.
>>
>> Proposed fix:
>>
>> int len_len = (int) (len_lim - p < 1000 ? len_lim - p : 1000);
>>
>> This clamps the value to a reasonable range for the error message.
This proposal was just copied from Paul's commit in 2024 [4].
>> === SUMMARY TABLE ===
>>
>> # Severity File Function / Line Short description
>> -- -------- ----------- ---------------------------
>> --------------------------
>> 01 CRITICAL sparse.c:593 sparse_extract_file Archive stream
>> desync -> file injection
>> 14 CRITICAL (same root cause as #01) Full RCE chain
>> demo
>> 02 HIGH sparse.c:803 oldgnu_add_sparse Overlapping
>> sparse regions -> corruption
>> 03 HIGH xheader.c sparse_numbytes/map_decoder Missing
>> INT_ADD_OVERFLOW
>> 04 LOW extract.c:1444 create_placeholder_file FLEXNSIZEOF
>> size_t overflow
>> 05 MEDIUM extract.c:566 delay_set_stat Memory leak on
>> reuse path
>> 06 MEDIUM extract.c:996 apply_nonancestor_delayed.. Shallow xattr_map
>> copy (UAF)
>> 07 MEDIUM extract.c:985 apply_nonancestor_delayed.. Uninitialized
>> tar_stat_info
>> 08 MEDIUM extract.c:1898 apply_delayed_link Uninitialized
>> tar_stat_info
>> 09 MEDIUM list.c:565 read_header / decode_header strcmp past magic
>> field
>> 10 MEDIUM extract.c:1478 create_placeholder_file Incomplete hash
>> destructor
>> 11 LOW extract.c:364 check_time Signed overflow
>> in time diff
>> 12 LOW extract.c:953 apply_nonancestor_delayed.. file_name_len=0
>> underflow
>> 13 LOW xheader.c:630 decode_record ptrdiff_t -> int
>> truncation
I didn't look much at the others since I am not very familiar with tar.
Hopefully Paul can quickly tell if they are bogus or not.
>> === REPRODUCTION ===
>>
>> All bugs were confirmed on tar 1.35 (commit 8e3c8fa17 from
>> https://git.savannah.gnu.org/cgit/tar.git). Bugs #01, #02, and #14
>> are reproducible without sanitizers on any 64-bit Linux system. The
>> remaining bugs require ASan, UBSan, or MSan to observe.
This commit hash does not exist, which does not inspire much confidence:
$ git log 8e3c8fa17
fatal: ambiguous argument '8e3c8fa17': unknown revision or path not in the
working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Collin
[1]
https://git.savannah.gnu.org/cgit/tar.git/commit/?id=a59c819beb4886ee43f16dfd80ec1151fda1abe6
[2]
https://git.savannah.gnu.org/cgit/tar.git/commit/?id=c11084bcc2d7d9976570a12263b81d2488066115
[3]
https://git.savannah.gnu.org/cgit/tar.git/commit/?id=258d1c44e5ee7c58b28bf0000e9d737df6081885
[4]
https://git.savannah.gnu.org/cgit/tar.git/commit/?id=d1e72a536f26188230a147d948b9057714fd0b6b