Hi,

While testing pg_stash_advice, I came across a
crash that's reachable through the persistence dump file:

If $PGDATA/pg_stash_advice.tsv contains a stash name of 64 bytes or more
(i.e. >= NAMEDATALEN), the worker hits an assertion in dshash_strhash()
on startup. The postmaster treats the SIGABRT as a crash, performs full
crash recovery, restarts the worker, and the worker hits the same
assertion again. The cluster never reaches a stable state until the
file is removed by hand.

SQL-level callers cannot reach this because pgsa_check_stash_name()
rejects long names. The persistence parser, however, accepts the name
straight from the file and feeds it into dshash_find_or_insert(), which
in turn calls dshash_strhash() configured with key_size = NAMEDATALEN.
That function asserts strlen(v) < size. Strictly speaking the file
parser was missing a length check that the SQL path already performs.

Minimal reproduction
====================

    $ pg_ctl -D $PGDATA stop -m fast
    $ python3 -c "print('stash\t' + 'a'*64)" > $PGDATA/pg_stash_advice.tsv
    $ pg_ctl -D $PGDATA -l logfile start

Server log (excerpt, every ~1-2 seconds):

    DEBUG:  pg_stash_advice worker started
    DEBUG:  restoring advice stash "aaaa...aaaa"
    TRAP: failed Assert("strlen((const char *) v) < size"),
          File: "dshash.c", Line: 634
    postgres: pg_stash_advice worker (ExceptionalCondition+0xbb)
    postgres: pg_stash_advice worker (dshash_strhash+0x48)
    postgres: pg_stash_advice worker (dshash_find_or_insert_extended+0x2e)
    pg_stash_advice.so (pgsa_create_stash)
    pg_stash_advice.so (pgsa_restore_stashes)
    pg_stash_advice.so (pgsa_read_from_disk)
    pg_stash_advice.so (pg_stash_advice_worker_main+0x1aa)
    ...
    LOG:  background worker "pg_stash_advice worker" was terminated by
          signal 6: Aborted
    LOG:  terminating any other active server processes
    LOG:  all server processes terminated; reinitializing
    [worker comes back up, asserts again, ad infinitum]

In my testing this produced ~18 crashes in ~12 seconds before I removed
the file. Threshold is exact: 63 bytes is fine, 64 bytes crashes.

In a production (non-assertion) build the name would instead be silently
truncated to NAMEDATALEN-1 bytes by the dshash key-copy path, allowing
distinct long names that share a 63-byte prefix to collide in the stash
table.

The attached patch validates the parsed stash-name length in the same
place where the parser already raises ERRCODE_DATA_CORRUPTED on other
forms of syntax error, mirroring the existing length check in
pgsa_check_stash_name(). Not sure if TAP test too is needed for this, but
I've included one in the patch.

Regards,
Ayush

Attachment: 0001-pg_stash_advice-reject-overlong-stash-names-when-loa.patch
Description: Binary data

Reply via email to