Hi,

Currently, around $subject, we have a few weaknesses:

1) For small-ish power-of-two allocations, aset.c does not put a sentinel at
   the end of the allocation. Given how common power-of-two allocations are,
   that seems decidedly suboptimal.

2) The sentinel we use is just one byte. This means that basic variable
   alignment rules will often lead to access-beyond-end-of-allocation to be
   further out, preventing us from detecting that.

3) If we do detect memory context issues, we just issue WARNINGs, we do not
   crash. I think there are a few different problems with that approach:

   often there will eventually be a crashes, due to the memory corruption that
   already occurred, but it'll be further away from where we detected the
   problem, rather than failing closer to the issue.

   Another big issue is that this means that in contexts where we do not check
   for WARNINGs, e.g. tap tests or in background processes, such WARNINGs
   won't trigger test failures. Which seems rather bad to me.

   A third issue is that the warnings are useless for detecting problems
   during automated testing like fuzzing. It's very normal for some WARNINGs
   to be issued when e.g. datatype input functions, something like memory
   corruption really needs to result in a crash to be properly fuzzable.

4) We support using address sanitizer, but due to address sanitizer only
   learning about AllocBlock (et al) level allocations, many read accesses
   beyond the end of an allocation will never be detected.

5) Normally address sanitizers quarantines individual allocation "addresses"
   for a while, to make it more likely to catch use-after-free issues.
   Without something like that, many types of issues won't be detected.


I think these aren't actually *that* hard to solve:

1) The reason we don't always use the sentinel for power-of-two allocations is
   that naively doing so would interfere with the size classing for the
   freelist.  But that's relatively easy to address - we can "just" add the
   space for the size classes *after* determining the size class.  IIRC we
   used to have a similar issue with the per-allocation chunk header and
   solved it this way too.

2) We should make make the sentinel size configurable and default to either 8
   or 16 bytes.

3) I think we should make the memory context failures crash.  Perhaps by
   emitting WARNINGs and then crashing if any of them occurred.

   As we trigger memory context checking during commit/abort handling, just
   using PANIC won't always reliably work, E.g. the AllocSetCheck() in
   AllocSetReset(), can trigger recursive PANICs due to the
   MemoryContextReset() in errstart(), if the corruption in in ErrorContext.
   Whether that's a real issue worth worrying about, IDK.

4) For this I prototyped making the valgrind annotations more generic and
   using the address sanitizer interface to mark memory as poisoned /
   unpoisoned.  That doesn't provide quite all the checking that valgrind can
   do (it doesn't track uninitialized memory), but it's considerably better
   than our memory context checking, and *much* *much* faster than valgrind.

5) I think we should add a mode to all our allocators that just turns every
   allocation into a large allocation (perhaps with something slightly
   different for slab). In that mode asan, valgrind, et al would be more
   precise.  Of course that'll have rather deleterious performance effects,
   but I think it'd still be quite useful for debugging.


I have prototype patches for 1-4, but wanted to hear about whether such
changes sound sane to others, before spending time to polish all of this up.

Greetings,

Andres Freund


Reply via email to