On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'm not a fan of adding overhead to such a performance-critical
> thing as spinlocks in order to catch coding errors that are easily
> detectable statically.  IMV the correct usage of spinlocks is that
> they should only be held across *short, straight line* code segments.
> We should be making an effort to ban coding patterns like
> "return with spinlock still held", because they're just too prone
> to errors similar to this one.  Note that trying to take another
> lock is far from the only bad thing that can happen if you're
> not very conservative about what code can execute with a spinlock
> held.

I agree that we don't want to add overhead, and also about how
spinlocks should be used, but I dispute "easily detectable
statically." I mean, if you or I look at some code that uses a
spinlock, we'll know whether the pattern that you mention is being
followed or not, modulo differences of opinion in debatable cases. But
you and I cannot be there to look at all the code all the time. If we
had a static checking tool that was run as part of every build, or in
the buildfarm, or by cfbot, or somewhere else that raised the alarm if
this rule was violated, then we could claim to be effectively
enforcing this rule. But with 20-30 active committers and ~100 active
developers at any given point in time, any system that relies on every
relevant person knowing all the rules and remembering to enforce them
on every commit is bound to be less than 100% effective. Some people
won't know what the rule is, some people will decide that their
particular situation is Very Special, some people will just forget to
check for violations, and some people will check for violations but
miss something.

I think the question we should be asking here is what the purpose of
the PANIC is. I can think of two possible purposes. It could be either
(a) an attempt to prevent real-world harm by turning database hangs
into database panics, so that at least the system will restart and get
moving again instead of sitting there stuck for all eternity or (b) an
attempt to punish people for writing bad code by turning coding rule
violations into panics on production systems. If it's (a), that's
defensible, though we can still ask whether it does more harm than
good. If it's (b), that's not a good way of handling that problem,
because (b1) it affects production builds and not just development
builds, (b2) many coding rule violations are vanishingly unlikely to
trigger that PANIC in practice, and (b3) if the PANIC does fire, it
gives you basically zero help in figuring out where the actual problem
is. The PostgreSQL code base is way too big for "ERROR: you screwed
up" to be an acceptable diagnostic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Reply via email to