Hi, On 2024-01-04 14:59:06 +0800, Andy Fan wrote: > My question is if someone doesn't obey the rule by mistake (everyone > can make mistake), shall we PANIC on a production environment? IMO I > think it can be a WARNING on a production environment and be a stuck > when 'ifdef USE_ASSERT_CHECKING'. > > [...] > > I notice this issue actually because of the patch "Cache relation > sizes?" from Thomas Munro [1], where the latest patch[2] still have the > following code. > + sr = smgr_alloc_sr(); <-- HERE a spin lock is hold > + > + /* Upgrade to exclusive lock so we can create a mapping. */ > + LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex > operation is needed. it may take a long time. > > Our internal testing system found more misuses on our own PG version.
> I think a experienced engineer like Thomas can make this mistake and the > patch was reviewed by 3 peoples, the bug is still there. It is not easy > to say just don't do it. I don't follow this argument - just ignoring the problem, which emitting a WARNING basically is, doesn't reduce the impact of the bug, it *increases* the impact, because now the system will not recover from the bug without explicit operator intervention. During that time the system might be essentially unresponsive, because all backends end up contending for some spinlock, which makes investigating such issues very hard. I think we should add infrastructure to detect bugs like this during development, but not PANICing when this happens in production seems completely non-viable. Greetings, Andres Freund