On 5/18/23 20:45, Tomas Vondra wrote: > ... > > 0001 fixes the issue. 0002 is the original fix, and 0003 is just the > pageinspect changes (for master only). > > For the backbranches, I thought about making the code more like master > (by moving some of the handling from opclasses to brin.c), but decided > not to. It'd be low-risk, but it feels wrong to kinda do what the master > does under "oi_regular_nulls" flag. >
I've now pushed all these patches into relevant branches, after some minor last-minute tweaks, and so far it didn't cause any buildfarm issues. Assuming this fully fixes the NULL-handling for BRIN, this leaves just the deadlock issue discussed in [1]. It seems rather unfortunate all these issues went unnoticed / unreported essentially since BRIN was introduced in 9.5. To some extent it might be explained by fairly low likelihood of actually hitting the issue (just the right timing, concurrency with summarization, NULL values, ...). It took me quite a bit of time and luck to (accidentally) hit these issues while stress testing the code. But there's also the problem of writing tests for this kind of thing. To exercise the interesting parts (e.g. the union_tuples), it's necessary to coordinate the order of concurrent steps - but what's a good generic way to do that (which we could do in TAP tests)? In manual testing it's doable by setting breakpoints on a particular lines, and step through the concurrent processes that way. But that doesn't seem like a particularly great solution for regression tests. I can imagine adding some sort of "probes" into the code and then attaching breakpoints to those, but surely we're not the first project needing this ... regards [1] https://www.postgresql.org/message-id/261e68bc-f5f5-5234-fb2c-af4f58351...@enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company