On Sat, 7 Jun 2025 at 23:00, Alexander Lakhin <exclus...@gmail.com> wrote: > #5 0x00005dd0a1788377 in ExceptionalCondition (conditionName=0x5dd0a1822ee8 > "memcmp(&tmp, cattr, sizeof(CompactAttribute)) == 0", > fileName=0x5dd0a1822ed9 "tupdesc.c", lineNumber=165) at assert.c:66
Thanks for finding that and for the test case. I can recreate this locally. It seems to be down to a race condition with the shared TupleDesc. If I expand that memcmp Assert to check equality on each of the individual fields, I see that it's failing on attcacheoff. This is because a worker does some tuple deforming while the failing Assert backend is calling verify_compact_attribute(). That results in the attcacheoff being changed by the deforming backend and the new attcacheoff becomes visible sometime after the verify_compact_attribute() backend does tmp.attcacheoff = cattr.attcacheoff; and before it does the Assert. Using your script with the attached more_asserting.patch applied, I get: ITERATION 13 NOTICE: tmp.attcacheoff = -1, cattr->attcacheoff = 4 (So, it seems my concerns about this in [1] were warranted...) This whole shared TupleDesc code seems quite dangerous. Unfortunately, this isn't a new problem to PostgreSQL 18. This race condition existed prior to CompactAttribute, it's just we never noticed because there was no code checking for this before. The hazard here is that any code checking a shared TupleDesc's attcacheoff might not get the same value on subsequent reads. Maybe that's not too bad as the compiler would likely only load the value from the struct once and store that in a register. Also, since the offset is an int, store should be written atomically and won't leave readers susceptible to torn reads. If we can live with the race condition for now, then it's possible to fix the failing Assert by making a copy of the TupleDesc's CompactAttribute so we're not Asserting against the shared memory one. I've done that in the attached fix_idea.patch. I do have a half-done patch which changes where attcacheoff is populated so it gets populated when the TupleDesc is made rather than during tuple deformation. That allows the deforming code to be shrunk down and allows us to use a special-case deformer to deform all leading non-nullable fixed-width attributes. That should also fix this issue, but that's v19 material. I propose I just push the fix_idea.patch and leave it at that for v18. Does anyone have any other ideas? David [1] https://postgr.es/m/CAApHDvr8e8-7tL5SUHoB-CDKF162BEMszDumH0W8-%2BxZzrpP0w%40mail.gmail.com
more_asserting.patch
Description: Binary data
fix_idea.patch
Description: Binary data