On Mon, 15 Apr 2024 at 16:10, Robins Tharakan <thara...@gmail.com> wrote:
> - I now have 2 separate runs stuck on pg_sleep() - HEAD / REL_16_STABLE
> - I'll keep them (stuck) for this week, in case there's more we can get
>   from them (and to see how long they take)
> - Attached are 'bt full' outputs for both (b.txt - HEAD / a.txt -
>   REL_16_STABLE)

Thanks for getting those.

#4  0x000000000090b7b4 in pg_sleep (fcinfo=<optimized out>) at misc.c:406
        delay = <optimized out>
        delay_ms = <optimized out>
        endtime = 0

This endtime looks like a problem. It seems unlikely to be caused by
gettimeofday's timeval fields being zeroed, given that the number of
seconds should have been added to that.

I can't quite make sense of how we end up sleeping at all with a zero
endtime. Assuming the subsequent GetNowFloat() worked, "delay = endtime
- GetNowFloat();" would result in a negative sleep duration and we'd
break out of the sleep loop.

If GetNowFloat() somehow was returning a negative number then we could
end up with a large delay. But if gettimeofday() was so badly broken,
wouldn't there be some evidence of this in the log timestamps on
failing runs?

I'm not that familiar with the buildfarm config, but I do see some
Valgrind-related settings in there. Is PostgreSQL running under Valgrind
on these runs?

David
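
For reference, the reasoning about endtime can be checked with a small
stand-alone model of one iteration of the sleep loop. This is only a
sketch, not the code in misc.c: next_delay_ms() is a hypothetical helper
name, the -1 "break" return value is invented for illustration, and the
600-second cap per wait is an assumption based on a reading of pg_sleep()
rather than something shown in the backtrace.

```c
/*
 * Simplified model of how the sleep loop picks its next wait time.
 * Hypothetical helper, not PostgreSQL source.
 *
 * endtime: target wake-up time in seconds (as a float)
 * now:     what GetNowFloat() would return, in seconds
 *
 * Returns the number of milliseconds to wait, or -1 when the loop
 * would break out because the remaining delay is not positive.
 */
long
next_delay_ms(double endtime, double now)
{
    double delay = endtime - now;

    /* Assumption: each wait is capped at 600 seconds. */
    if (delay >= 600.0)
        return 600000;

    if (delay > 0.0)
    {
        /* Round up to whole milliseconds without needing libm. */
        double ms = delay * 1000.0;
        long   whole = (long) ms;

        if ((double) whole < ms)
            whole++;
        return whole;
    }

    /* delay <= 0: nothing left to sleep, leave the loop. */
    return -1;
}
```

In this model, next_delay_ms(0.0, 1713190000.0) returns -1, so a zero
endtime with a sane clock leaves the loop at once. Only a negative "now",
as in next_delay_ms(0.0, -1713190000.0), yields a positive wait.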