Hello Thomas,

29.08.2024 01:16, Thomas Munro wrote:
> Yeah. That's quite interesting, and must destabilise that simple-minded demo. I'm curious to know exactly what contention is causing that (about 3/4 of a millisecond that I don't see and now I want to know what it's waiting for), but it's a very crude test lacking timer resolution in the earlier messages, and it's an unrelated topic and a distraction. Perhaps it explains why you saw two different behaviours in Q15 with the patch and I didn't, though. Really it shouldn't be so sensitive to such variations, it's obviously a terrible plan, and TPC-DS needs a planner hacker mega-brain applied to it; I'm going to try to nerd-snipe one...

I looked at two perf profiles of such out-of-sync processes and found no extra calls whatsoever in the slow one; it just has proportionally more perf samples. That made me suspect CPU frequency scaling... Indeed, with the "performance" governor set and boost mode disabled, I'm now seeing much more stable numbers. (I do this tuning before running performance tests, but I forgot about it when I ran your test, my bad.)

I'm sorry for the noise and the distraction.

Still, now I can confirm your results. Without the patch, two parallel workers gave "Buffers: shared hit=217 / Buffers: shared hit=226" 10 times out of 10. With the patch, out of 10 runs I got:
"shared hit=219 / shared hit=224": 3
"shared hit=220 / shared hit=223": 4
"shared hit=221 / shared hit=222": 3

On b7b0f3f27~1, my results are:
"shared hit=219 / shared hit=224": 2
"shared hit=220 / shared hit=223": 3
"shared hit=221 / shared hit=222": 4
"shared hit=218 / shared hit=225": 1

Best regards,
Alexander