On Tue, Nov 22, 2022 at 1:58 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Thu, 3 Nov 2022 at 06:25, Andres Freund <and...@anarazel.de> wrote: > > Attached is an experimental patch/hack for that. It ended up being more > > beneficial to make the access ordering more optimal than prefetching the > tuple > > contents, but I'm not at all sure that's the be-all-end-all. > > Thanks for writing that patch. I've been experimenting with it. > > I tried unrolling the loop (patch 0003) as you mentioned in: > > + * FIXME: Worth unrolling so that we don't fetch the same cacheline > + * over and over, due to line items being smaller than a cacheline? > > but didn't see any gains from doing that. > > I also adjusted your patch a little so that instead of doing: > > - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */ > + OffsetNumber *rs_vistuples; > + OffsetNumber rs_vistuples_d[MaxHeapTuplesPerPage]; /* their offsets */ > > to work around the issue of having to populate rs_vistuples_d in > reverse, I added a new field called rs_startindex to mark where the > first element in the rs_vistuples array is. The way you wrote it seems > to require fewer code changes, but per the FIXME comment you left, I > get the idea you just did it the way you did to make it work enough > for testing. > > I'm quite keen to move forward in committing the 0001 patch to add the > pg_prefetch_mem macro. What I'm a little undecided about is what the > best patch is to commit first to make use of the new macro. > > I did some tests on the attached set of patches: > > alter system set max_parallel_workers_per_gather = 0; > select pg_reload_conf(); > > create table t as select a from generate_series(1,10000000)a; > alter table t set (autovacuum_enabled=false); > > $ cat bench.sql > select * from t where a = 0; > > psql -c "select pg_prewarm('t');" postgres > > -- Test 1 no frozen tuples in "t" > > Master (@9c6ad5eaa): > $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency" > latency average = 383.332 ms > latency average = 375.747 ms > latency average = 376.090 ms > > Master + 0001 + 0002: > $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency" > latency average = 370.133 ms > latency average = 370.149 ms > latency average = 370.157 ms > > Master + 0001 + 0005: > $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency" > latency average = 372.662 ms > latency average = 371.034 ms > latency average = 372.709 ms > > -- Test 2 "select count(*) from t" with all tuples frozen > > $ cat bench1.sql > select count(*) from t; > > psql -c "vacuum freeze t;" postgres > psql -c "select pg_prewarm('t');" postgres > > Master (@9c6ad5eaa): > $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency" > latency average = 406.238 ms > latency average = 407.029 ms > latency average = 406.962 ms > > Master + 0001 + 0005: > $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency" > latency average = 345.470 ms > latency average = 345.775 ms > latency average = 345.354 ms > > My current thoughts are that it might be best to go with 0005 to start > with. I know Melanie is working on making some changes in this area, > so perhaps it's best to leave 0002 until that work is complete. > I ran your test1 exactly like your setup except the row count is 3000000 (with 13275 blocks). Shared_buffers is 128MB and the hardware configuration details at the bottom of the mail. It appears *Master + 0001 + 0005 *regressed compared to master slightly . *Master (@56d0ed3b756b2e3799a7bbc0ac89bc7657ca2c33)* Before vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 430.287 ms After Vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 369.046 ms *Master + 0001 + 0002:* Before vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 427.983 ms After Vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 367.185 ms *Master + 0001 + 0005:* Before vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 447.045 ms After Vacuum: /usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency" latency average = 374.484 ms lscpu output Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 1 On-line CPU(s) list: 0 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 63 Model name: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz Stepping: 2 CPU MHz: 2397.224 BogoMIPS: 4794.44 Hypervisor vendor: Microsoft Virtualization type: full L1d cache: 32 KiB L1i cache: 32 KiB L2 cache: 256 KiB L3 cache: 30 MiB NUMA node0 CPU(s): 0 Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported Vulnerability L1tf: Mitigation; PTE Inversion Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown Vulnerability Meltdown: Mitigation; PTI Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown Vulnerability Spec store bypass: Vulnerable Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt m d_clear > > David >