Re: Prefetch the next tuple's memory during seqscans

sirisha chamarthi Tue, 22 Nov 2022 23:29:47 -0800

On Tue, Nov 22, 2022 at 1:58 PM David Rowley <dgrowle...@gmail.com> wrote:


> On Thu, 3 Nov 2022 at 06:25, Andres Freund <and...@anarazel.de> wrote:
> > Attached is an experimental patch/hack for that. It ended up being more
> > beneficial to make the access ordering more optimal than prefetching the
> tuple
> > contents, but I'm not at all sure that's the be-all-end-all.
>
> Thanks for writing that patch. I've been experimenting with it.
>
> I tried unrolling the loop (patch 0003) as you mentioned in:
>
> + * FIXME: Worth unrolling so that we don't fetch the same cacheline
> + * over and over, due to line items being smaller than a cacheline?
>
> but didn't see any gains from doing that.
>
> I also adjusted your patch a little so that instead of doing:
>
> - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */
> + OffsetNumber *rs_vistuples;
> + OffsetNumber rs_vistuples_d[MaxHeapTuplesPerPage]; /* their offsets */
>
> to work around the issue of having to populate rs_vistuples_d in
> reverse, I added a new field called rs_startindex to mark where the
> first element in the rs_vistuples array is. The way you wrote it seems
> to require fewer code changes, but per the FIXME comment you left, I
> get the idea you just did it the way you did to make it work enough
> for testing.
>
> I'm quite keen to move forward in committing the 0001 patch to add the
> pg_prefetch_mem macro. What I'm a little undecided about is what the
> best patch is to commit first to make use of the new macro.
>
> I did some tests on the attached set of patches:
>
> alter system set max_parallel_workers_per_gather = 0;
> select pg_reload_conf();
>
> create table t as select a from generate_series(1,10000000)a;
> alter table t set (autovacuum_enabled=false);
>
> $ cat bench.sql
> select * from t where a = 0;
>
> psql -c "select pg_prewarm('t');" postgres
>
> -- Test 1 no frozen tuples in "t"
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 383.332 ms
> latency average = 375.747 ms
> latency average = 376.090 ms
>
> Master + 0001 + 0002:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 370.133 ms
> latency average = 370.149 ms
> latency average = 370.157 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 372.662 ms
> latency average = 371.034 ms
> latency average = 372.709 ms
>
> -- Test 2 "select count(*) from t" with all tuples frozen
>
> $ cat bench1.sql
> select count(*) from t;
>
> psql -c "vacuum freeze t;" postgres
> psql -c "select pg_prewarm('t');" postgres
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 406.238 ms
> latency average = 407.029 ms
> latency average = 406.962 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 345.470 ms
> latency average = 345.775 ms
> latency average = 345.354 ms
>
> My current thoughts are that it might be best to go with 0005 to start
> with.  I know Melanie is working on making some changes in this area,
> so perhaps it's best to leave 0002 until that work is complete.
>

I ran your test1 exactly like your setup except the row count is 3000000
(with 13275 blocks). Shared_buffers is 128MB and the hardware configuration
details at the bottom of the mail. It appears *Master + 0001 + 0005 *regressed
compared to master slightly .

*Master (@56d0ed3b756b2e3799a7bbc0ac89bc7657ca2c33)*

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 430.287 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 369.046 ms

*Master + 0001 + 0002:*

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 427.983 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 367.185 ms

*Master + 0001 + 0005:*

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 447.045 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 374.484 ms

lscpu output

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          1
On-line CPU(s) list:             0
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Stepping:                        2
CPU MHz:                         2397.224
BogoMIPS:                        4794.44
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        256 KiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host
state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted,
no microcode; SMT Host state unknown
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and
__user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled,
RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic
sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx
pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni
                                 pclmulqdq ssse3 fma cx16 pcid sse4_1
sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt m
                                 d_clear


>
> David
>

Re: Prefetch the next tuple's memory during seqscans

Reply via email to