On Fri, Jun 21, 2019 at 1:21 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > I ran into someone with a system where big queries scanning 8GB+ of > all-in-cache data took consistently ~2.5x longer on a primary server > than on a replica. Both servers had concurrent activity on them but > plenty of spare capacity and similar specs. After some investigation > it turned out that on the primary there were (1) some select() > syscalls waiting for 1ms, which might indicate contended > SpinLockAcquire() back-offs, and (2) a huge amount of time spent in: > > + 93,31% 0,00% postgres postgres [.] index_getnext > + 93,30% 0,00% postgres postgres [.] index_fetch_heap > + 81,66% 0,01% postgres postgres [.] heap_page_prune_opt > + 75,85% 0,00% postgres postgres [.] TransactionIdLimitedForOldSnapshots > + 75,83% 0,01% postgres postgres [.] RelationHasUnloggedIndex > + 75,79% 0,00% postgres postgres [.] RelationGetIndexList > + 75,79% 75,78% postgres postgres [.] list_copy
On my laptop, all prewarmed, no concurrency, the mere existence of 10 brin indexes causes a sequential scan to take ~5% longer and an uncorrelated index scan to take ~45% longer (correlated index scans don't suffer). Here's a draft patch for v13 that fixes that problem by caching the result of RelationHasUnloggedIndex(). Reproducer scripts also attached. I ran them with shared_buffers=8GB, old_snapshot_threshold=10s and pg_prewarm installed. I didn't try to look into the complaint about suspected spinlock contention. -- Thomas Munro https://enterprisedb.com
0001-Cache-the-result-of-RelationHasUnloggedIndex.patch
Description: Binary data
test-seqscan.sql
Description: Binary data
test-indexscan.sql
Description: Binary data