Re: strange slow query - lost lot of time somewhere
On Mon, 16 May 2022 at 06:11, David Rowley wrote:
> On Fri, 6 May 2022 at 21:27, David Rowley wrote:
> > I've attached a patch to fix. I'll look at it in more detail after the
> > weekend.
>
> I've now pushed this fix to master and backpatched to 14.

Thank you

Pavel

> David
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 21:27, David Rowley wrote:
> I've attached a patch to fix. I'll look at it in more detail after the
> weekend.

I've now pushed this fix to master and backpatched to 14.

David
Re: strange slow query - lost lot of time somewhere
On Tue, 10 May 2022 at 14:22, Justin Pryzby wrote:
> On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> > I'm very tempted to change the EXPLAIN output in at least master to
> > display the initial and final (maximum) hash table sizes. Wondering if
> > anyone would object to that?
>
> No objection to adding it to v15.
>
> I'll point out that "Cache Mode" was added to EXPLAIN between 11.1 and 11.2
> without controversy, so this could conceivably be backpatched to v14, too.
>
> commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085

This is seemingly a good point, but I don't really think it's a case of just keeping the EXPLAIN output stable in minor versions; it's more about adding new fields to structs. I just went and wrote the patch, and the fundamental difference seems to be that what I did in 6c32c0977 managed to add a new field only in the empty padding between two fields. That resulted in no fields in the struct being pushed up in their address offset. The idea there is not to break any already-compiled extension that references some field that comes after that. In the patch I've just written, I've had to add some fields, which causes sizeof(MemoizeState) to go up, resulting in the offsets of some later fields changing.

One thing I'll say about this patch is that I found it annoying that I had to add code to cache_lookup() for when we fail to find an entry. That's probably not the end of the world, as that's only for cache misses. Ideally, I'd just look at the size of the hash table at the end of execution; however, naturally, we must show the EXPLAIN output before we shut down the executor.

I just copied the Hash Join output.
It looks like:

# alter table tt alter column a set (n_distinct=4);
ALTER TABLE
# analyze tt;
# explain (analyze, costs off, timing off) select * from tt inner join t2 on tt.a=t2.a;
                        QUERY PLAN
----------------------------------------------------------
 Nested Loop (actual rows=100 loops=1)
   ->  Seq Scan on tt (actual rows=100 loops=1)
   ->  Memoize (actual rows=1 loops=100)
         Cache Key: tt.a
         Cache Mode: logical
         Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         Hash Buckets: 16 (originally 4)
         ->  Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
               Index Cond: (a = tt.a)
               Heap Fetches: 0
 Planning Time: 0.483 ms
 Execution Time: 862.860 ms
(12 rows)

Does anyone have any views about the attached patch going into v15?

David

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d2a2479822..c535d63883 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3177,6 +3177,11 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 		ExplainPropertyInteger("Cache Evictions", NULL, mstate->stats.cache_evictions, es);
 		ExplainPropertyInteger("Cache Overflows", NULL, mstate->stats.cache_overflows, es);
 		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+		ExplainPropertyInteger("Maximum Hash Buckets", NULL,
+							   mstate->stats.hashsize_max, es);
+		ExplainPropertyInteger("Original Hash Buckets", NULL,
+							   mstate->stats.hashsize_orig, es);
+
 	}
 	else
 	{
@@ -3188,6 +3193,14 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 						 mstate->stats.cache_evictions,
 						 mstate->stats.cache_overflows,
 						 memPeakKb);
+		ExplainIndentText(es);
+		if (mstate->stats.hashsize_max != mstate->stats.hashsize_orig)
+			appendStringInfo(es->str, "Hash Buckets: %u (originally %u)\n",
+							 mstate->stats.hashsize_max,
+							 mstate->stats.hashsize_orig);
+		else
+			appendStringInfo(es->str, "Hash Buckets: %u\n",
+							 mstate->stats.hashsize_max);
 	}
 }
@@ -3227,6 +3240,14 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 					 si->cache_hits,
 					 si->cache_misses,
 					 si->cache_evictions,
 					 si->cache_overflows,
 					 memPeakKb);
+		ExplainIndentText(es);
+		if (si->hashsize_max != si->hashsize_
Re: strange slow query - lost lot of time somewhere
On Tue, May 03, 2022 at 02:13:18PM +1200, David Rowley wrote:
> I'm wishing I put the initial hash table size and the final hash table
> size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for
> v15 to do that so that it might help us figure things out in the
> future.

On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> I'm very tempted to change the EXPLAIN output in at least master to
> display the initial and final (maximum) hash table sizes. Wondering if
> anyone would object to that?

No objection to adding it to v15.

I'll point out that "Cache Mode" was added to EXPLAIN between 11.1 and 11.2 without controversy, so this could conceivably be backpatched to v14, too.

commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085
Author: David Rowley
Date:   Wed Nov 24 10:07:38 2021 +1300

    Allow Memoize to operate in binary comparison mode
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 10:05, David Rowley wrote:
> On Fri, 6 May 2022 at 17:52, Pavel Stehule wrote:
> > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> > nodeMemoize.c:268
> > 268             if (size == 0)
> > (gdb) p size
> > $1 = 4369066
>
> Thanks for the report. I think I now see the problem. Looking at
> [1], it seems that's a bushy plan. That's fine, but less common than a
> left-deep plan.
>
> I think the problem is down to some incorrect code in
> get_memoize_path() where we pass the wrong value of "calls" to
> create_memoize_path(). I think instead of outer_path->parent->rows it
> should instead be outer_path->rows.
>
> If you look closely at the plan, you'll see that the outer side of the
> inner-most Nested Loop is parameterized by some higher-level nested
> loop.
>
>   ->  Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66)
>         ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66)
>               Index Cond: (item_id = itembo0_.id)
>               Heap Fetches: 21
>         ->  Memoize (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66)
>
> so instead of passing 91 to create_memoize_path() as I thought. Since
> I can't see any WHERE clause items filtering rows from the
> itemsharei2__1 relation, the outer_path->parent->rows should
> be whatever pg_class.reltuples says.
>
> Are you able to send the results of:
>
> explain select item_id from item_share_image group by item_id; -- I'm
> interested in the estimated number of groups in the plan's top node.
>
> select reltuples from pg_class where oid = 'item_share_image'::regclass;
>
> I'm expecting the estimated number of rows in the top node of the
> group by plan to be about 4369066.
(2022-05-06 12:30:23) prd_aukro=# explain select item_id from item_share_image group by item_id;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1543418.63..1554179.08 rows=1076045 width=8)
   Group Key: item_id
   ->  Gather  (cost=1000.57..1532658.18 rows=4304180 width=8)
         Workers Planned: 4
         ->  Group  (cost=0.57..1101240.18 rows=1076045 width=8)
               Group Key: item_id
               ->  Parallel Index Only Scan using ixfk_isi_itemid on item_share_image  (cost=0.57..1039823.86 rows=24566530 width=8)
(7 rows)

Time: 1,808 ms

(2022-05-06 12:30:26) prd_aukro=# select reltuples from pg_class where oid = 'item_share_image'::regclass;
  reltuples
--------------
 9.826612e+07
(1 row)

Time: 0,887 ms

Regards

Pavel

> David
>
> [1] https://explain.depesz.com/s/2rBw#source
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 20:04, David Rowley wrote:
> Thanks for the report. I think I now see the problem. Looking at
> [1], it seems that's a bushy plan. That's fine, but less common than a
> left-deep plan.

On second thoughts, it does not need to be a bushy plan for the outer side of the nested loop to be parameterized by some higher-level nested loop. There's an example of a plan like this in the regression tests.

regression=# explain (analyze, costs off, summary off)
regression-# select * from tenk1 t1 left join
regression-# (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
regression-# on t1.hundred = t2.hundred and t1.ten = t3.ten
regression-# where t1.unique1 = 1;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.258..0.487 rows=20 loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual time=0.049..0.049 rows=1 loops=1)
         Index Cond: (unique1 = 1)
   ->  Nested Loop (actual time=0.204..0.419 rows=20 loops=1)
         Join Filter: (t1.ten = t3.ten)
         Rows Removed by Join Filter: 80
         ->  Bitmap Heap Scan on tenk1 t2 (actual time=0.064..0.194 rows=100 loops=1)
               Recheck Cond: (t1.hundred = hundred)
               Heap Blocks: exact=86
               ->  Bitmap Index Scan on tenk1_hundred (actual time=0.036..0.036 rows=100 loops=1)
                     Index Cond: (hundred = t1.hundred)
         ->  Memoize (actual time=0.001..0.001 rows=1 loops=100)
               Cache Key: t2.thousand
               Cache Mode: logical
               Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 4kB
               ->  Index Scan using tenk1_unique2 on tenk1 t3 (actual time=0.009..0.009 rows=1 loops=10)
                     Index Cond: (unique2 = t2.thousand)
(17 rows)

Debugging this, I see that the Memoize plan won because it was passing 1 as the number of calls. It should have been passing 100. The Memoize node's number of loops agrees with that. Fixing the calls to correctly pass 100 gets rid of the Memoize node.

I've attached a patch to fix. I'll look at it in more detail after the weekend.

I'm very tempted to change the EXPLAIN output in at least master to display the initial and final (maximum) hash table sizes. Wondering if anyone would object to that?

David

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 9a8c5165b0..313f379ed8 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -610,7 +610,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
 									   hash_operators,
 									   extra->inner_unique,
 									   binary_mode,
-									   outer_path->parent->rows);
+									   outer_path->rows);
 	}
 
 	return NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index bf1a2db2cf..bd3375f2ba 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join
 (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
 on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                        QUERY PLAN
---------------------------------------------------------
+                     QUERY PLAN
+----------------------------------------------------
 Nested Loop Left Join
   ->  Index Scan using tenk1_unique1 on tenk1 t1
         Index Cond: (unique1 = 1)
@@ -3684,20 +3684,17 @@ where t1.unique1 = 1;
         Recheck Cond: (t1.hundred = hundred)
         ->  Bitmap Index Scan on tenk1_hundred
               Index Cond: (hundred = t1.hundred)
-  ->  Memoize
-        Cache Key: t2.thousand
-        Cache Mode: logical
-        ->  Index Scan using tenk1_unique2 on tenk1 t3
-              Index Cond: (unique2 = t2.thousand)
-(14 rows)
+  ->  Index Scan using tenk1_unique2 on tenk1 t3
+        Index Cond: (unique2 = t2.thousand)
+(11 rows)
 explain (costs off)
 select * from tenk1 t1 left join
 (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
 on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                        QUERY PLAN
---------------------------------------------------------
+                     QUERY PLAN
+-
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 17:52, Pavel Stehule wrote:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> nodeMemoize.c:268
> 268             if (size == 0)
> (gdb) p size
> $1 = 4369066

Thanks for the report. I think I now see the problem. Looking at [1], it seems that's a bushy plan. That's fine, but less common than a left-deep plan.

I think the problem is down to some incorrect code in get_memoize_path() where we pass the wrong value of "calls" to create_memoize_path(). I think instead of outer_path->parent->rows it should instead be outer_path->rows.

If you look closely at the plan, you'll see that the outer side of the inner-most Nested Loop is parameterized by some higher-level nested loop.

  ->  Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66)
        ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66)
              Index Cond: (item_id = itembo0_.id)
              Heap Fetches: 21
        ->  Memoize (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66)

so instead of passing 91 to create_memoize_path() as I thought. Since I can't see any WHERE clause items filtering rows from the itemsharei2__1 relation, the outer_path->parent->rows should be whatever pg_class.reltuples says.

Are you able to send the results of:

explain select item_id from item_share_image group by item_id; -- I'm interested in the estimated number of groups in the plan's top node.

select reltuples from pg_class where oid = 'item_share_image'::regclass;

I'm expecting the estimated number of rows in the top node of the group by plan to be about 4369066.

David

[1] https://explain.depesz.com/s/2rBw#source
Re: strange slow query - lost lot of time somewhere
Pavel Stehule writes:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> nodeMemoize.c:268
> 268             if (size == 0)
> (gdb) p size
> $1 = 4369066

Uh-huh

			regards, tom lane
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 1:19, David Rowley wrote:
> On Thu, 5 May 2022 at 19:26, Pavel Stehule wrote:
> >
> > On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:
> >> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > (gdb) bt
> >> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> >> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> >> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> >> > #4  0x00932134 in AllocSetAlloc ()
> >> > #5  0x009376cf in MemoryContextAllocExtended ()
> >> > #6  0x006ad915 in ExecInitMemoize ()
> >>
> >> Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms /
> >> dpkgs(?) so I hope you are basically 1 command away of being able to
> >> debug it further what happens in ExecInitMemoize()
> >> Those packages seem to be safe as they modify only /usr/lib/debug so
> >> should not have any impact on production workload.
> >
> > I just have to wait for admin action - I have no root rights for the server.
>
> Looking at ExecInitMemoize() it's hard to see what could take such a
> long time other than the build_hash_table(). Tom did mention this,
> but I can't quite see how the size given to that function could be
> larger than 91 in your case.
>
> If you get the debug symbols installed, can you use gdb to
>
> break nodeMemoize.c:268
> p size
>
> maybe there's something I'm missing following the code and maybe there
> is some way that est_entries is not set to what I thought it was.
>
> It would also be good to see the same perf report again after the
> debug symbols are installed in order to resolve those unresolved
> function names.

Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at nodeMemoize.c:268
268             if (size == 0)
(gdb) p size
$1 = 4369066

This is work_mem size

+ 99,92%  0,00%  postmaster  postgres               [.] ServerLoop
+ 99,92%  0,00%  postmaster  postgres               [.] PostgresMain
+ 99,92%  0,00%  postmaster  postgres               [.] exec_simple_query
+ 99,70%  0,00%  postmaster  postgres               [.] PortalRun
+ 99,70%  0,00%  postmaster  postgres               [.] FillPortalStore
+ 99,70%  0,02%  postmaster  postgres               [.] PortalRunUtility
+ 99,68%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b599c6
+ 99,68%  0,00%  postmaster  postgres               [.] standard_ProcessUtility
+ 99,68%  0,00%  postmaster  postgres               [.] ExplainQuery
+ 99,63%  0,00%  postmaster  postgres               [.] ExplainOneQuery
+ 99,16%  0,00%  postmaster  postgres               [.] ExplainOnePlan
+ 99,06%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b5ad2c
+ 99,06%  0,00%  postmaster  postgres               [.] standard_ExecutorStart
+ 99,06%  0,00%  postmaster  postgres               [.] InitPlan (inlined)
+ 99,06%  0,00%  postmaster  postgres               [.] ExecInitNode
+ 99,06%  0,00%  postmaster  postgres               [.] ExecInitNestLoop
+ 99,00%  0,02%  postmaster  postgres               [.] ExecInitMemoize
+ 98,87% 26,80%  postmaster  libc-2.28.so           [.] __memset_avx2_erms
+ 98,87%  0,00%  postmaster  postgres               [.] build_hash_table (inlined)
+ 98,87%  0,00%  postmaster  postgres               [.] memoize_create (inlined)
+ 98,87%  0,00%  postmaster  postgres               [.] memoize_allocate (inlined)
+ 98,87%  0,00%  postmaster  postgres               [.] MemoryContextAllocExtended
+ 72,08% 72,08%  postmaster  [unknown]              [k] 0xbaa010e0
  0,47%  0,00%  postmaster  postgres               [.] pg_plan_query
  0,47%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b59ba4
  0,47%  0,00%  postmaster  postgres               [.] standard_planner
  0,47%  0,00%  postmaster  postgres               [.] subquery_planner
  0,47%  0,00%  postmaster  postgres               [.] grouping_planner
  0,47%  0,00%  postmaster  postgres               [.] query_planner

> David
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 1:28, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Mon, May 2, 2022 at 10:02 PM Pavel Stehule wrote:
> > On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
> >> Pavel Stehule writes:
> >> > there is really something strange (see attached file). Looks so this issue
> >> > is much more related to planning time than execution time
> >>
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure, I can see this issue only every time when I set a
> > higher work mem. I don't see this issue in other cases.
>
> What are the values of work_mem and hash_mem_multiplier for the two cases?

(2022-05-06 07:35:21) prd_aukro=# show work_mem;
┌──────────┐
│ work_mem │
├──────────┤
│ 400MB    │
└──────────┘
(1 row)

Time: 0,331 ms

(2022-05-06 07:35:32) prd_aukro=# show hash_mem_multiplier;
┌─────────────────────┐
│ hash_mem_multiplier │
├─────────────────────┤
│ 1                   │
└─────────────────────┘
(1 row)

> David J.
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 10:02 PM Pavel Stehule wrote:
> On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
>> Pavel Stehule writes:
>> > there is really something strange (see attached file). Looks so this issue
>> > is much more related to planning time than execution time
>>
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>
> I am almost sure, I can see this issue only every time when I set a higher
> work mem. I don't see this issue in other cases.

What are the values of work_mem and hash_mem_multiplier for the two cases?

David J.
Re: strange slow query - lost lot of time somewhere
On Thu, 5 May 2022 at 19:26, Pavel Stehule wrote:
>
> On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:
>> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > (gdb) bt
>> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
>> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
>> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
>> > #4  0x00932134 in AllocSetAlloc ()
>> > #5  0x009376cf in MemoryContextAllocExtended ()
>> > #6  0x006ad915 in ExecInitMemoize ()
>>
>> Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms /
>> dpkgs(?) so I hope you are basically 1 command away of being able to debug
>> it further what happens in ExecInitMemoize()
>> Those packages seem to be safe as they modify only /usr/lib/debug so should
>> not have any impact on production workload.
>
> I just have to wait for admin action - I have no root rights for the server.

Looking at ExecInitMemoize() it's hard to see what could take such a long time other than the build_hash_table(). Tom did mention this, but I can't quite see how the size given to that function could be larger than 91 in your case.

If you get the debug symbols installed, can you use gdb to

break nodeMemoize.c:268
p size

Maybe there's something I'm missing following the code, and maybe there is some way that est_entries is not set to what I thought it was.

It would also be good to see the same perf report again after the debug symbols are installed in order to resolve those unresolved function names.

David
Re: strange slow query - lost lot of time somewhere
On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:
> Hi Pavel,
>
> > I have not debug symbols, so I have not more details now
> > Breakpoint 1 at 0x7f557f0c16c0
> > (gdb) c
> > Continuing.
> >
> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> > #4  0x00932134 in AllocSetAlloc ()
> > #5  0x009376cf in MemoryContextAllocExtended ()
> > #6  0x006ad915 in ExecInitMemoize ()
>
> Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms /
> dpkgs(?) so I hope you are basically 1 command away of being able to
> debug it further what happens in ExecInitMemoize()
> Those packages seem to be safe as they modify only /usr/lib/debug so
> should not have any impact on production workload.

I just have to wait for admin action - I have no root rights for the server.

> -J.
RE: strange slow query - lost lot of time somewhere
Hi Pavel,

> I have not debug symbols, so I have not more details now
> Breakpoint 1 at 0x7f557f0c16c0
> (gdb) c
> Continuing.
> Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> #4  0x00932134 in AllocSetAlloc ()
> #5  0x009376cf in MemoryContextAllocExtended ()
> #6  0x006ad915 in ExecInitMemoize ()

Well, the PGDG repo has the debuginfo packages (e.g. postgresql14-debuginfo) as rpms/dpkgs(?), so I hope you are basically one command away from being able to debug further what happens in ExecInitMemoize(). Those packages seem to be safe, as they modify only /usr/lib/debug and so should not have any impact on production workload.

-J.
Re: strange slow query - lost lot of time somewhere
On Wed, 4 May 2022 at 16:08, Jakub Wartak wrote:
>
> Additional three ways to figure that one (all are IMHO production safe):
> a) already mentioned perf with --call-graph dwarf -p PID
> b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having
> MAP_ANONYMOUS flag, size of mmap() request will somehow match work_mem
> sizing
> c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check
> MAP_ANONYMOUS as above

I have not debug symbols, so I have not more details now

Breakpoint 1 at 0x7f557f0c16c0
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x006b37de in ExecInitNestLoop ()
#15 0x0068dc56 in ExecInitNode ()
#16 0x00687e4d in standard_ExecutorStart ()
#17 0x7f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#18 0x0062e643 in ExplainOnePlan ()
#19 0x0062e83d in ExplainOneQuery ()
#20 0x0062ee6f in ExplainQuery ()
#21 0x007f9b15 in standard_ProcessUtility ()
#22 0x7f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#23 0x007f7fef in PortalRunUtility ()
#24 0x007f83af in FillPortalStore ()
#25 0x007f86dd in PortalRun ()
#26 0x007f48cb in exec_simple_query ()
#27 0x007f610e in PostgresMain ()
#28 0x00776b8a in ServerLoop ()
#29 0x00777a03 in PostmasterMain ()
#30 0x004fe413 in main ()
(gdb) p
The history is empty.
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x00687e4d in standard_ExecutorStart ()
#15 0x7f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#16 0x0062e643 in ExplainOnePlan ()
#17 0x0062e83d in ExplainOneQuery ()
#18 0x0062ee6f in ExplainQuery ()
#19 0x007f9b15 in standard_ProcessUtility ()
#20 0x7f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#21 0x007f7fef in PortalRunUtility ()
#22 0x007f83af in FillPortalStore ()
#23 0x007f86dd in PortalRun ()
#24 0x007f48cb in exec_simple_query ()
#25 0x007f610e in PostgresMain ()
#26 0x00776b8a in ServerLoop ()
#27 0x00777a03 in PostmasterMain ()
#28 0x004fe413 in main ()
(gdb) c
Continuing.

There were 2 hits of mmap.

Regards

Pavel

> [1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com
>
> -J.
Re: strange slow query - lost lot of time somewhere
On Wed, 4 May 2022 at 2:15, David Rowley wrote:
> On Tue, 3 May 2022 at 17:02, Pavel Stehule wrote:
> > On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure, I can see this issue only every time when I set a
> > higher work mem. I don't see this issue in other cases.
>
> hmm, I don't think the query being blocked on a table lock would cause
> this behaviour. As far as I know, all table locks should be obtained
> before the timer starts for the "Execution Time" timer in EXPLAIN
> ANALYZE. However, locks are obtained on indexes at executor startup,
> so if there was some delay in obtaining a lock on the index it would
> show up this way. I just don't know of anything that obtains a
> conflicting lock on an index without the same conflicting lock on the
> table that the index belongs to.
>
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of
> input rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a
> few seconds while you do:
>
> perf record --call-graph dwarf --pid sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the
> hashtable 100 million elements and I see my perf report for a trivial
> Memoize query come up as:
>
> + 99.98%  0.00%  postgres  postgres   [.] _start
> + 99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_main_alias_2 (inlined)
> + 99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_call_main
> + 99.98%  0.00%  postgres  postgres   [.] main
> + 99.98%  0.00%  postgres  postgres   [.] PostmasterMain
> + 99.98%  0.00%  postgres  postgres   [.] ServerLoop
> + 99.98%  0.00%  postgres  postgres   [.] BackendStartup (inlined)
> + 99.98%  0.00%  postgres  postgres   [.] BackendRun (inlined)
> + 99.98%  0.00%  postgres  postgres   [.] PostgresMain
> + 99.98%  0.00%  postgres  postgres   [.] exec_simple_query
> + 99.98%  0.00%  postgres  postgres   [.] PortalRun
> + 99.98%  0.00%  postgres  postgres   [.] FillPortalStore
> + 99.98%  0.00%  postgres  postgres   [.] PortalRunUtility
> + 99.98%  0.00%  postgres  postgres   [.] standard_ProcessUtility
> + 99.98%  0.00%  postgres  postgres   [.] ExplainQuery
> + 99.98%  0.00%  postgres  postgres   [.] ExplainOneQuery
> + 99.95%  0.00%  postgres  postgres   [.] ExplainOnePlan
> + 87.87%  0.00%  postgres  postgres   [.] standard_ExecutorStart
> + 87.87%  0.00%  postgres  postgres   [.] InitPlan (inlined)
> + 87.87%  0.00%  postgres  postgres   [.] ExecInitNode
> + 87.87%  0.00%  postgres  postgres   [.] ExecInitNestLoop
> + 87.87%  0.00%  postgres  postgres   [.] ExecInitMemoize
> + 87.87%  0.00%  postgres  postgres   [.] build_hash_table (inlined) <
> + 87.87%  0.00%  postgres  postgres   [.] memoize_create (inlined)
> + 87.87%  0.00%  postgres  postgres   [.] memoize_allocate (inlined)
> + 87.87%  0.00%  postgres  postgres   [.] MemoryContextAllocExtended
> + 87.87%  0.00%  postgres  postgres   [.] memset (inlined)
>
> Failing that, are you able to pg_dump these tables and load them into
> a PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.

+ 71,98% 14,36%  postmaster  [kernel.kallsyms]  [k] page_fault
+ 70,19%  6,59%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
+ 68,20%  0,00%  postmaster  postgres           [.] ExecInitNode
+ 68,20%  0,00%  postmaster  postgres           [.] ExecInitNestLoop
+ 68,13%  0,00%  postmaster  postgres           [.] ExecInitMemoize
+ 68,13%  0,00%  postmaster  postgres           [.] MemoryContextAllocExtended
+ 63,20%  0,00%  postmaster  postgres           [.] 0x00776b89
+ 63,20%  0,00%  postmaster  postgres           [.] PostgresMain
+ 63,03%  0,00%  postmaster
RE: strange slow query - lost lot of time somewhere
> I do agree that the perf report does indicate that the extra time is taken
> due to some large amount of memory being allocated. I just can't quite see
> how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of input
> rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a few
> seconds while you do:
>
> perf record --call-graph dwarf --pid sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable
> 100 million elements and I see my perf report for a trivial Memoize query
> come up as:
> [..]
>
> Failing that, are you able to pg_dump these tables and load them into a
> PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.

+1 to what David says, we need a reproducer.

In [1] Pavel wrote that he's having a lot of clear_page_erms(), so maybe this will be a little help: I recall having a similar issue with a lot of minor page faults and high %sys when raising work_mem. For me it was a different issue some time ago, but it was something like build_hash_table() being used by UNION recursive calls -> BuildTupleHashTable() -> .. malloc() -> mmap64(). When mmap() is issued with MAP_ANONYMOUS, the kernel will zero out the memory (more memory -> potentially bigger CPU waste visible as minor page faults; "erms" stands for "Enhanced REP MOVSB/STOSB"; this is on the kernel side). The culprit was the planner allocating something that wouldn't be used later.

Additional three ways to figure that one (all are IMHO production safe):
a) already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap'  # verify if mmap() NULL is not having MAP_ANONYMOUS flag; the size of the mmap() request will somehow match the work_mem sizing
c) gdb -p PID and then a breakpoint for mmap, and verify each mmap()  # check MAP_ANONYMOUS as above

[1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.
Re: strange slow query - lost lot of time somewhere
On Tue, 3 May 2022 at 17:02, Pavel Stehule wrote:
> On Tue, May 3, 2022 at 6:57, Tom Lane wrote:
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>
> I am almost sure, I can see this issue only every time when I set a higher
> work mem. I don't see this issue in other cases.

hmm, I don't think the query being blocked on a table lock would cause this
behaviour. As far as I know, all table locks should be obtained before the
timer starts for the "Execution Time" timer in EXPLAIN ANALYZE. However,
locks are obtained on indexes at executor startup, so if there was some
delay in obtaining a lock on the index it would show up this way. I just
don't know of anything that obtains a conflicting lock on an index without
the same conflicting lock on the table that the index belongs to.

I do agree that the perf report does indicate that the extra time is taken
due to some large amount of memory being allocated. I just can't quite see
how that would happen in Memoize given that estimate_num_groups() clamps
the distinct estimate as the number of input rows, which is 91 in both
cases in your problem query.

Are you able to run the Memoize query in psql with \watch 0.1 for a few
seconds while you do:

perf record --call-graph dwarf --pid sleep 2

then send along the perf report.

I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable
100 million elements and I see my perf report for a trivial Memoize query
come up as:

+   99.98%  0.00%  postgres  postgres   [.] _start
+   99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_main_alias_2 (inlined)
+   99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_call_main
+   99.98%  0.00%  postgres  postgres   [.] main
+   99.98%  0.00%  postgres  postgres   [.] PostmasterMain
+   99.98%  0.00%  postgres  postgres   [.] ServerLoop
+   99.98%  0.00%  postgres  postgres   [.] BackendStartup (inlined)
+   99.98%  0.00%  postgres  postgres   [.] BackendRun (inlined)
+   99.98%  0.00%  postgres  postgres   [.] PostgresMain
+   99.98%  0.00%  postgres  postgres   [.] exec_simple_query
+   99.98%  0.00%  postgres  postgres   [.] PortalRun
+   99.98%  0.00%  postgres  postgres   [.] FillPortalStore
+   99.98%  0.00%  postgres  postgres   [.] PortalRunUtility
+   99.98%  0.00%  postgres  postgres   [.] standard_ProcessUtility
+   99.98%  0.00%  postgres  postgres   [.] ExplainQuery
+   99.98%  0.00%  postgres  postgres   [.] ExplainOneQuery
+   99.95%  0.00%  postgres  postgres   [.] ExplainOnePlan
+   87.87%  0.00%  postgres  postgres   [.] standard_ExecutorStart
+   87.87%  0.00%  postgres  postgres   [.] InitPlan (inlined)
+   87.87%  0.00%  postgres  postgres   [.] ExecInitNode
+   87.87%  0.00%  postgres  postgres   [.] ExecInitNestLoop
+   87.87%  0.00%  postgres  postgres   [.] ExecInitMemoize
+   87.87%  0.00%  postgres  postgres   [.] build_hash_table (inlined)  <
+   87.87%  0.00%  postgres  postgres   [.] memoize_create (inlined)
+   87.87%  0.00%  postgres  postgres   [.] memoize_allocate (inlined)
+   87.87%  0.00%  postgres  postgres   [.] MemoryContextAllocExtended
+   87.87%  0.00%  postgres  postgres   [.] memset (inlined)

Failing that, are you able to pg_dump these tables and load them into a
PostgreSQL instance that you can play around with and patch? Provided you
can actually recreate the problem on that instance.

David
Re: strange slow query - lost lot of time somewhere
On Tue, May 3, 2022 at 6:57, Tom Lane wrote:
> Pavel Stehule writes:
> > there is really something strange (see attached file). Looks so this issue
> > is much more related to planning time than execution time
>
> You sure there's not something taking an exclusive lock on one of these
> tables every so often?

I am almost sure, I can see this issue only every time when I set a higher
work mem. I don't see this issue in other cases.

Regards

Pavel

> regards, tom lane
Re: strange slow query - lost lot of time somewhere
Pavel Stehule writes: > there is really something strange (see attached file). Looks so this issue > is much more related to planning time than execution time You sure there's not something taking an exclusive lock on one of these tables every so often? regards, tom lane
Re: strange slow query - lost lot of time somewhere
On Tue, May 3, 2022 at 6:09, Pavel Stehule wrote:
>
> On Mon, May 2, 2022 at 23:48, David Rowley wrote:
>
>> On Mon, 2 May 2022 at 21:00, Pavel Stehule wrote:
>> > I found a query that is significantly slower with more memory
>>
>> Can you clarify what you mean here? More memory was installed on the
>> machine? or work_mem was increased? or?
>>
>> > plan 1 - fast https://explain.depesz.com/s/XM1f
>> >
>> > plan 2 - slow https://explain.depesz.com/s/2rBw
>>
>> If it was work_mem you increased, it seems strange that the plan would
>> switch over to using a Nested Loop / Memoize plan. Only 91 rows are
>> estimated on the outer side of the join. It's hard to imagine that
>> work_mem was so low that the Memoize costing code thought there would
>> ever be cache evictions.
>>
>> > Strange - the time of last row is +/- same, but execution time is 10x
>> > worse
>> >
>> > It looks like slow environment cleaning
>>
>> Can you also show EXPLAIN for the Memoize plan without ANALYZE?
>
> yes - it is strange - it is just slow without execution
>
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)
>    ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)
>          ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)
>                ->  Bitmap Heap Scan on item itembo0_  (cost=57.34..2708.22 rows=11 width=16)
>                      Recheck Cond: ((ending_time IS NULL) OR ((status_id = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND (starting_time <= CURRENT_TIMESTAMP)))
>                      Filter: ((to_expose_date IS NULL) AND (status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time)))
>                      ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)
>                            ->  Bitmap Index Scan on stehule354  (cost=0.00..2.08 rows=1 width=0)
>                                  Index Cond: (ending_time IS NULL)
>                            ->  Bitmap Index Scan on stehule1010  (cost=0.00..55.26 rows=1751 width=0)
>                                  Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND (starting_time <= CURRENT_TIMESTAMP))
>                ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)
>                      ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)
>                            Index Cond: (item_id = itembo0_.id)
>                      ->  Memoize  (cost=0.57..2.09 rows=1 width=8)
>                            Cache Key: itemsharei2__1.item_image_id
>                            Cache Mode: logical
>                            ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)
>                                  Index Cond: (id = itemsharei2__1.item_image_id)
>          ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)
>                ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)
>                      Index Cond: (item_id = itembo0_.id)
>                ->  Memoize  (cost=0.29..1.72 rows=1 width=8
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 23:48, David Rowley wrote:
> On Mon, 2 May 2022 at 21:00, Pavel Stehule wrote:
> > I found a query that is significantly slower with more memory
>
> Can you clarify what you mean here? More memory was installed on the
> machine? or work_mem was increased? or?
>
> > plan 1 - fast https://explain.depesz.com/s/XM1f
> >
> > plan 2 - slow https://explain.depesz.com/s/2rBw
>
> If it was work_mem you increased, it seems strange that the plan would
> switch over to using a Nested Loop / Memoize plan. Only 91 rows are
> estimated on the outer side of the join. It's hard to imagine that
> work_mem was so low that the Memoize costing code thought there would
> ever be cache evictions.
>
> > Strange - the time of last row is +/- same, but execution time is 10x
> > worse
> >
> > It looks like slow environment cleaning
>
> Can you also show EXPLAIN for the Memoize plan without ANALYZE?

yes - it is strange - it is just slow without execution

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)
   ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)
         ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)
               ->  Bitmap Heap Scan on item itembo0_  (cost=57.34..2708.22 rows=11 width=16)
                     Recheck Cond: ((ending_time IS NULL) OR ((status_id = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND (starting_time <= CURRENT_TIMESTAMP)))
                     Filter: ((to_expose_date IS NULL) AND (status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time)))
                     ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)
                           ->  Bitmap Index Scan on stehule354  (cost=0.00..2.08 rows=1 width=0)
                                 Index Cond: (ending_time IS NULL)
                           ->  Bitmap Index Scan on stehule1010  (cost=0.00..55.26 rows=1751 width=0)
                                 Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND (starting_time <= CURRENT_TIMESTAMP))
               ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)
                     ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)
                           Index Cond: (item_id = itembo0_.id)
                     ->  Memoize  (cost=0.57..2.09 rows=1 width=8)
                           Cache Key: itemsharei2__1.item_image_id
                           Cache Mode: logical
                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)
                                 Index Cond: (id = itemsharei2__1.item_image_id)
         ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)
               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)
                     Index Cond: (item_id = itembo0_.id)
               ->  Memoize  (cost=0.29..1.72 rows=1 width=8)
                     Cache Key: itemsharei2_.item_image_id
                     Ca
Re: strange slow query - lost lot of time somewhere
On Tue, 3 May 2022 at 15:22, David G. Johnston wrote:
> Plugging in:
> n = 2,000
> e = 500
> c = 10,000
>
> proper = 5%
> incorrect = 25%
>
> But of the 10,000 calls we will receive, the first 2,000 will be misses
> while 2,000 of the remaining 8,000 will be hits, due to sharing 2,000
> distinct groups among the available inventory of 500 (25% of 8,000 is
> 2,000). 2,000 hits in 10,000 calls yields 20%.
>
> I believe the correct formula to be:
>
> ((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
> .80 * .25 = .20

I think you're correct here. The formula should be that. However, two
things: 1) this being incorrect is not the cause of the original problem
reported on this thread, and 2) there's just no way we could consider
fixing this in v15, let alone back-patching it to v14. Maybe we should
open a new thread about this, put an entry in the first CF for v16 under
bugs, and come back to it after we branch.

Thinking the cache hit ratio is lower than it actually is going to be will
reduce the chances of the planner switching to a Nested Loop / Memoize
plan vs a Hash or Merge Join plan. I was already fairly concerned that
Memoize could cause performance regressions when the ndistinct value or
expected cache entry size is underestimated or the outer side rows are
overestimated.

What I've got to calculate the cache hit ratio does seem incorrect given
what you're showing; however, it does add an element of pessimism and
reduces the chances of a bad plan being picked when work_mem is too low to
cache all entries. Swapping it out for your formula seems like it would
increase the chances of a Memoize plan being picked when the row,
ndistinct and cache entry size estimates are correct; however, it could
also increase the chance of a bad plan being picked in cases where the
estimates are incorrect.

My problem with changing this now is that we already often perform Nested
Loop joins when a Hash or Merge join would be a better option.
I'd hate to take us in a direction where we make that problem even worse. David
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 7:30 PM David Rowley wrote:
> On Tue, 3 May 2022 at 13:43, David G. Johnston wrote:
> > hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
> > I don't understand the adjustment factor ndistinct/calls
>
> I've attached a spreadsheet showing you the impact of subtracting
> (ndistinct / calls). What this is correcting for is the fact that the
> first scan from each unique value is a cache miss. The more calls we
> have, the more hits we'll get. If there was only 1 call per distinct
> value then there'd never be any hits. Without subtracting (ndistinct /
> calls) and assuming there's space in the cache for each ndistinct
> value, we'd assume 100% cache hit ratio if calls == ndistinct. What
> we should assume in that case is a 0% hit ratio as the first scan for
> each distinct parameter must always be a miss as we've never had a
> chance to cache any tuples for it yet.

Thank you. I understand the theory and agree with it - but the math
doesn't seem to be working out.

Plugging in:
n = 2,000
e = 500
c = 10,000

proper = 5%
incorrect = 25%

But of the 10,000 calls we will receive, the first 2,000 will be misses
while 2,000 of the remaining 8,000 will be hits, due to sharing 2,000
distinct groups among the available inventory of 500 (25% of 8,000 is
2,000). 2,000 hits in 10,000 calls yields 20%.

I believe the correct formula to be:

((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
.80 * .25 = .20

First we recognize that our hit ratio can be at most (c - n) / c, since n
misses are guaranteed. We take that ratio and scale it by our cache
efficiency (est_entries / ndistinct), since only that fraction of the
remaining potential hits will find their entry in memory; the rest turn
into misses because the relevant entry was never cached.

David J.
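The arithmetic above can be cross-checked directly. A small sketch using this message's variable names (not the PostgreSQL source):

```python
# Numbers from the worked example in this message.
ndistinct = 2_000   # n: distinct parameter values
est_entries = 500   # e: entries that fit in the cache
calls = 10_000      # c: expected lookups

# Formula as read from cost_memoize_rescan() in this thread:
current = est_entries / ndistinct - ndistinct / calls  # 0.25 - 0.20 = 0.05

# Proposed correction: cap the hit ratio at (c - n) / c, then scale by
# the fraction of distinct keys that can stay cached.
proposed = ((calls - ndistinct) / calls) * (est_entries / ndistinct)  # 0.80 * 0.25 = 0.20

print(f"current: {current:.2f}, proposed: {proposed:.2f}")
```

The proposed value of 0.20 matches the counting argument: 2,000 guaranteed first-touch misses, then a quarter of the remaining 8,000 calls hit.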
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 7:13 PM David Rowley wrote: > On Tue, 3 May 2022 at 11:02, Tom Lane wrote: > > > > David Rowley writes: > > > On Mon, 2 May 2022 at 21:00, Pavel Stehule > wrote: > > >> I found a query that is significantly slower with more memory > > > > > If it was work_mem you increased, it seems strange that the plan would > > > switch over to using a Nested Loop / Memoize plan. > > > > Yeah, there's something unexplained there. > > > > I think that the most probable explanation for the symptoms is that > > cost_memoize_rescan is computing some insane value for est_entries, > > causing ExecInitMemoize to allocate-and-zero a huge hash table, > > which ExecEndMemoize then frees again. Neither of those steps > > gets counted into any plan node's runtime, but EXPLAIN's total > > execution time will include them. An insane value for est_entries > > could perhaps go along with a cost misestimate that convinces the > > planner to include the memoize even though it seems pointless. > > That seems pretty unlikely to me. est_entries is based on the minimum > value of the expected number of total cache entries and the ndistinct > value. ndistinct cannot be insane here as ndistinct is never going to > be higher than the number of calls, which is the row estimate from the > outer side of the join. That's 91 in both cases here. As far as I > can see, that's just going to make a table of 128 buckets. > If est_entries goes to zero due to hash_mem_bytes/est_entry_bytes < 1 (hence floor takes it to zero) the executor will use a size value of 1024 instead in build_hash_table. That seems unlikely but there is no data to support or refute it. > I'm open to making improvements to the comments in that area. I do > remember spending quite a bit of time trying to make things as clear > as possible as it is fairly complex what's going on there. > > A few more intermediate calculation variables, along with descriptions, would help. 
e.g.:
- min(est_cache_entries, ndistinct) is repeated twice after its initial
  definition
- retention_ratio, per my other reply
- the (ndistinct / calls) part of hit_ratio being described specifically

David J.
Re: strange slow query - lost lot of time somewhere
On Tue, 3 May 2022 at 13:43, David G. Johnston wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
>
> I don't understand the adjustment factor ndistinct/calls

I've attached a spreadsheet showing you the impact of subtracting
(ndistinct / calls). What this is correcting for is the fact that the
first scan from each unique value is a cache miss. The more calls we
have, the more hits we'll get. If there was only 1 call per distinct
value then there'd never be any hits. Without subtracting (ndistinct /
calls) and assuming there's space in the cache for each ndistinct
value, we'd assume 100% cache hit ratio if calls == ndistinct. What
we should assume in that case is a 0% hit ratio as the first scan for
each distinct parameter must always be a miss as we've never had a
chance to cache any tuples for it yet.

> This is a "rescan" so aside from cache management isn't the cost of
> originally populating the cache already accounted for elsewhere?

The cost of the first scan is calculated in create_memoize_path().
Since the first scan will always be a cache miss, the code there just
adds some cache management surcharges. Namely:

/*
 * Add a small additional charge for caching the first entry. All the
 * harder calculations for rescans are performed in cost_memoize_rescan().
 */
pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;

David

memoize_cache_hits.ods
Description: application/vnd.oasis.opendocument.spreadsheet
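The boundary case described above - calls == ndistinct forcing a 0% hit ratio - can be sketched with a hypothetical helper that mirrors the described formula (not the C implementation):

```python
def hit_ratio(est_entries, ndistinct, calls):
    # Subtract ndistinct/calls to account for the guaranteed first-touch
    # miss per distinct key, and clamp at zero. A sketch only.
    return max(0.0, est_entries / ndistinct - ndistinct / calls)

# One call per distinct value: every lookup is the first for its key,
# so no hits are possible.
print(hit_ratio(100, 100, 100))   # 0.0

# Ten calls per distinct value, everything fits in the cache: only the
# first scan per key misses.
print(hit_ratio(100, 100, 1000))  # 0.9
```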
Re: strange slow query - lost lot of time somewhere
On Tue, 3 May 2022 at 11:02, Tom Lane wrote:
>
> David Rowley writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I think that the most probable explanation for the symptoms is that
> cost_memoize_rescan is computing some insane value for est_entries,
> causing ExecInitMemoize to allocate-and-zero a huge hash table,
> which ExecEndMemoize then frees again. Neither of those steps
> gets counted into any plan node's runtime, but EXPLAIN's total
> execution time will include them. An insane value for est_entries
> could perhaps go along with a cost misestimate that convinces the
> planner to include the memoize even though it seems pointless.

That seems pretty unlikely to me. est_entries is based on the minimum
value of the expected number of total cache entries and the ndistinct
value. ndistinct cannot be insane here as ndistinct is never going to
be higher than the number of calls, which is the row estimate from the
outer side of the join. That's 91 in both cases here. As far as I
can see, that's just going to make a table of 128 buckets.

See estimate_num_groups_incremental() at:

/*
 * We don't ever want to return an estimate of zero groups, as that tends
 * to lead to division-by-zero and other unpleasantness. The input_rows
 * estimate is usually already at least 1, but clamp it just in case it
 * isn't.
 */
input_rows = clamp_row_est(input_rows);

> I spent some time studying cost_memoize_rescan, and the only
> conclusions I arrived at were that the variable names are poorly
> chosen and the comments are unhelpful. For instance, one would
> think that est_entry_bytes is the expected size of one cache entry,
> but it seems to actually be the total space that would be occupied
> if the whole input relation were loaded into the cache.

I think you've misunderstood. It *is* the estimated size of a single
entry. I think you might be going wrong in assuming "tuples" is the
expected tuples from all rescans of the inner side of the join. It's
actually from a single scan. I can add a comment there to help make
that clear.

> And
> the est_cache_entries computation seems nonsensical; if it does
> make sense, the comment sure doesn't illuminate why. So I am
> quite prepared to buy into the idea that cost_memoize_rescan is
> producing bogus answers, but it's hard to tell what it's coming out
> with in this example. Too bad EXPLAIN doesn't print est_entries.

I'm wishing I put the initial hash table size and the final hash table
size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for v15
to do that so that it might help us figure things out in the future.

I'm open to making improvements to the comments in that area. I do
remember spending quite a bit of time trying to make things as clear
as possible as it is fairly complex what's going on there.

David
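The "91 distinct values, so a table of 128 buckets" figure above is consistent with the hash table rounding its initial size up to the next power of two. A toy sketch under that assumption (not the actual PostgreSQL hash table code):

```python
def initial_buckets(est_entries):
    # Assumption for illustration: the bucket count is est_entries
    # rounded up to the next power of two.
    n = 1
    while n < est_entries:
        n *= 2
    return n

print(initial_buckets(91))  # 128
```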
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 4:02 PM Tom Lane wrote: > David Rowley writes: > > On Mon, 2 May 2022 at 21:00, Pavel Stehule > wrote: > >> I found a query that is significantly slower with more memory > > > If it was work_mem you increased, it seems strange that the plan would > > switch over to using a Nested Loop / Memoize plan. > > Yeah, there's something unexplained there. > > I spent some time studying cost_memoize_rescan, and the only > conclusions I arrived at were that the variable names are poorly > chosen and the comments are unhelpful. For instance, one would > think that est_entry_bytes is the expected size of one cache entry, > but it seems to actually be the total space that would be occupied > if the whole input relation were loaded into the cache. And > the est_cache_entries computation seems nonsensical; if it does > make sense, the comment sure doesn't illuminate why. My take on this is that a cache entry is keyed by a parameterization and any given entry can have, at most, every tuple saved into it (hence the computation of tuples*per-tuple-size). So the maximum number of hash keys is the total available memory divided by input relation size. This upper bound is stored in est_cache_entries. If the number of unique parameterizations expected (at worst one-per-call) is less than this we use that value and never evict. If it is more we use the est_cache_entries and plan to evict. What I'm expecting to find but don't see is that by definition each unique parameterization must positively match a unique subset of the input relation tuples. If we remember only those tuples that matched then at no point should the total memory for the hash table exceed the size of the input relation. 
Now, I'm not completely confident the cache only holds matched
tuples...but if so:

From that, mpath->est_entries should be "min(hash_mem_bytes /
est_entry_bytes, 1.0) * ndistinct", i.e., all groups, or a fractional
subset based upon available hash memory.

Then:

ndistinct = estimate_num_groups() || calls
retention_ratio = min(hash_mem_bytes / est_entry_bytes, 1.0)
est_entries = retention_ratio * ndistinct
evict_ratio = 1.0 - retention_ratio
hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0

I don't understand the adjustment factor ndistinct/calls.

The evictions total cost adjustment also assumes we are evicting all
tuples, as opposed to tuples/est_entries.

This is a "rescan", so aside from cache management isn't the cost of
originally populating the cache already accounted for elsewhere?

David J.
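The restructuring proposed above can be sketched as runnable pseudocode. The names and semantics are this message's proposal, not the actual cost_memoize_rescan() code, and input_rel_bytes stands for the space needed to cache the entire input relation (the reading of est_entry_bytes under discussion):

```python
def memoize_estimates(ndistinct, calls, hash_mem_bytes, input_rel_bytes):
    """Sketch of the proposed intermediate variables, for illustration."""
    retention_ratio = min(hash_mem_bytes / input_rel_bytes, 1.0)
    est_entries = retention_ratio * ndistinct
    evict_ratio = 1.0 - retention_ratio
    # clamp to 0.0, as in the proposal
    hit_ratio = max(0.0, est_entries / ndistinct - ndistinct / calls)
    return est_entries, evict_ratio, hit_ratio

# Everything fits: no evictions, and with calls == ndistinct every
# lookup is a first-touch miss, so the hit ratio clamps to zero.
print(memoize_estimates(91, 91, 4 * 1024 * 1024, 64 * 1024))
```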
Re: strange slow query - lost lot of time somewhere
David Rowley writes: > On Mon, 2 May 2022 at 21:00, Pavel Stehule wrote: >> I found a query that is significantly slower with more memory > If it was work_mem you increased, it seems strange that the plan would > switch over to using a Nested Loop / Memoize plan. Yeah, there's something unexplained there. I think that the most probable explanation for the symptoms is that cost_memoize_rescan is computing some insane value for est_entries, causing ExecInitMemoize to allocate-and-zero a huge hash table, which ExecEndMemoize then frees again. Neither of those steps gets counted into any plan node's runtime, but EXPLAIN's total execution time will include them. An insane value for est_entries could perhaps go along with a cost misestimate that convinces the planner to include the memoize even though it seems pointless. I spent some time studying cost_memoize_rescan, and the only conclusions I arrived at were that the variable names are poorly chosen and the comments are unhelpful. For instance, one would think that est_entry_bytes is the expected size of one cache entry, but it seems to actually be the total space that would be occupied if the whole input relation were loaded into the cache. And the est_cache_entries computation seems nonsensical; if it does make sense, the comment sure doesn't illuminate why. So I am quite prepared to buy into the idea that cost_memoize_rescan is producing bogus answers, but it's hard to tell what it's coming out with in this example. Too bad EXPLAIN doesn't print est_entries. regards, tom lane
Re: strange slow query - lost lot of time somewhere
On Mon, 2 May 2022 at 21:00, Pavel Stehule wrote:
> I found a query that is significantly slower with more memory

Can you clarify what you mean here? More memory was installed on the
machine? or work_mem was increased? or?

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw

If it was work_mem you increased, it seems strange that the plan would
switch over to using a Nested Loop / Memoize plan. Only 91 rows are
estimated on the outer side of the join. It's hard to imagine that
work_mem was so low that the Memoize costing code thought there would
ever be cache evictions.

> Strange - the time of last row is +/- same, but execution time is 10x worse
>
> It looks like slow environment cleaning

Can you also show EXPLAIN for the Memoize plan without ANALYZE?

Does the slowness present every time that plan is executed?

Can you show the EXPLAIN ANALYZE of the nested loop plan with
enable_memoize = off? You may need to disable hash and merge joins.

David
Re: strange slow query - lost lot of time somewhere
On Mon, May 2, 2022 at 16:44, Matthias van de Meent
<boekewurm+postg...@gmail.com> wrote:

> On Mon, 2 May 2022 at 16:09, Pavel Stehule wrote:
> >
> > On Mon, May 2, 2022 at 16:02, Pavel Stehule wrote:
> >> there is just shared buffers changed to 32GB and work_mem to 70MB.
> >> Unfortunately - it is in production environment with customer data, so I
> >> cannot to play too much
> >>
> >> This is perf of slow
> >>
> >>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
> >>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
> >>    5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
> >>    5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
> >>    5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
> >>    3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
> >>    3,07%  postmaster  postgres           [.] hash_search_with_hash_value
> >>    2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
> >>    2,49%  postmaster  postgres           [.] heap_page_prune_opt
> >>    1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
> >>    1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
> >>    1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
> >>    1,73%  postmaster  postgres           [.] _bt_checkkeys
> >>    1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
> >>    1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
> >>    1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
> >>    1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
> >>    1,08%  postmaster  postgres           [.] 0x003c1be6
> >>    1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
> >>    0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
> >>    0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
> >>    0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
> >>    0,61%  postmaster  postgres           [.] _bt_checkpage
> >>    0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
> >>    0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
> >>    0,57%  postmaster  postgres           [.] tbm_iterate
> >>    0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
> >>    0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
> >>    0,52%  postmaster  postgres           [.] 0x0015f6e5
> >>    0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
> >>    0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
> >>    0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
> >>    0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
> >>
> >> This is fast
> >>
> >>   21,13%  postmaster  postgres      [.] hash_search_with_hash_value
> >>   15,33%  postmaster  postgres      [.] heap_page_prune_opt
> >>   10,22%  postmaster  libc-2.28.so  [.] __memset_avx2_erms
> >>   10,00%  postmaster  postgres      [.] _bt_checkkeys
> >>    6,23%  postmaster  postgres      [.] 0x003c1be6
> >>    4,94%  postmaster  postgres      [.] _bt_checkpage
> >>    2,85%  postmaster  postgres      [.] tbm_iterate
> >>    2,31%  postmaster  postgres      [.] nocache_index_getattr
> >>    2,13%  postmaster  postgres      [.] pg_qsort
> >>    1,58%  postmaster  postgres      [.] heap_hot_search_buffer
> >>    1,58%  postmaster  postgres      [.] FunctionCall2Coll
> >>    1,58%  postmaster  postgres      [.] 0x0015f6e5
> >>    1,17%  postmaster  postgres      [.] LWLockRelease
> >>    0,85%  postmaster  libc-2.28.so  [.] __memcmp_avx2_movbe
> >>    0,64%  postmaster  postgres      [.] 0x003e4233
> >>    0,54%  postmaster  postgres      [.] hash_bytes
> >>    0,53%  postmaster  postgres      [.] 0x00306fbb
> >>    0,53%  postmaster  postgres      [.] LWLockAcquire
> >>    0,42%  postmaster  postgres      [.] 0x003c1c6f
> >>    0,42%  postmaster  postgres      [.] _bt_compare
> >
> > It looks so memoization allocate lot of memory - maybe there are some
> > temporal memory leak
>
> Memoization doesn't leak memory any more than hash tables do; so I
> doubt that that is the issue.

> > Performance counter stats for process id '4004464':
> >
> >          84,26 msec task-clock          #    0,012 CPUs utilized
> >              3      context-switches    #    0,036 K/sec
> >              0      cpu-migrations      #    0,000 K/sec
> >             19      page-faults         #    0,225 K/sec
> >              0      cycles              #    0,000 GHz
> >    106 873 995      instructions
> >     20 225 431      branches            #  240,026 M/sec
> >        348 834      branch-misses       #    1,72% of all branches
> >
> >    7,106142051 seconds time
Re: strange slow query - lost lot of time somewhere
On Mon, 2 May 2022 at 16:09, Pavel Stehule wrote:
>
> po 2. 5. 2022 v 16:02 odesílatel Pavel Stehule napsal:
>> there is just shared_buffers changed to 32GB and work_mem to 70MB.
>> Unfortunately it is a production environment with customer data, so I
>> cannot play with it too much
>>
>> This is perf of the slow run
>>
>> 25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
>> 11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>>  5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>>  5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>>  5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>>  3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>>  3,07%  postmaster  postgres           [.] hash_search_with_hash_value
>>  2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>>  2,49%  postmaster  postgres           [.] heap_page_prune_opt
>>  1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>>  1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
>>  1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>>  1,73%  postmaster  postgres           [.] _bt_checkkeys
>>  1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>>  1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>>  1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>>  1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
>>  1,08%  postmaster  postgres           [.] 0x003c1be6
>>  1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>>  0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>>  0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>>  0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>>  0,61%  postmaster  postgres           [.] _bt_checkpage
>>  0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>>  0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>>  0,57%  postmaster  postgres           [.] tbm_iterate
>>  0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>>  0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>>  0,52%  postmaster  postgres           [.] 0x0015f6e5
>>  0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>>  0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>>  0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>>  0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>>
>> This is the fast run
>>
>> 21,13%  postmaster  postgres      [.] hash_search_with_hash_value
>> 15,33%  postmaster  postgres      [.] heap_page_prune_opt
>> 10,22%  postmaster  libc-2.28.so  [.] __memset_avx2_erms
>> 10,00%  postmaster  postgres      [.] _bt_checkkeys
>>  6,23%  postmaster  postgres      [.] 0x003c1be6
>>  4,94%  postmaster  postgres      [.] _bt_checkpage
>>  2,85%  postmaster  postgres      [.] tbm_iterate
>>  2,31%  postmaster  postgres      [.] nocache_index_getattr
>>  2,13%  postmaster  postgres      [.] pg_qsort
>>  1,58%  postmaster  postgres      [.] heap_hot_search_buffer
>>  1,58%  postmaster  postgres      [.] FunctionCall2Coll
>>  1,58%  postmaster  postgres      [.] 0x0015f6e5
>>  1,17%  postmaster  postgres      [.] LWLockRelease
>>  0,85%  postmaster  libc-2.28.so  [.] __memcmp_avx2_movbe
>>  0,64%  postmaster  postgres      [.] 0x003e4233
>>  0,54%  postmaster  postgres      [.] hash_bytes
>>  0,53%  postmaster  postgres      [.] 0x00306fbb
>>  0,53%  postmaster  postgres      [.] LWLockAcquire
>>  0,42%  postmaster  postgres      [.] 0x003c1c6f
>>  0,42%  postmaster  postgres      [.] _bt_compare
>>
>
> It looks like memoization allocates a lot of memory - maybe there is some
> temporary memory leak

Memoization doesn't leak memory any more than hash tables do, so I doubt that is the issue.
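For anyone wanting to rule Memoize in or out on a v14 server, the planner can be told to skip it per session. This is a generic sketch - enable_memoize is a real v14 planner GUC, but the query itself is a placeholder for whatever produced the slow plan above:

```sql
-- Disable Memoize for this session only and re-run the suspect query;
-- if the slowdown disappears, the extra time is tied to the Memoize
-- nodes' hash tables rather than to the join itself.
SET enable_memoize = off;               -- planner GUC available since v14
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- placeholder for the affected query
RESET enable_memoize;
```

Because the GUC only affects planning, this is safe to try even on a production server: it changes nothing for other sessions and is undone by RESET or by closing the connection.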
> Performance counter stats for process id '4004464':
>
>          84,26 msec  task-clock        # 0,012 CPUs utilized
>              3       context-switches  # 0,036 K/sec
>              0       cpu-migrations    # 0,000 K/sec
>             19       page-faults       # 0,225 K/sec
>              0       cycles            # 0,000 GHz
>    106 873 995      instructions
>     20 225 431      branches          # 240,026 M/sec
>        348 834      branch-misses     # 1,72% of all branches
>
>    7,106142051 seconds time elapsed

Assuming the above was for the fast query

> Performance counter stats for process id '4004464':
>
>       1 116,97 msec  task-clock        # 0,214 CPUs utilized
>              4       context-switches  # 0,004 K/sec
>
Re: strange slow query - lost lot of time somewhere
po 2. 5. 2022 v 16:02 odesílatel Pavel Stehule napsal: > > > po 2. 5. 2022 v 15:28 odesílatel Matthias van de Meent < > boekewurm+postg...@gmail.com> napsal: > >> On Mon, 2 May 2022 at 11:00, Pavel Stehule >> wrote: >> > >> > Hi >> > >> > I found a query that is significantly slower with more memory >> >> Which PostgreSQL version did you use? Did you enable assert checking? >> Do you have an example database setup to work with? >> >> > plan 2 >> > QUERY PLAN >> > >> > Nested Loop Anti Join (cost=46.53..2914.58 rows=1 width=16) (actual >> time=18.306..23.065 rows=32 loops=1) >> > ... >> > Execution Time: 451.161 ms >> >> Truly strange; especially the 418ms difference between execution time >> and the root node's "actual time". I haven't really seen such >> differences happen, except when concurrent locks were held at the >> table / index level. >> >> > plan 1 - fast https://explain.depesz.com/s/XM1f >> > >> > plan 2 - slow https://explain.depesz.com/s/2rBw >> > >> > Strange - the time of last row is +/- same, but execution time is 10x >> worse >> >> The only difference between the two plans that I see is that plan 1 >> doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes >> (one of 66 misses; one of 342 misses). The only "expensive" operation >> that I see in memoize nodes is the check for memory size in >> assert-enabled builds; and that should have very low overhead >> considering that the size of the memoized data is only 8kB and 25kB >> respectively. 
>>
>
> This is PostgreSQL 14 used in a production environment
>
> (2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
> ┌──────────────────┐
> │ debug_assertions │
> ├──────────────────┤
> │ off              │
> └──────────────────┘
> (1 řádka)
>
> Čas: 0,295 ms
> (2022-05-02 15:37:35) prd_aukro=# select version();
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                version                                                 │
> ├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
> └────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (1 řádka)
> Čas: 0,629 ms
>
> there is just shared_buffers changed to 32GB and work_mem to 70MB.
> Unfortunately it is a production environment with customer data, so I
> cannot play with it too much
>
> This is perf of the slow run
>
> 25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
> 11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>  5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>  5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>  5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>  3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>  3,07%  postmaster  postgres           [.] hash_search_with_hash_value
>  2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>  2,49%  postmaster  postgres           [.] heap_page_prune_opt
>  1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>  1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
>  1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>  1,73%  postmaster  postgres           [.] _bt_checkkeys
>  1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>  1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>  1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>  1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
>  1,08%  postmaster  postgres           [.] 0x003c1be6
>  1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>  0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>  0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>  0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>  0,61%  postmaster  postgres           [.] _bt_checkpage
>  0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>  0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>  0,57%  postmaster  postgres           [.] tbm_iterate
>  0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>  0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>  0,52%  postmaster  postgres           [.] 0x0015f6e5
>  0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>  0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>  0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>  0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>
> This is the fast run
>
> 21,13%  postmaster  postgres      [.] hash_search_with_hash_value
> 15,33%  postmaster  postgres      [.] heap_page_prune_opt
> 10,22%  po
Re: strange slow query - lost lot of time somewhere
po 2. 5. 2022 v 15:28 odesílatel Matthias van de Meent <
boekewurm+postg...@gmail.com> napsal:

> On Mon, 2 May 2022 at 11:00, Pavel Stehule wrote:
> >
> > Hi
> >
> > I found a query that is significantly slower with more memory
>
> Which PostgreSQL version did you use? Did you enable assert checking?
> Do you have an example database setup to work with?
>
> > plan 2
> > QUERY PLAN
> >
> > Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual
> > time=18.306..23.065 rows=32 loops=1)
> > ...
> > Execution Time: 451.161 ms
>
> Truly strange; especially the 418ms difference between execution time
> and the root node's "actual time". I haven't really seen such
> differences happen, except when concurrent locks were held at the
> table / index level.
>
> > plan 1 - fast https://explain.depesz.com/s/XM1f
> >
> > plan 2 - slow https://explain.depesz.com/s/2rBw
> >
> > Strange - the time of the last row is +/- the same, but execution time
> > is 10x worse
>
> The only difference between the two plans that I see is that plan 1
> doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
> (one of 66 misses; one of 342 misses). The only "expensive" operation
> that I see in memoize nodes is the check for memory size in
> assert-enabled builds; and that should have very low overhead
> considering that the size of the memoized data is only 8kB and 25kB
> respectively.
>

This is PostgreSQL 14 used in a production environment

(2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
┌──────────────────┐
│ debug_assertions │
├──────────────────┤
│ off              │
└──────────────────┘
(1 řádka)

Čas: 0,295 ms
(2022-05-02 15:37:35) prd_aukro=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                version                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 řádka)
Čas: 0,629 ms

there is just shared_buffers changed to 32GB and work_mem to 70MB.
Unfortunately it is a production environment with customer data, so I
cannot play with it too much

This is perf of the slow run

25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
 5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
 5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
 5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
 3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
 3,07%  postmaster  postgres           [.] hash_search_with_hash_value
 2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
 2,49%  postmaster  postgres           [.] heap_page_prune_opt
 1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
 1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
 1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
 1,73%  postmaster  postgres           [.] _bt_checkkeys
 1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
 1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
 1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
 1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
 1,08%  postmaster  postgres           [.] 0x003c1be6
 1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
 0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
 0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
 0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
 0,61%  postmaster  postgres           [.] _bt_checkpage
 0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
 0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
 0,57%  postmaster  postgres           [.] tbm_iterate
 0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
 0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
 0,52%  postmaster  postgres           [.] 0x0015f6e5
 0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
 0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
 0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
 0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page

This is the fast run

21,13%  postmaster  postgres      [.] hash_search_with_hash_value
15,33%  postmaster  postgres      [.] heap_page_prune_opt
10,22%  postmaster  libc-2.28.so  [.] __memset_avx2_erms
10,00%  postmaster  postgres      [.] _bt_checkkeys
 6,23%  postmaster  postgres      [.] 0x003c1be6
 4,94%  postmaster  postgres      [.] _bt_checkpage
 2,85%
Re: strange slow query - lost lot of time somewhere
On Mon, 2 May 2022 at 11:00, Pavel Stehule wrote:
>
> Hi
>
> I found a query that is significantly slower with more memory

Which PostgreSQL version did you use? Did you enable assert checking?
Do you have an example database setup to work with?

> plan 2
> QUERY PLAN
>
> Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual
> time=18.306..23.065 rows=32 loops=1)
> ...
> Execution Time: 451.161 ms

Truly strange; especially the 418ms difference between execution time
and the root node's "actual time". I haven't really seen such
differences happen, except when concurrent locks were held at the
table / index level.

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw
>
> Strange - the time of the last row is +/- the same, but execution time is
> 10x worse

The only difference between the two plans that I see is that plan 1
doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
(one of 66 misses; one of 342 misses). The only "expensive" operation
that I see in memoize nodes is the check for memory size in
assert-enabled builds; and that should have very low overhead
considering that the size of the memoized data is only 8kB and 25kB
respectively.
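For readers without access to the production data, a plan of roughly this shape (a parameterized nested-loop anti join that can attract Memoize nodes) can be imitated on a scratch database. Everything below is invented for illustration - table names, row counts, and whether the planner actually picks Memoize all depend on local statistics and costs:

```sql
-- Hypothetical toy setup, loosely modeled on the reported plan.
CREATE TABLE t_outer AS
    SELECT g % 100 AS a FROM generate_series(1, 100000) g;
CREATE TABLE t_inner AS
    SELECT g % 100 AS a FROM generate_series(1, 100000) g;
CREATE INDEX ON t_inner (a);
ANALYZE t_outer, t_inner;

SET work_mem = '70MB';   -- the setting the report says was changed
EXPLAIN (ANALYZE)
SELECT *
FROM t_outer o
WHERE NOT EXISTS (SELECT 1 FROM t_inner i WHERE i.a = o.a);
```

Running the EXPLAIN with enable_memoize on and then off shows whether a Memoize node appears and how its time is attributed, which is the comparison the fast and slow plans above boil down to.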