Re: strange slow query - lost lot of time somewhere

2022-05-15 Thread Pavel Stehule
On Mon, 16 May 2022 at 6:11, David Rowley wrote:

> On Fri, 6 May 2022 at 21:27, David Rowley  wrote:
> > I've attached a patch to fix.  I'll look at it in more detail after the
> weekend.
>
> I've now pushed this fix to master and backpatched to 14.
>

Thank you

Pavel


>
> David
>


Re: strange slow query - lost lot of time somewhere

2022-05-15 Thread David Rowley
On Fri, 6 May 2022 at 21:27, David Rowley  wrote:
> I've attached a patch to fix.  I'll look at it in more detail after the 
> weekend.

I've now pushed this fix to master and backpatched to 14.

David




Re: strange slow query - lost lot of time somewhere

2022-05-10 Thread David Rowley
On Tue, 10 May 2022 at 14:22, Justin Pryzby  wrote:
> On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> > I'm very tempted to change the EXPLAIN output in at least master to
> > display the initial and final (maximum) hash table sizes. Wondering if
> > anyone would object to that?
>
> No objection to adding it to v15.
>
> I'll point out that "Cache Mode" was added to EXPLAIN between 14.1 and 14.2
> without controversy, so this could conceivably be backpatched to v14, too.
>
> commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085

This is seemingly a good point, but I don't really think it's a case
of just keeping the EXPLAIN output stable in minor versions; it's more
about adding new fields to structs.

I just went and wrote the patch, and the fundamental difference seems
to be that what I did in 6c32c0977 managed to add the new field only in
the empty padding between two existing fields. That resulted in no
fields in the struct being pushed up in their address offset. The idea
is not to break any extension that has already been compiled and that
references some field coming after the new one.

In the patch I've just written, I've had to add some fields which cause
sizeof(MemoizeState) to go up, changing the offsets of some later
fields.
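
To make the ABI point concrete, here is a minimal standalone sketch (the
struct and field names are invented for illustration and are not the real
MemoizeState layout):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "before" layout: on a typical 64-bit ABI there are 4 bytes
 * of implicit padding between cache_hits and the pointer that follows it. */
typedef struct OldState
{
    uint32_t    cache_hits;
    /* 4 bytes of padding here */
    void       *hashtable;
} OldState;

/* Adding a uint32 inside that padding keeps sizeof() and the offset of
 * every pre-existing field unchanged, so an extension compiled against
 * OldState still finds hashtable at the same offset. */
typedef struct PaddedAdd
{
    uint32_t    cache_hits;
    uint32_t    hashsize_orig;  /* new field, lives in the old padding */
    void       *hashtable;
} PaddedAdd;

/* A field that doesn't fit in the padding grows the struct and shifts
 * later members, which is the ABI hazard for already-compiled extensions. */
typedef struct AppendedAdd
{
    uint32_t    cache_hits;
    uint64_t    hashsize_orig;  /* too big for the 4-byte hole */
    void       *hashtable;      /* offset moves compared to OldState */
} AppendedAdd;

int
main(void)
{
    printf("old:      size=%zu hashtable@%zu\n",
           sizeof(OldState), offsetof(OldState, hashtable));
    printf("padded:   size=%zu hashtable@%zu\n",
           sizeof(PaddedAdd), offsetof(PaddedAdd, hashtable));
    printf("appended: size=%zu hashtable@%zu\n",
           sizeof(AppendedAdd), offsetof(AppendedAdd, hashtable));
    return 0;
}

Compiled for a typical 64-bit ABI, the first two report the same size and the
same offset for hashtable, while the third grows the struct and shifts the
pointer, which is exactly what would break an already-compiled extension.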

One thing I'll say about this patch is that I found it annoying that I
had to add code to cache_lookup() when we failed to find an entry.
That's probably not the end of the world as that's only for cache
misses.  Ideally, I'd just be looking at the size of the hash table at
the end of execution, however, naturally, we must show the EXPLAIN
output before we shut down the executor.

I just copied the Hash Join output. It looks like:

# alter table tt alter column a set (n_distinct=4);
ALTER TABLE
# analyze tt;
# explain (analyze, costs off, timing off) select * from tt inner join
t2 on tt.a=t2.a;
                                QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop (actual rows=100 loops=1)
   ->  Seq Scan on tt (actual rows=100 loops=1)
   ->  Memoize (actual rows=1 loops=100)
         Cache Key: tt.a
         Cache Mode: logical
         Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         Hash Buckets: 16 (originally 4)
         ->  Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
               Index Cond: (a = tt.a)
               Heap Fetches: 0
 Planning Time: 0.483 ms
 Execution Time: 862.860 ms
(12 rows)

Does anyone have any views about the attached patch going into v15?

David
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d2a2479822..c535d63883 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3177,6 +3177,11 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 		ExplainPropertyInteger("Cache Evictions", NULL, mstate->stats.cache_evictions, es);
 		ExplainPropertyInteger("Cache Overflows", NULL, mstate->stats.cache_overflows, es);
 		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+		ExplainPropertyInteger("Maximum Hash Buckets", NULL,
+							   mstate->stats.hashsize_max, es);
+		ExplainPropertyInteger("Original Hash Buckets", NULL,
+							   mstate->stats.hashsize_orig, es);
+
 	}
 	else
 	{
@@ -3188,6 +3193,14 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 						 mstate->stats.cache_evictions,
 						 mstate->stats.cache_overflows,
 						 memPeakKb);
+		ExplainIndentText(es);
+		if (mstate->stats.hashsize_max != mstate->stats.hashsize_orig)
+			appendStringInfo(es->str, "Hash Buckets: %u (originally %u)\n",
+							 mstate->stats.hashsize_max,
+							 mstate->stats.hashsize_orig);
+		else
+			appendStringInfo(es->str, "Hash Buckets: %u\n",
+							 mstate->stats.hashsize_max);
 	}
 }
 
@@ -3227,6 +3240,14 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 						 si->cache_hits, si->cache_misses,
 						 si->cache_evictions, si->cache_overflows,
 						 memPeakKb);
+		ExplainIndentText(es);
+		if (si->hashsize_max != 
Re: strange slow query - lost lot of time somewhere

2022-05-09 Thread Justin Pryzby
On Tue, May 03, 2022 at 02:13:18PM +1200, David Rowley wrote:
> I'm wishing I put the initial hash table size and the final hash table
> size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for
> v15 to do that so that it might help us figure things out in the
> future.

On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> I'm very tempted to change the EXPLAIN output in at least master to
> display the initial and final (maximum) hash table sizes. Wondering if
> anyone would object to that?

No objection to adding it to v15.

I'll point out that "Cache Mode" was added to EXPLAIN between 14.1 and 14.2
without controversy, so this could conceivably be backpatched to v14, too.

commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085
Author: David Rowley 
Date:   Wed Nov 24 10:07:38 2021 +1300

Allow Memoize to operate in binary comparison mode




Re: strange slow query - lost lot of time somewhere

2022-05-06 Thread Pavel Stehule
On Fri, 6 May 2022 at 10:05, David Rowley wrote:

> On Fri, 6 May 2022 at 17:52, Pavel Stehule 
> wrote:
> > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> nodeMemoize.c:268
> > 268 if (size == 0)
> > (gdb) p size
> > $1 = 4369066
>
> Thanks for the report.  I think I now see the problem.  Looking at
> [1], it seems that's a bushy plan. That's fine, but less common than a
> left deep plan.
>
> I think the problem is down to some incorrect code in
> get_memoize_path() where we pass the wrong value of "calls" to
> create_memoize_path(). I think instead of outer_path->parent->rows it
> instead should be outer_path->rows.
>
> If you look closely at the plan, you'll see that the outer side of the
> inner-most Nested Loop is parameterized by some higher-level nested
> loop.
>
> ->  Nested Loop  (cost=1.14..79.20 rows=91 width=8) (actual
> time=0.024..0.024 rows=1 loops=66)
>  ->  Index Only Scan using
> uq_isi_itemid_itemimageid on item_share_image itemsharei2__1
> (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1
> loops=66)
>Index Cond: (item_id = itembo0_.id)
>Heap Fetches: 21
>  ->  Memoize  (cost=0.57..2.07 rows=1 width=8)
> (actual time=0.013..0.013 rows=1 loops=66)
>
> so instead of passing 91 to create_memoize_path() as I thought, we end up
> passing the whole relation's row estimate. Since I can't see any WHERE
> clause items filtering rows from the itemsharei2__1 relation,
> outer_path->parent->rows should be whatever pg_class.reltuples says.
>
> Are you able to send the results of:
>
> explain select item_id from item_share_image group by item_id; -- I'm
> interested in the estimated number of groups in the plan's top node.
>



>
> select reltuples from pg_class where oid = 'item_share_image'::regclass;
>
> I'm expecting the estimated number of rows in the top node of the
> group by plan to be about 4369066.
>

(2022-05-06 12:30:23) prd_aukro=# explain select item_id from
item_share_image group by item_id;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1543418.63..1554179.08 rows=1076045 width=8)
   Group Key: item_id
   ->  Gather  (cost=1000.57..1532658.18 rows=4304180 width=8)
         Workers Planned: 4
         ->  Group  (cost=0.57..1101240.18 rows=1076045 width=8)
               Group Key: item_id
               ->  Parallel Index Only Scan using ixfk_isi_itemid on item_share_image  (cost=0.57..1039823.86 rows=24566530 width=8)
(7 rows)

Time: 1,808 ms
(2022-05-06 12:30:26) prd_aukro=# select reltuples from pg_class where oid = 'item_share_image'::regclass;
  reltuples
--------------
 9.826612e+07
(1 row)

Time: 0,887 ms

Regards

Pavel


> David
>
> [1] https://explain.depesz.com/s/2rBw#source
>


Re: strange slow query - lost lot of time somewhere

2022-05-06 Thread David Rowley
On Fri, 6 May 2022 at 20:04, David Rowley  wrote:
> Thanks for the report.  I think I now see the problem.  Looking at
> [1], it seems that's a bushy plan. That's fine, but less common than a
> left deep plan.

On second thoughts, it does not need to be a bushy plan for the outer
side of the nested loop to be parameterized by some higher-level
nested loop. There's an example of a plan like this in the regression
tests.

regression=# explain (analyze, costs off, summary off)
regression-# select * from tenk1 t1 left join
regression-# (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
regression-#  on t1.hundred = t2.hundred and t1.ten = t3.ten
regression-# where t1.unique1 = 1;
                                      QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.258..0.487 rows=20 loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual time=0.049..0.049 rows=1 loops=1)
         Index Cond: (unique1 = 1)
   ->  Nested Loop (actual time=0.204..0.419 rows=20 loops=1)
         Join Filter: (t1.ten = t3.ten)
         Rows Removed by Join Filter: 80
         ->  Bitmap Heap Scan on tenk1 t2 (actual time=0.064..0.194 rows=100 loops=1)
               Recheck Cond: (t1.hundred = hundred)
               Heap Blocks: exact=86
               ->  Bitmap Index Scan on tenk1_hundred (actual time=0.036..0.036 rows=100 loops=1)
                     Index Cond: (hundred = t1.hundred)
         ->  Memoize (actual time=0.001..0.001 rows=1 loops=100)
               Cache Key: t2.thousand
               Cache Mode: logical
               Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 4kB
               ->  Index Scan using tenk1_unique2 on tenk1 t3 (actual time=0.009..0.009 rows=1 loops=10)
                     Index Cond: (unique2 = t2.thousand)
(17 rows)

Debugging this, I see that the Memoize plan won because it was passing
1 as the number of calls. It should have been passing 100.  The
Memoize node's number of loops agrees with that. Fixing the calls to
correctly pass 100 gets rid of the Memoize node.

I've attached a patch to fix.  I'll look at it in more detail after the weekend.

I'm very tempted to change the EXPLAIN output in at least master to
display the initial and final (maximum) hash table sizes. Wondering if
anyone would object to that?

David
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 9a8c5165b0..313f379ed8 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -610,7 +610,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
 											hash_operators,
 											extra->inner_unique,
 											binary_mode,
-											outer_path->parent->rows);
+											outer_path->rows);
 	}
 
 	return NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index bf1a2db2cf..bd3375f2ba 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                      QUERY PLAN                      
-------------------------------------------------------
+                   QUERY PLAN                   
+------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3684,20 +3684,17 @@ where t1.unique1 = 1;
          Recheck Cond: (t1.hundred = hundred)
          ->  Bitmap Index Scan on tenk1_hundred
                Index Cond: (hundred = t1.hundred)
-         ->  Memoize
-               Cache Key: t2.thousand
-               Cache Mode: logical
-               ->  Index Scan using tenk1_unique2 on tenk1 t3
-                     Index Cond: (unique2 = t2.thousand)
-(14 rows)
+         ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Index Cond: (unique2 = t2.thousand)
+(11 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-  QUERY PLAN  
---------------------------------------------------
+   QUERY PLAN   
Re: strange slow query - lost lot of time somewhere

2022-05-06 Thread David Rowley
On Fri, 6 May 2022 at 17:52, Pavel Stehule  wrote:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at 
> nodeMemoize.c:268
> 268 if (size == 0)
> (gdb) p size
> $1 = 4369066

Thanks for the report.  I think I now see the problem.  Looking at
[1], it seems that's a bushy plan. That's fine, but less common than a
left deep plan.

I think the problem is down to some incorrect code in
get_memoize_path() where we pass the wrong value of "calls" to
create_memoize_path(). I think instead of outer_path->parent->rows it
instead should be outer_path->rows.

If you look closely at the plan, you'll see that the outer side of the
inner-most Nested Loop is parameterized by some higher-level nested
loop.

->  Nested Loop  (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66)
      ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66)
            Index Cond: (item_id = itembo0_.id)
            Heap Fetches: 21
      ->  Memoize  (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66)

so instead of passing 91 to create_memoize_path() as I thought, we end up
passing the whole relation's row estimate. Since I can't see any WHERE
clause items filtering rows from the itemsharei2__1 relation,
outer_path->parent->rows should be whatever pg_class.reltuples says.

Are you able to send the results of:

explain select item_id from item_share_image group by item_id; -- I'm
interested in the estimated number of groups in the plan's top node.

select reltuples from pg_class where oid = 'item_share_image'::regclass;

I'm expecting the estimated number of rows in the top node of the
group by plan to be about 4369066.

David

[1] https://explain.depesz.com/s/2rBw#source




Re: strange slow query - lost lot of time somewhere

2022-05-06 Thread Tom Lane
Pavel Stehule  writes:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> nodeMemoize.c:268
> 268 if (size == 0)
> (gdb) p size
> $1 = 4369066

Uh-huh 

regards, tom lane




Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
On Fri, 6 May 2022 at 1:19, David Rowley wrote:

> On Thu, 5 May 2022 at 19:26, Pavel Stehule 
> wrote:
> >
> > On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:
> >> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > (gdb) bt
> >> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> >> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> >> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> >> > #4  0x00932134 in AllocSetAlloc ()
> >> > #5  0x009376cf in MemoryContextAllocExtended ()
> >> > #6  0x006ad915 in ExecInitMemoize ()
> >>
> >> Well, the PGDG repo has the debuginfos (e.g. postgresql14-debuginfo)
> rpms / dpkgs(?), so I hope you are basically one command away from being
> able to debug further what happens in ExecInitMemoize().
> >> Those packages seem to be safe as they modify only /usr/lib/debug, so
> they should not have any impact on the production workload.
> >
> > I just have to wait for admin action - I have no root rights for the
> server.
>
> Looking at ExecInitMemoize() it's hard to see what could take such a
> long time other than the build_hash_table().  Tom did mention this,
> but I can't quite see how the size given to that function could be
> larger than 91 in your case.
>
> If you get the debug symbols installed, can you use gdb to
>
> break nodeMemoize.c:268
> p size
>
> maybe there's something I'm missing following the code and maybe there
> is some way that est_entries is not set to what I thought it was.
>
> It would also be good to see the same perf report again after the
> debug symbols are installed in order to resolve those unresolved
> function names.
>

Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
nodeMemoize.c:268
268 if (size == 0)
(gdb) p size
$1 = 4369066

This size corresponds to work_mem.

+   99,92%  0,00%  postmaster  postgres               [.] ServerLoop
+   99,92%  0,00%  postmaster  postgres               [.] PostgresMain
+   99,92%  0,00%  postmaster  postgres               [.] exec_simple_query
+   99,70%  0,00%  postmaster  postgres               [.] PortalRun
+   99,70%  0,00%  postmaster  postgres               [.] FillPortalStore
+   99,70%  0,02%  postmaster  postgres               [.] PortalRunUtility
+   99,68%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b599c6
+   99,68%  0,00%  postmaster  postgres               [.] standard_ProcessUtility
+   99,68%  0,00%  postmaster  postgres               [.] ExplainQuery
+   99,63%  0,00%  postmaster  postgres               [.] ExplainOneQuery
+   99,16%  0,00%  postmaster  postgres               [.] ExplainOnePlan
+   99,06%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b5ad2c
+   99,06%  0,00%  postmaster  postgres               [.] standard_ExecutorStart
+   99,06%  0,00%  postmaster  postgres               [.] InitPlan (inlined)
+   99,06%  0,00%  postmaster  postgres               [.] ExecInitNode
+   99,06%  0,00%  postmaster  postgres               [.] ExecInitNestLoop
+   99,00%  0,02%  postmaster  postgres               [.] ExecInitMemoize
+   98,87% 26,80%  postmaster  libc-2.28.so           [.] __memset_avx2_erms
+   98,87%  0,00%  postmaster  postgres               [.] build_hash_table (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] memoize_create (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] memoize_allocate (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] MemoryContextAllocExtended
+   72,08% 72,08%  postmaster  [unknown]              [k] 0xbaa010e0
     0,47%  0,00%  postmaster  postgres               [.] pg_plan_query
     0,47%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b59ba4
     0,47%  0,00%  postmaster  postgres               [.] standard_planner
     0,47%  0,00%  postmaster  postgres               [.] subquery_planner
     0,47%  0,00%  postmaster  postgres               [.] grouping_planner
     0,47%  0,00%  postmaster  postgres               [.] query_planner


>
> David
>


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
On Fri, 6 May 2022 at 1:28, David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Mon, May 2, 2022 at 10:02 PM Pavel Stehule 
> wrote:
>
>>
>>
>> On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
>>
>>> Pavel Stehule  writes:
>>> > there is really something strange (see attached file). Looks so this
>>> issue
>>> > is much more related to planning time than execution time
>>>
>>> You sure there's not something taking an exclusive lock on one of these
>>> tables every so often?
>>>
>>
>> I am almost sure, I can see this issue only every time when I set a
>> higher work mem. I don't see this issue in other cases.
>>
>>>
>>>
> What are the values of work_mem and hash_mem_multiplier for the two cases?
>

(2022-05-06 07:35:21) prd_aukro=# show work_mem ;
┌──────────┐
│ work_mem │
├──────────┤
│ 400MB    │
└──────────┘
(1 row)

Time: 0,331 ms
(2022-05-06 07:35:32) prd_aukro=# show hash_mem_multiplier ;
┌─────────────────────┐
│ hash_mem_multiplier │
├─────────────────────┤
│ 1                   │
└─────────────────────┘
(1 row)


> David J.
>


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread David G. Johnston
On Mon, May 2, 2022 at 10:02 PM Pavel Stehule 
wrote:

>
>
> On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
>
>> Pavel Stehule  writes:
>> > there is really something strange (see attached file). Looks so this
>> issue
>> > is much more related to planning time than execution time
>>
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>>
>
> I am almost sure, I can see this issue only every time when I set a higher
> work mem. I don't see this issue in other cases.
>
>>
>>
What are the values of work_mem and hash_mem_multiplier for the two cases?

David J.


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread David Rowley
On Thu, 5 May 2022 at 19:26, Pavel Stehule  wrote:
>
> On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:
>> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > (gdb) bt
>> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
>> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
>> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
>> > #4  0x00932134 in AllocSetAlloc ()
>> > #5  0x009376cf in MemoryContextAllocExtended ()
>> > #6  0x006ad915 in ExecInitMemoize ()
>>
>> Well, the PGDG repo has the debuginfos (e.g. postgresql14-debuginfo) rpms /
>> dpkgs(?), so I hope you are basically one command away from being able to
>> debug further what happens in ExecInitMemoize().
>> Those packages seem to be safe as they modify only /usr/lib/debug, so they
>> should not have any impact on the production workload.
>
> I just have to wait for admin action - I have no root rights for the server.

Looking at ExecInitMemoize() it's hard to see what could take such a
long time other than the build_hash_table().  Tom did mention this,
but I can't quite see how the size given to that function could be
larger than 91 in your case.

If you get the debug symbols installed, can you use gdb to

break nodeMemoize.c:268
p size

maybe there's something I'm missing following the code and maybe there
is some way that est_entries is not set to what I thought it was.

It would also be good to see the same perf report again after the
debug symbols are installed in order to resolve those unresolved
function names.

David




Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
On Thu, 5 May 2022 at 8:51, Jakub Wartak wrote:

> Hi Pavel,
>
> > I don't have debug symbols, so I don't have more details right now
> > Breakpoint 1 at 0x7f557f0c16c0
> > (gdb) c
> > Continuing.
>
> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> > #4  0x00932134 in AllocSetAlloc ()
> > #5  0x009376cf in MemoryContextAllocExtended ()
> > #6  0x006ad915 in ExecInitMemoize ()
>
> Well, the PGDG repo has the debuginfos (e.g. postgresql14-debuginfo) rpms
> / dpkgs(?), so I hope you are basically one command away from being able
> to debug further what happens in ExecInitMemoize().
> Those packages seem to be safe as they modify only /usr/lib/debug, so they
> should not have any impact on the production workload.
>

I just have to wait for admin action - I have no root rights for the server.



>
> -J.
>
>
>


RE: strange slow query - lost lot of time somewhere

2022-05-05 Thread Jakub Wartak
Hi Pavel,

> I don't have debug symbols, so I don't have more details right now
> Breakpoint 1 at 0x7f557f0c16c0
> (gdb) c
> Continuing.

> Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> #4  0x00932134 in AllocSetAlloc ()
> #5  0x009376cf in MemoryContextAllocExtended ()
> #6  0x006ad915 in ExecInitMemoize ()

Well, the PGDG repo has the debuginfos (e.g. postgresql14-debuginfo) rpms /
dpkgs(?), so I hope you are basically one command away from being able to debug
further what happens in ExecInitMemoize().
Those packages seem to be safe as they modify only /usr/lib/debug, so they
should not have any impact on the production workload.

-J.






Re: strange slow query - lost lot of time somewhere

2022-05-04 Thread Pavel Stehule
On Wed, 4 May 2022 at 16:08, Jakub Wartak wrote:

>
> Additional three ways to figure that one (all are IMHO production safe):
> a) already mentioned perf with --call-graph dwarf -p PID
> b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having
> MAP_ANONYMOUS flag, size of mmap() request will somehow match work_mem
> sizing
> c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check
> MAP_ANONYMOUS as above
>
>
I don't have debug symbols, so I don't have more details right now

Breakpoint 1 at 0x7f557f0c16c0
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x006b37de in ExecInitNestLoop ()
#15 0x0068dc56 in ExecInitNode ()
#16 0x00687e4d in standard_ExecutorStart ()
#17 0x7f5579b5ad2d in pgss_ExecutorStart () from
/usr/pgsql-14/lib/pg_stat_statements.so
#18 0x0062e643 in ExplainOnePlan ()
#19 0x0062e83d in ExplainOneQuery ()
#20 0x0062ee6f in ExplainQuery ()
#21 0x007f9b15 in standard_ProcessUtility ()
#22 0x7f5579b599c7 in pgss_ProcessUtility () from
/usr/pgsql-14/lib/pg_stat_statements.so
#23 0x007f7fef in PortalRunUtility ()
#24 0x007f83af in FillPortalStore ()
#25 0x007f86dd in PortalRun ()
#26 0x007f48cb in exec_simple_query ()
#27 0x007f610e in PostgresMain ()
#28 0x00776b8a in ServerLoop ()
#29 0x00777a03 in PostmasterMain ()
#30 0x004fe413 in main ()
(gdb) p
The history is empty.
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x00687e4d in standard_ExecutorStart ()
#15 0x7f5579b5ad2d in pgss_ExecutorStart () from
/usr/pgsql-14/lib/pg_stat_statements.so
#16 0x0062e643 in ExplainOnePlan ()
#17 0x0062e83d in ExplainOneQuery ()
#18 0x0062ee6f in ExplainQuery ()
#19 0x007f9b15 in standard_ProcessUtility ()
#20 0x7f5579b599c7 in pgss_ProcessUtility () from
/usr/pgsql-14/lib/pg_stat_statements.so
#21 0x007f7fef in PortalRunUtility ()
#22 0x007f83af in FillPortalStore ()
#23 0x007f86dd in PortalRun ()
#24 0x007f48cb in exec_simple_query ()
#25 0x007f610e in PostgresMain ()
#26 0x00776b8a in ServerLoop ()
#27 0x00777a03 in PostmasterMain ()
#28 0x004fe413 in main ()
(gdb) c
Continuing.

There were 2 hits of mmap.

Regards

Pavel




> [1] -
> https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com
>
> -J.
>


Re: strange slow query - lost lot of time somewhere

2022-05-04 Thread Pavel Stehule
On Wed, 4 May 2022 at 2:15, David Rowley wrote:

> On Tue, 3 May 2022 at 17:02, Pavel Stehule 
> wrote:
> > On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure, I can see this issue only every time when I set a
> higher work mem. I don't see this issue in other cases.
>
> hmm, I don't think the query being blocked on a table lock would cause
> this behaviour. As far as I know, all table locks should be obtained
> before the timer starts for the "Execution Time" timer in EXPLAIN
> ANALYZE.  However, locks are obtained on indexes at executor startup,
> so if there was some delay in obtaining a lock on the index it would
> show up this way.  I just don't know of anything that obtains a
> conflicting lock on an index without the same conflicting lock on the
> table that the index belongs to.
>
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of
> input rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a
> few seconds while you do:
>
> perf record --call-graph dwarf --pid  sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the
> hashtable 100 million elements and I see my perf report for a trivial
> Memoize query come up as:
>
> +   99.98% 0.00%  postgres  postgres   [.] _start
> +   99.98% 0.00%  postgres  libc.so.6  [.]
> __libc_start_main_alias_2 (inlined)
> +   99.98% 0.00%  postgres  libc.so.6  [.]
> __libc_start_call_main
> +   99.98% 0.00%  postgres  postgres   [.] main
> +   99.98% 0.00%  postgres  postgres   [.] PostmasterMain
> +   99.98% 0.00%  postgres  postgres   [.] ServerLoop
> +   99.98% 0.00%  postgres  postgres   [.] BackendStartup
> (inlined)
> +   99.98% 0.00%  postgres  postgres   [.] BackendRun (inlined)
> +   99.98% 0.00%  postgres  postgres   [.] PostgresMain
> +   99.98% 0.00%  postgres  postgres   [.] exec_simple_query
> +   99.98% 0.00%  postgres  postgres   [.] PortalRun
> +   99.98% 0.00%  postgres  postgres   [.] FillPortalStore
> +   99.98% 0.00%  postgres  postgres   [.] PortalRunUtility
> +   99.98% 0.00%  postgres  postgres   [.]
> standard_ProcessUtility
> +   99.98% 0.00%  postgres  postgres   [.] ExplainQuery
> +   99.98% 0.00%  postgres  postgres   [.] ExplainOneQuery
> +   99.95% 0.00%  postgres  postgres   [.] ExplainOnePlan
> +   87.87% 0.00%  postgres  postgres   [.]
> standard_ExecutorStart
> +   87.87% 0.00%  postgres  postgres   [.] InitPlan (inlined)
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitNode
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitNestLoop
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitMemoize
> +   87.87% 0.00%  postgres  postgres   [.]
> build_hash_table (inlined) <
> +   87.87% 0.00%  postgres  postgres   [.] memoize_create
> (inlined)
> +   87.87% 0.00%  postgres  postgres   [.]
> memoize_allocate (inlined)
> +   87.87% 0.00%  postgres  postgres   [.]
> MemoryContextAllocExtended
> +   87.87% 0.00%  postgres  postgres   [.] memset (inlined)
>
> Failing that, are you able to pg_dump these tables and load them into
> a PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
>

+   71,98% 14,36%  postmaster  [kernel.kallsyms]  [k] page_fault
+   70,19%  6,59%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
+   68,20%  0,00%  postmaster  postgres           [.] ExecInitNode
+   68,20%  0,00%  postmaster  postgres           [.] ExecInitNestLoop
+   68,13%  0,00%  postmaster  postgres           [.] ExecInitMemoize
+   68,13%  0,00%  postmaster  postgres           [.] MemoryContextAllocExtended
+   63,20%  0,00%  postmaster  postgres           [.] 0x00776b89
+   63,20%  0,00%  postmaster  postgres           [.] PostgresMain
+   63,03%  0,00%  postmaster 

RE: strange slow query - lost lot of time somewhere

2022-05-04 Thread Jakub Wartak
> I do agree that the perf report does indicate that the extra time is taken 
> due to
> some large amount of memory being allocated. I just can't quite see how that
> would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of input
> rows, which is 91 in both cases in your problem query.
> 
> Are you able to run the Memoize query in psql with \watch 0.1 for a few 
> seconds
> while you do:
> 
> perf record --call-graph dwarf --pid  sleep 2
> 
> then send along the perf report.
> 
> I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable 100
> million elements and I see my perf report for a trivial Memoize query come up
> as:
> 
[..]
> 
> Failing that, are you able to pg_dump these tables and load them into a
> PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
> 

+1 to what David says, we need a reproducer. In [1] Pavel wrote that he's
seeing a lot of clear_page_erms(), so maybe this will be of some help: I
recall a similar issue with a lot of minor page faults and high %sys when
raising work_mem. For me it was a different issue some time ago, but it was
something like build_hash_table() being used by recursive UNION calls ->
BuildTupleHashTable() -> ... malloc() -> mmap64().  When mmap() is issued with
MAP_ANONYMOUS, the kernel will zero out the memory (more memory -> potentially
bigger CPU waste, visible as minor page faults; ERMS stands for "Enhanced REP
MOVSB/STOSB"; this is on the kernel side). The culprit was the planner
allocating something that wouldn't be used later.
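
A tiny standalone illustration of that effect (not PostgreSQL code; the 400MB
figure simply mirrors the work_mem reported in this thread):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
    size_t  len = (size_t) 400 * 1024 * 1024;   /* roughly work_mem-sized */

    /* Anonymous private mapping: the kernel promises zero-filled pages. */
    char   *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* The pages are only materialized on first touch, so this memset is
     * where the minor page faults and the zeroing cost actually appear. */
    memset(p, 0, len);

    munmap(p, len);
    return 0;
}

Running it under perf stat or /usr/bin/time -v shows the minor-fault count
scaling with the region size, which is the same signature as the profiles
posted upthread.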

Three additional ways to figure that out (all are IMHO production safe):
a) already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having MAP_ANONYMOUS 
flag, size of mmap() request will somehow match work_mem sizing
c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check 
MAP_ANONYMOUS as above

[1] - 
https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.


Re: strange slow query - lost lot of time somewhere

2022-05-03 Thread David Rowley
On Tue, 3 May 2022 at 17:02, Pavel Stehule  wrote:
> On Tue, 3 May 2022 at 6:57, Tom Lane wrote:
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>
> I am almost sure, I can see this issue only every time when I set a higher 
> work mem. I don't see this issue in other cases.

hmm, I don't think the query being blocked on a table lock would cause
this behaviour. As far as I know, all table locks should be obtained
before the timer starts for the "Execution Time" timer in EXPLAIN
ANALYZE.  However, locks are obtained on indexes at executor startup,
so if there was some delay in obtaining a lock on the index it would
show up this way.  I just don't know of anything that obtains a
conflicting lock on an index without the same conflicting lock on the
table that the index belongs to.

I do agree that the perf report does indicate that the extra time is
taken due to some large amount of memory being allocated. I just can't
quite see how that would happen in Memoize given that
estimate_num_groups() clamps the distinct estimate as the number of
input rows, which is 91 in both cases in your problem query.

Are you able to run the Memoize query in psql with \watch 0.1 for a
few seconds while you do:

perf record --call-graph dwarf --pid  sleep 2

then send along the perf report.

I locally hacked build_hash_table() in nodeMemoize.c to make the
hashtable 100 million elements and I see my perf report for a trivial
Memoize query come up as:

+   99.98% 0.00%  postgres  postgres   [.] _start
+   99.98% 0.00%  postgres  libc.so.6  [.] __libc_start_main_alias_2 (inlined)
+   99.98% 0.00%  postgres  libc.so.6  [.] __libc_start_call_main
+   99.98% 0.00%  postgres  postgres   [.] main
+   99.98% 0.00%  postgres  postgres   [.] PostmasterMain
+   99.98% 0.00%  postgres  postgres   [.] ServerLoop
+   99.98% 0.00%  postgres  postgres   [.] BackendStartup (inlined)
+   99.98% 0.00%  postgres  postgres   [.] BackendRun (inlined)
+   99.98% 0.00%  postgres  postgres   [.] PostgresMain
+   99.98% 0.00%  postgres  postgres   [.] exec_simple_query
+   99.98% 0.00%  postgres  postgres   [.] PortalRun
+   99.98% 0.00%  postgres  postgres   [.] FillPortalStore
+   99.98% 0.00%  postgres  postgres   [.] PortalRunUtility
+   99.98% 0.00%  postgres  postgres   [.] standard_ProcessUtility
+   99.98% 0.00%  postgres  postgres   [.] ExplainQuery
+   99.98% 0.00%  postgres  postgres   [.] ExplainOneQuery
+   99.95% 0.00%  postgres  postgres   [.] ExplainOnePlan
+   87.87% 0.00%  postgres  postgres   [.] standard_ExecutorStart
+   87.87% 0.00%  postgres  postgres   [.] InitPlan (inlined)
+   87.87% 0.00%  postgres  postgres   [.] ExecInitNode
+   87.87% 0.00%  postgres  postgres   [.] ExecInitNestLoop
+   87.87% 0.00%  postgres  postgres   [.] ExecInitMemoize
+   87.87% 0.00%  postgres  postgres   [.] build_hash_table (inlined) <
+   87.87% 0.00%  postgres  postgres   [.] memoize_create (inlined)
+   87.87% 0.00%  postgres  postgres   [.] memoize_allocate (inlined)
+   87.87% 0.00%  postgres  postgres   [.] MemoryContextAllocExtended
+   87.87% 0.00%  postgres  postgres   [.] memset (inlined)

Failing that, are you able to pg_dump these tables and load them into
a PostgreSQL instance that you can play around with and patch?
Provided you can actually recreate the problem on that instance.

David




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
On Tue, 3 May 2022 at 6:57, Tom Lane wrote:

> Pavel Stehule  writes:
> > there is really something strange (see attached file). Looks so this
> issue
> > is much more related to planning time than execution time
>
> You sure there's not something taking an exclusive lock on one of these
> tables every so often?
>

I am almost sure. I can see this issue only when I set a higher work_mem;
I don't see this issue in other cases.

Regards

Pavel



>
> regards, tom lane
>


Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Tom Lane
Pavel Stehule  writes:
> there is really something strange (see attached file). Looks so this issue
> is much more related to planning time than execution time

You sure there's not something taking an exclusive lock on one of these
tables every so often?

regards, tom lane




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
On Tue, 3 May 2022 at 6:09, Pavel Stehule wrote:

>
>
> On Mon, 2 May 2022 at 23:48, David Rowley wrote:
>
>> On Mon, 2 May 2022 at 21:00, Pavel Stehule 
>> wrote:
>> > I found a query that is significantly slower with more memory
>>
>> Can you clarify what you mean here?  More memory was installed on the
>> machine? or work_mem was increased? or?
>>
>> > plan 1 - fast https://explain.depesz.com/s/XM1f
>> >
>> > plan 2 - slow https://explain.depesz.com/s/2rBw
>>
>> If it was work_mem you increased, it seems strange that the plan would
>> switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
>> estimated on the outer side of the join. It's hard to imagine that
>> work_mem was so low that the Memoize costing code thought there would
>> ever be cache evictions.
>>
>> > Strange - the time of last row is +/- same, but execution time is 10x
>> worse
>> >
>> > It looks like slow environment cleaning
>>
>> Can you also show EXPLAIN for the Memoize plan without ANALYZE?
>>
>
> yes - it is strange - it is just slow without execution
>
>
>  
> ┌──┐
> │
>  QUERY PLAN
>  │
>
> ├──┤
> │ Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)
>
> │
> │   ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)
>
> │
> │ ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)
>
> │
> │   ->  Bitmap Heap Scan on item itembo0_
>  (cost=57.34..2708.22 rows=11 width=16)
>
>  │
> │ Recheck Cond: ((ending_time IS NULL) OR ((status_id
> = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND
> (starting_time <= CURRENT_TIMESTAMP)))│
> │ Filter: ((to_expose_date IS NULL) AND (status_id =
> ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND
> ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time))) │
> │ ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)
>
>  │
> │   ->  Bitmap Index Scan on stehule354
>  (cost=0.00..2.08 rows=1 width=0)
>│
> │ Index Cond: (ending_time IS NULL)
>
>  │
> │   ->  Bitmap Index Scan on stehule1010
>  (cost=0.00..55.26 rows=1751 width=0)
>   │
> │ Index Cond: ((status_id = ANY
> ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND
> (starting_time <= CURRENT_TIMESTAMP))
>   │
> │   ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)
>
>  │
> │ ->  Index Only Scan using uq_isi_itemid_itemimageid
> on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)
>   │
> │   Index Cond: (item_id = itembo0_.id)
>
>  │
> │ ->  Memoize  (cost=0.57..2.09 rows=1 width=8)
>
>  │
> │   Cache Key: itemsharei2__1.item_image_id
>
>  │
> │   Cache Mode: logical
>
>  │
> │   ->  Index Only Scan using pk_item_image on
> item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)
> │
> │ Index Cond: (id =
> itemsharei2__1.item_image_id)
>
>│
> │ ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)
>
> │
> │   ->  Index Only Scan using uq_isi_itemid_itemimageid on
> item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)
> │
> │ Index Cond: (item_id = itembo0_.id)
>
>  │
> │   ->  Memoize  (cost=0.29..1.72 rows=1 

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
On Mon, 2 May 2022 at 23:48, David Rowley wrote:

> On Mon, 2 May 2022 at 21:00, Pavel Stehule 
> wrote:
> > I found a query that is significantly slower with more memory
>
> Can you clarify what you mean here?  More memory was installed on the
> machine? or work_mem was increased? or?
>
> > plan 1 - fast https://explain.depesz.com/s/XM1f
> >
> > plan 2 - slow https://explain.depesz.com/s/2rBw
>
> If it was work_mem you increased, it seems strange that the plan would
> switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
> estimated on the outer side of the join. It's hard to imagine that
> work_mem was so low that the Memoize costing code thought there would
> ever be cache evictions.
>
> > Strange - the time of last row is +/- same, but execution time is 10x
> worse
> >
> > It looks like slow environment cleaning
>
> Can you also show EXPLAIN for the Memoize plan without ANALYZE?
>

yes - it is strange - it is just slow without execution

 
                                                                  QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)
   ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)
         ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)
               ->  Bitmap Heap Scan on item itembo0_  (cost=57.34..2708.22 rows=11 width=16)
                     Recheck Cond: ((ending_time IS NULL) OR ((status_id = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND (starting_time <= CURRENT_TIMESTAMP)))
                     Filter: ((to_expose_date IS NULL) AND (status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time)))
                     ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)
                           ->  Bitmap Index Scan on stehule354  (cost=0.00..2.08 rows=1 width=0)
                                 Index Cond: (ending_time IS NULL)
                           ->  Bitmap Index Scan on stehule1010  (cost=0.00..55.26 rows=1751 width=0)
                                 Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND (starting_time <= CURRENT_TIMESTAMP))
               ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)
                     ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)
                           Index Cond: (item_id = itembo0_.id)
                     ->  Memoize  (cost=0.57..2.09 rows=1 width=8)
                           Cache Key: itemsharei2__1.item_image_id
                           Cache Mode: logical
                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)
                                 Index Cond: (id = itemsharei2__1.item_image_id)
         ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)
               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)
                     Index Cond: (item_id = itembo0_.id)
               ->  Memoize  (cost=0.29..1.72 rows=1 width=8)
                     Cache Key: itemsharei2_.item_image_id

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David Rowley
On Tue, 3 May 2022 at 15:22, David G. Johnston
 wrote:
> Plugging in:
> n = 2,000
> e = 500
> c = 10,000
>
> proper = 5%
> incorrect = 25%
>
> But of the 10,000 calls we will receive, the first 2,000 will be misses while 
> 2,000 of the remaining 8,000 will be hits, due to sharing 2,000 distinct 
> groups among the available inventory of 500 (25% of 8,000 is 2,000).  2,000 
> hits in 10,000 calls yields 20%.
>
> I believe the correct formula to be:
>
> ((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
> .80 * .25 = .20

I think you're correct here.  The formula should be that.  However,
two things: 1) this being incorrect is not the cause of the original
problem reported on this thread, and 2) there's just no way we could
consider fixing this in v15, let alone backpatch it to v14.

Maybe we should open a new thread about this and put an entry in the
first CF for v16 under bugs and come back to it after we branch.
Thinking the cache hit ratio is lower than it actually is going to be
will reduce the chances of the planner switching to a Nested Loop /
Memoize plan vs a Hash or Merge Join plan.

I was already fairly concerned that Memoize could cause performance
regressions when the ndistinct value or expected cache entry size is
underestimated or the outer side rows are overestimated.  What I've
got to calculate the cache hit ratio does seem incorrect given what
you're showing; however, it does add an element of pessimism and
reduces the chances of a bad plan being picked when work_mem is too
low to cache all entries.  Swapping it out for your formula seems like
it would increase the chances of a Memoize plan being picked when the
row, ndistinct and cache entry size estimates are correct; however, it
could also increase the chance of a bad plan being picked in cases
where the estimates are incorrect.

My problem with changing this now would be that we already often
perform Nested Loop joins when a Hash or Merge join would be a better
option. I'd hate to take us in a direction where we make that problem
even worse.

David




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David G. Johnston
On Mon, May 2, 2022 at 7:30 PM David Rowley  wrote:

> On Tue, 3 May 2022 at 13:43, David G. Johnston
>  wrote:
> > hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to
> 0.0
> > I don't understand the adjustment factor ndistinct/calls
>
> I've attached a spreadsheet showing you the impact of subtracting
> (ndistinct / calls).  What this is correcting for is the fact that the
> first scan from each unique value is a cache miss.  The more calls we
> have, the more hits we'll get.  If there was only 1 call per distinct
> value then there'd never be any hits. Without subtracting (ndistinct /
> calls) and assuming there's space in the cache for each ndistinct
> value, we'd assume 100% cache hit ratio if calls == ndistinct.  What
> we should assume in that case is a 0% hit ratio as the first scan for
> each distinct parameter must always be a miss as we've never had a
> chance to cache any tuples for it yet.
>
>
Thank you.  I understand the theory and agree with it - but the math
doesn't seem to be working out.

Plugging in:
n = 2,000
e = 500
c = 10,000

proper = 5%
incorrect = 25%

But of the 10,000 calls we will receive, the first 2,000 will be
misses while 2,000 of the remaining 8,000 will be hits, due to sharing
2,000 distinct groups among the available inventory of 500 (25% of 8,000 is
2,000).  2,000 hits in 10,000 calls yields 20%.

I believe the correct formula to be:

((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
.80 * .25 = .20

First we recognize that our hit ratio can be at most (c - n)/c, since n misses
are guaranteed.  We take that ratio and scale it by our cache efficiency
(est_entries / ndistinct), since of the remaining calls only that fraction will
find the relevant cache entry in memory; the rest turn into misses.
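
For what it's worth, here is a quick standalone check of the two calculations
with those numbers (a sketch only; the variable names are mine, and the
"current" formula is the one quoted earlier in the thread rather than the
literal cost_memoize_rescan() code):

#include <stdio.h>

int
main(void)
{
    double  ndistinct = 2000.0;     /* n */
    double  est_entries = 500.0;    /* e */
    double  calls = 10000.0;        /* c */

    /* Hit ratio as described upthread for cost_memoize_rescan():
     * (est_entries / ndistinct) - (ndistinct / calls), clamped at 0. */
    double  current = est_entries / ndistinct - ndistinct / calls;

    if (current < 0.0)
        current = 0.0;

    /* Proposed alternative: guaranteed first-call misses scale the
     * retention ratio instead of being subtracted from it. */
    double  proposed = ((calls - ndistinct) / calls) *
                       (est_entries / ndistinct);

    printf("current formula:  %.2f\n", current);   /* prints 0.05 */
    printf("proposed formula: %.2f\n", proposed);  /* prints 0.20 */
    return 0;
}

With n = 2,000, e = 500 and c = 10,000 this prints 0.05 and 0.20, matching the
5% and 20% figures above.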

David J.


Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David G. Johnston
On Mon, May 2, 2022 at 7:13 PM David Rowley  wrote:

> On Tue, 3 May 2022 at 11:02, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > On Mon, 2 May 2022 at 21:00, Pavel Stehule 
> wrote:
> > >> I found a query that is significantly slower with more memory
> >
> > > If it was work_mem you increased, it seems strange that the plan would
> > > switch over to using a Nested Loop / Memoize plan.
> >
> > Yeah, there's something unexplained there.
> >
> > I think that the most probable explanation for the symptoms is that
> > cost_memoize_rescan is computing some insane value for est_entries,
> > causing ExecInitMemoize to allocate-and-zero a huge hash table,
> > which ExecEndMemoize then frees again.  Neither of those steps
> > gets counted into any plan node's runtime, but EXPLAIN's total
> > execution time will include them.  An insane value for est_entries
> > could perhaps go along with a cost misestimate that convinces the
> > planner to include the memoize even though it seems pointless.
>
> That seems pretty unlikely to me. est_entries is based on the minimum
> value of the expected number of total cache entries and the ndistinct
> value. ndistinct cannot be insane here as ndistinct is never going to
> be higher than the number of calls, which is the row estimate from the
> outer side of the join. That's 91 in both cases here.   As far as I
> can see, that's just going to make a table of 128 buckets.
>

If est_entries goes to zero due to hash_mem_bytes/est_entry_bytes < 1
(hence floor takes it to zero) the executor will use a size value of 1024
instead in build_hash_table.

That seems unlikely but there is no data to support or refute it.
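
To make that scenario concrete, here is a toy recreation of the arithmetic
(illustrative values only; the size == 0 -> 1024 fallback is the
build_hash_table() behaviour referred to above, the rest is made up):

/* build with: cc est_entries_demo.c -lm */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  hash_mem_bytes = 4.0 * 1024 * 1024;     /* hypothetical budget */
    double  est_entry_bytes = 8.0 * 1024 * 1024;    /* one over-sized entry */
    double  ndistinct = 91.0;

    /* floor() drives est_cache_entries to 0 as soon as a single entry is
     * estimated not to fit in the memory budget. */
    double  est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
    unsigned int est_entries = (unsigned int) fmin(ndistinct, est_cache_entries);

    /* What the executor would then do when handed size == 0. */
    unsigned int size = est_entries;

    if (size == 0)
        size = 1024;

    printf("est_cache_entries=%.0f est_entries=%u hash table size=%u\n",
           est_cache_entries, est_entries, size);
    return 0;
}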


> I'm open to making improvements to the comments in that area. I do
> remember spending quite a bit of time trying to make things as clear
> as possible as it is fairly complex what's going on there.
>
>
A few more intermediate calculation variables, along with descriptions,
would help.

e.g., min(est_cache_entries, ndistinct) is repeated twice after its initial
definition.

retention_ratio per my other reply

The (ndistinct/calls) part of hit_ratio being described specifically.

David J.


Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David Rowley
On Tue, 3 May 2022 at 13:43, David G. Johnston
 wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
> I don't understand the adjustment factor ndistinct/calls

I've attached a spreadsheet showing you the impact of subtracting
(ndistinct / calls).  What this is correcting for is the fact that the
first scan from each unique value is a cache miss.  The more calls we
have, the more hits we'll get.  If there was only 1 call per distinct
value then there'd never be any hits. Without subtracting (ndistinct /
calls) and assuming there's space in the cache for each ndistinct
value, we'd assume 100% cache hit ratio if calls == ndistinct.  What
we should assume in that case is a 0% hit ratio as the first scan for
each distinct parameter must always be a miss as we've never had a
chance to cache any tuples for it yet.

> This is a "rescan" so aside from cache management isn't the cost of 
> originally populating the cache already accounted for elsewhere?

The cost of the first scan is calculated in create_memoize_path().
Since the first scan will always be a cache miss, the code there just
adds some cache management surcharges. Namely:

/*
* Add a small additional charge for caching the first entry.  All the
* harder calculations for rescans are performed in cost_memoize_rescan().
*/
pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;

David


memoize_cache_hits.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David Rowley
On Tue, 3 May 2022 at 11:02, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule  wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I think that the most probable explanation for the symptoms is that
> cost_memoize_rescan is computing some insane value for est_entries,
> causing ExecInitMemoize to allocate-and-zero a huge hash table,
> which ExecEndMemoize then frees again.  Neither of those steps
> gets counted into any plan node's runtime, but EXPLAIN's total
> execution time will include them.  An insane value for est_entries
> could perhaps go along with a cost misestimate that convinces the
> planner to include the memoize even though it seems pointless.

That seems pretty unlikely to me. est_entries is based on the minimum
value of the expected number of total cache entries and the ndistinct
value. ndistinct cannot be insane here as ndistinct is never going to
be higher than the number of calls, which is the row estimate from the
outer side of the join. That's 91 in both cases here.   As far as I
can see, that's just going to make a table of 128 buckets.

See estimate_num_groups_incremental() at:

/*
* We don't ever want to return an estimate of zero groups, as that tends
* to lead to division-by-zero and other unpleasantness.  The input_rows
* estimate is usually already at least 1, but clamp it just in case it
* isn't.
*/
input_rows = clamp_row_est(input_rows);


> I spent some time studying cost_memoize_rescan, and the only
> conclusions I arrived at were that the variable names are poorly
> chosen and the comments are unhelpful.  For instance, one would
> think that est_entry_bytes is the expected size of one cache entry,
> but it seems to actually be the total space that would be occupied
> if the whole input relation were loaded into the cache.

I think you've misunderstood. It *is* the estimated size of a single
entry. I think you might be going wrong in assuming "tuples" is the
expected tuples from all rescans of the inner side of the join. It's
actually from a single scan.  I can add a comment there to help make
that clear.

> And
> the est_cache_entries computation seems nonsensical; if it does
> make sense, the comment sure doesn't illuminate why.  So I am
> quite prepared to buy into the idea that cost_memoize_rescan is
> producing bogus answers, but it's hard to tell what it's coming out
> with in this example.  Too bad EXPLAIN doesn't print est_entries.

I'm wishing I put the initial hash table size and the final hash table
size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for
v15 to do that so that it might help us figure things out in the
future.

I'm open to making improvements to the comments in that area. I do
remember spending quite a bit of time trying to make things as clear
as possible as it is fairly complex what's going on there.

David




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David G. Johnston
On Mon, May 2, 2022 at 4:02 PM Tom Lane  wrote:

> David Rowley  writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule 
> wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I spent some time studying cost_memoize_rescan, and the only
> conclusions I arrived at were that the variable names are poorly
> chosen and the comments are unhelpful.  For instance, one would
> think that est_entry_bytes is the expected size of one cache entry,
> but it seems to actually be the total space that would be occupied
> if the whole input relation were loaded into the cache.

> And
> the est_cache_entries computation seems nonsensical; if it does
> make sense, the comment sure doesn't illuminate why.


My take on this is that a cache entry is keyed by a parameterization and
any given entry can have, at most, every tuple saved into it (hence the
computation of tuples*per-tuple-size).  So the maximum number of hash keys
is the total available memory divided by input relation size.  This upper
bound is stored in est_cache_entries.  If the number of unique
parameterizations expected (at worst one-per-call) is less than this we use
that value and never evict.  If it is more we use the est_cache_entries and
plan to evict.

What I'm expecting to find but don't see is that by definition each unique
parameterization must positively match a unique subset of the input
relation tuples.  If we remember only those tuples that matched then at no
point should the total memory for the hash table exceed the size of the
input relation.

Now, I'm not completely confident the cache only holds matched tuples...but
if so:

From that the mpath->est_entries should be "min(hash_mem_bytes /
est_entry_bytes, 1.0) * ndistinct"
i.e., all groups or a fractional subset based upon available hash memory

Then:

ndistinct = estimate_num_groups() || calls
retention_ratio = min(hash_mem_bytes / est_entry_bytes, 1.0)
est_entries = retention_ratio * ndistinct
evict_ratio = 1.0 - retention_ratio

hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
I don't understand the adjustment factor ndistinct/calls

The evictions total-cost adjustment also assumes we are evicting all
tuples, as opposed to tuples/est_entries.

This is a "rescan" so aside from cache management isn't the cost of
originally populating the cache already accounted for elsewhere?

David J.


Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Tom Lane
David Rowley  writes:
> On Mon, 2 May 2022 at 21:00, Pavel Stehule  wrote:
>> I found a query that is significantly slower with more memory

> If it was work_mem you increased, it seems strange that the plan would
> switch over to using a Nested Loop / Memoize plan.

Yeah, there's something unexplained there.

I think that the most probable explanation for the symptoms is that
cost_memoize_rescan is computing some insane value for est_entries,
causing ExecInitMemoize to allocate-and-zero a huge hash table,
which ExecEndMemoize then frees again.  Neither of those steps
gets counted into any plan node's runtime, but EXPLAIN's total
execution time will include them.  An insane value for est_entries
could perhaps go along with a cost misestimate that convinces the
planner to include the memoize even though it seems pointless.
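
A standalone sketch of why such a cost would show up only in the total
execution time (not PostgreSQL code; the size is made up): zeroing a large
freshly-allocated region before the first node runs, and freeing it after
the last node finishes, falls outside every node's instrumentation window.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int
main(void)
{
    /* hypothetical: an insane est_entries leading to a ~1GB zeroed table */
    size_t      nbytes = (size_t) 1024 * 1024 * 1024;
    struct timespec t0,
                t1;
    char       *table;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    table = calloc(nbytes, 1);  /* stand-in for the ExecInitMemoize allocation */
    if (table == NULL)
        return 1;
    memset(table, 0, nbytes);   /* fault in and clear every page */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(table);                /* stand-in for the ExecEndMemoize cleanup */

    printf("alloc + zero took %.0f ms\n",
           (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_nsec - t0.tv_nsec) / 1.0e6);
    return 0;
}

On typical hardware that is easily a few hundred milliseconds spent mostly
in the kernel's page-fault/clear-page paths, which would also line up with
the clear_page_erms and page_fault samples in the slow profile upthread.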

I spent some time studying cost_memoize_rescan, and the only
conclusions I arrived at were that the variable names are poorly
chosen and the comments are unhelpful.  For instance, one would
think that est_entry_bytes is the expected size of one cache entry,
but it seems to actually be the total space that would be occupied
if the whole input relation were loaded into the cache.  And
the est_cache_entries computation seems nonsensical; if it does
make sense, the comment sure doesn't illuminate why.  So I am
quite prepared to buy into the idea that cost_memoize_rescan is
producing bogus answers, but it's hard to tell what it's coming out
with in this example.  Too bad EXPLAIN doesn't print est_entries.

regards, tom lane




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread David Rowley
On Mon, 2 May 2022 at 21:00, Pavel Stehule  wrote:
> I found a query that is significantly slower with more memory

Can you clarify what you mean here?  More memory was installed on the
machine? or work_mem was increased? or?

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw

If it was work_mem you increased, it seems strange that the plan would
switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
estimated on the outer side of the join. It's hard to imagine that
work_mem was so low that the Memoize costing code thought there would
ever be cache evictions.
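
(For scale: the two Memoize nodes in the slow plan report caching only about
8kB and 25kB of data, so unless the planner's estimate of the entry size was
wildly off, no evictions should have been predicted here.)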

> Strange - the time of last row is +/- same, but execution time is 10x worse
>
> It looks like slow environment cleaning

Can you also show EXPLAIN for the Memoize plan without ANALYZE?

Does the slowness present every time that plan is executed?

Can you show the EXPLAIN ANALYZE of the nested loop plan with
enable_memoize = off?  You may need to disable hash and merge joins.

David




Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
po 2. 5. 2022 v 16:44 odesílatel Matthias van de Meent <
boekewurm+postg...@gmail.com> napsal:

> On Mon, 2 May 2022 at 16:09, Pavel Stehule 
> wrote:
> >
> >
> >
> > po 2. 5. 2022 v 16:02 odesílatel Pavel Stehule 
> napsal:
> >> there is just shared buffers changed to 32GB and work_mem to 70MB.
> Unfortunately - it is in production environment with customer data, so I
> cannot to play too much
> >>
> >> This is perf of slow
> >>
> >>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
> >>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
> >>5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
> >>5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
> >>5,15%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
> >>3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
> >>3,07%  postmaster  postgres   [.] hash_search_with_hash_value
> >>2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
> >>2,49%  postmaster  postgres   [.] heap_page_prune_opt
> >>1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
> >>1,85%  postmaster  [kernel.kallsyms]  [k]
> swapgs_restore_regs_and_return_to_usermode
> >>1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
> >>1,73%  postmaster  postgres   [.] _bt_checkkeys
> >>1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
> >>1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
> >>1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
> >>1,08%  postmaster  [kernel.kallsyms]  [k]
> percpu_ref_put_many.constprop.84
> >>1,08%  postmaster  postgres   [.] 0x003c1be6
> >>1,06%  postmaster  [kernel.kallsyms]  [k]
> get_mem_cgroup_from_mm.part.49
> >>0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
> >>0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
> >>0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
> >>0,61%  postmaster  postgres   [.] _bt_checkpage
> >>0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
> >>0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
> >>0,57%  postmaster  postgres   [.] tbm_iterate
> >>0,56%  postmaster  [kernel.kallsyms]  [k]
> __count_memcg_events.part.70
> >>0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
> >>0,52%  postmaster  postgres   [.] 0x0015f6e5
> >>0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
> >>0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
> >>0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
> >>0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
> >>
> >> This is fast
> >>
> >>   21,13%  postmaster  postgres   [.] hash_search_with_hash_value
> >>   15,33%  postmaster  postgres   [.] heap_page_prune_opt
> >>   10,22%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
> >>   10,00%  postmaster  postgres   [.] _bt_checkkeys
> >>6,23%  postmaster  postgres   [.] 0x003c1be6
> >>4,94%  postmaster  postgres   [.] _bt_checkpage
> >>2,85%  postmaster  postgres   [.] tbm_iterate
> >>2,31%  postmaster  postgres   [.] nocache_index_getattr
> >>2,13%  postmaster  postgres   [.] pg_qsort
> >>1,58%  postmaster  postgres   [.] heap_hot_search_buffer
> >>1,58%  postmaster  postgres   [.] FunctionCall2Coll
> >>1,58%  postmaster  postgres   [.] 0x0015f6e5
> >>1,17%  postmaster  postgres   [.] LWLockRelease
> >>0,85%  postmaster  libc-2.28.so   [.] __memcmp_avx2_movbe
> >>0,64%  postmaster  postgres   [.] 0x003e4233
> >>0,54%  postmaster  postgres   [.] hash_bytes
> >>0,53%  postmaster  postgres   [.] 0x00306fbb
> >>0,53%  postmaster  postgres   [.] LWLockAcquire
> >>0,42%  postmaster  postgres   [.] 0x003c1c6f
> >>0,42%  postmaster  postgres   [.] _bt_compare
> >>
> >
> > It looks so memoization allocate lot of memory - maybe there are some
> temporal memory leak
>
> Memoization doesn't leak memory any more than hash tables do; so I
> doubt that that is the issue.
>
> >  Performance counter stats for process id '4004464':
> >
> >  84,26 msec task-clock#0,012 CPUs
> utilized
> >  3  context-switches  #0,036 K/sec
> >  0  cpu-migrations#0,000 K/sec
> > 19  page-faults   #0,225 K/sec
> >  0  cycles#0,000 GHz
> >106 873 995  instructions
> > 20 225 431  branches  #  240,026 M/sec
> >348 834  branch-misses #1,72% of all
> branches
> >
> >7,106142051 seconds time 

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Matthias van de Meent
On Mon, 2 May 2022 at 16:09, Pavel Stehule  wrote:
>
>
>
> po 2. 5. 2022 v 16:02 odesílatel Pavel Stehule  
> napsal:
>> there is just shared buffers changed to 32GB and work_mem to 70MB. 
>> Unfortunately - it is in production environment with customer data, so I 
>> cannot to play too much
>>
>> This is perf of slow
>>
>>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
>>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>>5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>>5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>>5,15%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
>>3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>>3,07%  postmaster  postgres   [.] hash_search_with_hash_value
>>2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>>2,49%  postmaster  postgres   [.] heap_page_prune_opt
>>1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>>1,85%  postmaster  [kernel.kallsyms]  [k] 
>> swapgs_restore_regs_and_return_to_usermode
>>1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>>1,73%  postmaster  postgres   [.] _bt_checkkeys
>>1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>>1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>>1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>>1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
>>1,08%  postmaster  postgres   [.] 0x003c1be6
>>1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>>0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>>0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>>0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>>0,61%  postmaster  postgres   [.] _bt_checkpage
>>0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>>0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>>0,57%  postmaster  postgres   [.] tbm_iterate
>>0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>>0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>>0,52%  postmaster  postgres   [.] 0x0015f6e5
>>0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>>0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>>0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>>0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>>
>> This is fast
>>
>>   21,13%  postmaster  postgres   [.] hash_search_with_hash_value
>>   15,33%  postmaster  postgres   [.] heap_page_prune_opt
>>   10,22%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
>>   10,00%  postmaster  postgres   [.] _bt_checkkeys
>>6,23%  postmaster  postgres   [.] 0x003c1be6
>>4,94%  postmaster  postgres   [.] _bt_checkpage
>>2,85%  postmaster  postgres   [.] tbm_iterate
>>2,31%  postmaster  postgres   [.] nocache_index_getattr
>>2,13%  postmaster  postgres   [.] pg_qsort
>>1,58%  postmaster  postgres   [.] heap_hot_search_buffer
>>1,58%  postmaster  postgres   [.] FunctionCall2Coll
>>1,58%  postmaster  postgres   [.] 0x0015f6e5
>>1,17%  postmaster  postgres   [.] LWLockRelease
>>0,85%  postmaster  libc-2.28.so   [.] __memcmp_avx2_movbe
>>0,64%  postmaster  postgres   [.] 0x003e4233
>>0,54%  postmaster  postgres   [.] hash_bytes
>>0,53%  postmaster  postgres   [.] 0x00306fbb
>>0,53%  postmaster  postgres   [.] LWLockAcquire
>>0,42%  postmaster  postgres   [.] 0x003c1c6f
>>0,42%  postmaster  postgres   [.] _bt_compare
>>
>
> It looks so memoization allocate lot of memory - maybe there are some 
> temporal memory leak

Memoization doesn't leak memory any more than hash tables do; so I
doubt that that is the issue.

>  Performance counter stats for process id '4004464':
>
>  84,26 msec task-clock#0,012 CPUs utilized
>  3  context-switches  #0,036 K/sec
>  0  cpu-migrations#0,000 K/sec
> 19  page-faults   #0,225 K/sec
>  0  cycles#0,000 GHz
>106 873 995  instructions
> 20 225 431  branches  #  240,026 M/sec
>348 834  branch-misses #1,72% of all branches
>
>7,106142051 seconds time elapsed
>

Assuming the above was for the fast query

>  Performance counter stats for process id '4004464':
>
>   1 116,97 msec task-clock#0,214 CPUs utilized
>  4  context-switches  #0,004 K/sec
>

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
po 2. 5. 2022 v 16:02 odesílatel Pavel Stehule 
napsal:

>
>
> po 2. 5. 2022 v 15:28 odesílatel Matthias van de Meent <
> boekewurm+postg...@gmail.com> napsal:
>
>> On Mon, 2 May 2022 at 11:00, Pavel Stehule 
>> wrote:
>> >
>> > Hi
>> >
>> > I found a query that is significantly slower with more memory
>>
>> Which PostgreSQL version did you use? Did you enable assert checking?
>> Do you have an example database setup to work with?
>>
>> > plan 2
>> >  QUERY PLAN
>> > 
>> >  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual
>> time=18.306..23.065 rows=32 loops=1)
>> > ...
>> >  Execution Time: 451.161 ms
>>
>> Truly strange; especially the 418ms difference between execution time
>> and the root node's "actual time". I haven't really seen such
>> differences happen, except when concurrent locks were held at the
>> table / index level.
>>
>> > plan 1 - fast https://explain.depesz.com/s/XM1f
>> >
>> > plan 2 - slow https://explain.depesz.com/s/2rBw
>> >
>> > Strange - the time of last row is +/- same, but execution time is 10x
>> worse
>>
>> The only difference between the two plans that I see is that plan 1
>> doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
>> (one of 66 misses; one of 342 misses). The only "expensive" operation
>> that I see in memoize nodes is the check for memory size in
>> assert-enabled builds; and that should have very low overhead
>> considering that the size of the memoized data is only 8kB and 25kB
>> respectively.
>>
>
> This is PostgreSQL 14 used in production environment
>
>  (2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
> ┌──────────────────┐
> │ debug_assertions │
> ├──────────────────┤
> │ off              │
> └──────────────────┘
> (1 řádka)
>
> Čas: 0,295 ms
> (2022-05-02 15:37:35) prd_aukro=# select version();
>
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                 version                                                  │
> ├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
> └────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (1 řádka)
> Čas: 0,629 ms
>
> there is just shared buffers changed to 32GB and work_mem to 70MB.
> Unfortunately - it is in production environment with customer data, so I
> cannot to play too much
>
> This is perf of slow
>
>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>5,15%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
>3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>3,07%  postmaster  postgres   [.] hash_search_with_hash_value
>2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>2,49%  postmaster  postgres   [.] heap_page_prune_opt
>1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>1,85%  postmaster  [kernel.kallsyms]  [k]
> swapgs_restore_regs_and_return_to_usermode
>1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>1,73%  postmaster  postgres   [.] _bt_checkkeys
>1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>1,08%  postmaster  [kernel.kallsyms]  [k]
> percpu_ref_put_many.constprop.84
>1,08%  postmaster  postgres   [.] 0x003c1be6
>1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>0,61%  postmaster  postgres   [.] _bt_checkpage
>0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>0,57%  postmaster  postgres   [.] tbm_iterate
>0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>0,52%  postmaster  postgres   [.] 0x0015f6e5
>0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>
> This is fast
>
>   21,13%  postmaster  postgres   [.] hash_search_with_hash_value
>   15,33%  postmaster  postgres   [.] heap_page_prune_opt
>   10,22%  

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Pavel Stehule
po 2. 5. 2022 v 15:28 odesílatel Matthias van de Meent <
boekewurm+postg...@gmail.com> napsal:

> On Mon, 2 May 2022 at 11:00, Pavel Stehule 
> wrote:
> >
> > Hi
> >
> > I found a query that is significantly slower with more memory
>
> Which PostgreSQL version did you use? Did you enable assert checking?
> Do you have an example database setup to work with?
>
> > plan 2
> >  QUERY PLAN
> > 
> >  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual
> time=18.306..23.065 rows=32 loops=1)
> > ...
> >  Execution Time: 451.161 ms
>
> Truly strange; especially the 418ms difference between execution time
> and the root node's "actual time". I haven't really seen such
> differences happen, except when concurrent locks were held at the
> table / index level.
>
> > plan 1 - fast https://explain.depesz.com/s/XM1f
> >
> > plan 2 - slow https://explain.depesz.com/s/2rBw
> >
> > Strange - the time of last row is +/- same, but execution time is 10x
> worse
>
> The only difference between the two plans that I see is that plan 1
> doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
> (one of 66 misses; one of 342 misses). The only "expensive" operation
> that I see in memoize nodes is the check for memory size in
> assert-enabled builds; and that should have very low overhead
> considering that the size of the memoized data is only 8kB and 25kB
> respectively.
>

This is PostgreSQL 14 used in production environment

 (2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
┌──────────────────┐
│ debug_assertions │
├──────────────────┤
│ off              │
└──────────────────┘
(1 řádka)

Čas: 0,295 ms
(2022-05-02 15:37:35) prd_aukro=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 version                                                  │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 řádka)
Čas: 0,629 ms

Only shared_buffers (32GB) and work_mem (70MB) were changed.
Unfortunately, it is a production environment with customer data, so I
cannot play with it too much.

This is perf of slow

  25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
  11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
   5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
   5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
   5,15%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
   3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
   3,07%  postmaster  postgres   [.] hash_search_with_hash_value
   2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
   2,49%  postmaster  postgres   [.] heap_page_prune_opt
   1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
   1,85%  postmaster  [kernel.kallsyms]  [k]
swapgs_restore_regs_and_return_to_usermode
   1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
   1,73%  postmaster  postgres   [.] _bt_checkkeys
   1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
   1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
   1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
   1,08%  postmaster  [kernel.kallsyms]  [k]
percpu_ref_put_many.constprop.84
   1,08%  postmaster  postgres   [.] 0x003c1be6
   1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
   0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
   0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
   0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
   0,61%  postmaster  postgres   [.] _bt_checkpage
   0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
   0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
   0,57%  postmaster  postgres   [.] tbm_iterate
   0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
   0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
   0,52%  postmaster  postgres   [.] 0x0015f6e5
   0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
   0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
   0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
   0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page

This is fast

  21,13%  postmaster  postgres   [.] hash_search_with_hash_value
  15,33%  postmaster  postgres   [.] heap_page_prune_opt
  10,22%  postmaster  libc-2.28.so   [.] __memset_avx2_erms
  10,00%  postmaster  postgres   [.] _bt_checkkeys
   6,23%  postmaster  postgres   [.] 0x003c1be6
   4,94%  postmaster  postgres   [.] _bt_checkpage
   

Re: strange slow query - lost lot of time somewhere

2022-05-02 Thread Matthias van de Meent
On Mon, 2 May 2022 at 11:00, Pavel Stehule  wrote:
>
> Hi
>
> I found a query that is significantly slower with more memory

Which PostgreSQL version did you use? Did you enable assert checking?
Do you have an example database setup to work with?

> plan 2
>  QUERY PLAN
> 
>  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual 
> time=18.306..23.065 rows=32 loops=1)
> ...
>  Execution Time: 451.161 ms

Truly strange; especially the 418ms difference between execution time
and the root node's "actual time". I haven't really seen such
differences happen, except when concurrent locks were held at the
table / index level.

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw
>
> Strange - the time of last row is +/- same, but execution time is 10x worse

The only difference between the two plans that I see is that plan 1
doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
(one of 66 misses; one of 342 misses). The only "expensive" operation
that I see in memoize nodes is the check for memory size in
assert-enabled builds; and that should have very low overhead
considering that the size of the memoized data is only 8kB and 25kB
respectively.