Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
pá 6. 5. 2022 v 1:19 odesílatel David Rowley  napsal:

> On Thu, 5 May 2022 at 19:26, Pavel Stehule 
> wrote:
> >
> > čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak 
> napsal:
> >> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > (gdb) bt
> >> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> >> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> >> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> >> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> >> > #4  0x00932134 in AllocSetAlloc ()
> >> > #5  0x009376cf in MemoryContextAllocExtended ()
> >> > #6  0x006ad915 in ExecInitMemoize ()
> >>
> >> Well, the PGDG repo has the debuginfo packages (e.g. postgresql14-debuginfo)
> >> rpms / dpkgs(?), so I hope you are basically one command away from being able
> >> to debug further what happens in ExecInitMemoize()
> >> Those packages seem to be safe as they modify only /usr/lib/debug, so they
> >> should not have any impact on production workload.
> >
> > I just have to wait for admin action - I have no root rights for the
> server.
>
> Looking at ExecInitMemoize() it's hard to see what could take such a
> long time other than the build_hash_table().  Tom did mention this,
> but I can't quite see how the size given to that function could be
> larger than 91 in your case.
>
> If you get the debug symbols installed, can you use gdb to
>
> break nodeMemoize.c:268
> p size
>
> maybe there's something I'm missing following the code and maybe there
> is some way that est_entries is not set to what I thought it was.
>
> It would also be good to see the same perf report again after the
> debug symbols are installed in order to resolve those unresolved
> function names.
>

Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
nodeMemoize.c:268
268 if (size == 0)
(gdb) p size
$1 = 4369066

This size corresponds to work_mem.
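(As a rough cross-check, and purely my own arithmetic rather than anything from
the thread: 400MB is 419430400 bytes, and 419430400 / 4369066 is roughly 96, so
the requested size looks like work_mem divided by a ~96-byte per-entry estimate
rather than the est_entries value of 91 discussed above.)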

+   99,92%  0,00%  postmaster  postgres               [.] ServerLoop
+   99,92%  0,00%  postmaster  postgres               [.] PostgresMain
+   99,92%  0,00%  postmaster  postgres               [.] exec_simple_query
+   99,70%  0,00%  postmaster  postgres               [.] PortalRun
+   99,70%  0,00%  postmaster  postgres               [.] FillPortalStore
+   99,70%  0,02%  postmaster  postgres               [.] PortalRunUtility
+   99,68%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b599c6
+   99,68%  0,00%  postmaster  postgres               [.] standard_ProcessUtility
+   99,68%  0,00%  postmaster  postgres               [.] ExplainQuery
+   99,63%  0,00%  postmaster  postgres               [.] ExplainOneQuery
+   99,16%  0,00%  postmaster  postgres               [.] ExplainOnePlan
+   99,06%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b5ad2c
+   99,06%  0,00%  postmaster  postgres               [.] standard_ExecutorStart
+   99,06%  0,00%  postmaster  postgres               [.] InitPlan (inlined)
+   99,06%  0,00%  postmaster  postgres               [.] ExecInitNode
+   99,06%  0,00%  postmaster  postgres               [.] ExecInitNestLoop
+   99,00%  0,02%  postmaster  postgres               [.] ExecInitMemoize
+   98,87% 26,80%  postmaster  libc-2.28.so           [.] __memset_avx2_erms
+   98,87%  0,00%  postmaster  postgres               [.] build_hash_table (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] memoize_create (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] memoize_allocate (inlined)
+   98,87%  0,00%  postmaster  postgres               [.] MemoryContextAllocExtended
+   72,08% 72,08%  postmaster  [unknown]              [k] 0xbaa010e0
     0,47%  0,00%  postmaster  postgres               [.] pg_plan_query
     0,47%  0,00%  postmaster  pg_stat_statements.so  [.] 0x7f5579b59ba4
     0,47%  0,00%  postmaster  postgres               [.] standard_planner
     0,47%  0,00%  postmaster  postgres               [.] subquery_planner
     0,47%  0,00%  postmaster  postgres               [.] grouping_planner
     0,47%  0,00%  postmaster  postgres               [.] query_planner


>
> David
>


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
pá 6. 5. 2022 v 1:28 odesílatel David G. Johnston <
david.g.johns...@gmail.com> napsal:

> On Mon, May 2, 2022 at 10:02 PM Pavel Stehule 
> wrote:
>
>>
>>
>> út 3. 5. 2022 v 6:57 odesílatel Tom Lane  napsal:
>>
>>> Pavel Stehule  writes:
>>> > there is really something strange (see attached file). Looks like this
>>> > issue is much more related to planning time than execution time
>>>
>>> You sure there's not something taking an exclusive lock on one of these
>>> tables every so often?
>>>
>>
>> I am almost sure. I can see this issue only when I set a higher
>> work_mem; I don't see this issue in other cases.
>>
>>>
>>>
> What are the values of work_mem and hash_mem_multiplier for the two cases?
>

 (2022-05-06 07:35:21) prd_aukro=# show work_mem ;
┌──┐
│ work_mem │
├──┤
│ 400MB│
└──┘
(1 řádka)

Čas: 0,331 ms
(2022-05-06 07:35:32) prd_aukro=# show hash_mem_multiplier ;
┌─┐
│ hash_mem_multiplier │
├─┤
│ 1   │
└─┘
(1 řádka)


> David J.
>


Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund  writes:
> On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
>> So I reluctantly vote for removing 031_recovery_conflict.pl in the
>> back branches for now, with the expectation that we'll fix the
>> infrastructure and put it back after the current release round
>> is done.

> What about instead marking the flapping test TODO? That'd still give us most
> of the coverage...

Are you sure there's just one test that's failing?  I haven't checked
the buildfarm history close enough to be sure of that.  But if it's
true, disabling just that one would be fine (again, as a stopgap
measure).

regards, tom lane




Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Andres Freund
Hi,

On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
> >> May I ask where we're at on this?  Next week's back-branch release is
> >> getting uncomfortably close, and I'm still seeing various buildfarm
> >> animals erratically failing on 031_recovery_conflict.pl.
> 
> > Looks like the problems are gone on HEAD at least.
> 
> It does look that way, although the number of successes is not large yet.
> 
> >> Should we just remove that test from the back branches for now?
> 
> > That might be the best course, marking the test as TODO perhaps?
> 
> I poked closer and saw that you reverted 5136967f1 et al because
> (I suppose) adjust_conf is not there in the back branches.

Yea. That one was a stupid mistake, working outside my usual environment. It's
easy enough to work around adjust_conf not existing (just appending works),
but then there's subsequent test failures...


> So I reluctantly vote for removing 031_recovery_conflict.pl in the
> back branches for now, with the expectation that we'll fix the
> infrastructure and put it back after the current release round
> is done.

What about instead marking the flapping test TODO? That'd still give us most
of the coverage...

Greetings,

Andres Freund




Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund  writes:
> On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
>> May I ask where we're at on this?  Next week's back-branch release is
>> getting uncomfortably close, and I'm still seeing various buildfarm
>> animals erratically failing on 031_recovery_conflict.pl.

> Looks like the problems are gone on HEAD at least.

It does look that way, although the number of successes is not large yet.

>> Should we just remove that test from the back branches for now?

> That might be the best course, marking the test as TODO perhaps?

I poked closer and saw that you reverted 5136967f1 et al because
(I suppose) adjust_conf is not there in the back branches.  While
I'd certainly support back-patching that functionality, I think
we need to have a discussion about how to do it.  I wonder whether
we shouldn't drop src/test/perl/PostgreSQL/... into the back branches
in toto and make the old test APIs into a wrapper around the new ones
instead of vice versa.  But that's definitely not a task to undertake
three days before a release deadline.

So I reluctantly vote for removing 031_recovery_conflict.pl in the
back branches for now, with the expectation that we'll fix the
infrastructure and put it back after the current release round
is done.

regards, tom lane




RE: bogus: logical replication rows/cols combinations

2022-05-05 Thread houzj.f...@fujitsu.com
On Tuesday, May 3, 2022 11:31 AM Amit Kapila  wrote:
> 
> On Tue, May 3, 2022 at 12:10 AM Tomas Vondra
>  wrote:
> >
> > On 5/2/22 19:51, Alvaro Herrera wrote:
> > >> Why would we need to know publications replicated by other walsenders?
> > >> And what if the subscriber is not connected at the moment? In that case
> > >> there'll be no walsender.
> > >
> > > Sure, if the replica is not connected then there's no issue -- as you
> > > say, that replica will fail at START_REPLICATION time.
> > >
> >
> > Right, I got confused a bit.
> >
> > Anyway, I think the main challenge is defining what exactly we want to
> > check, in order to ensure "sensible" behavior, without preventing way
> > too many sensible use cases.
> >
> 
> I could think of below two options:
> 1. Forbid any case where column list is different for the same table
> when combining publications.
> 2. Forbid if the column list and row filters for a table are different
> in the set of publications we are planning to combine. This means we
> will allow combining column lists when row filters are not present or
> when column list is the same (we don't get anything additional by
> combining but the idea is we won't forbid such cases) and row filters
> are different.
> 
> Now, I think the points in favor of (1) are that the main purpose of
> introducing a column list are: (a) the structure/schema of the
> subscriber is different from the publisher, (b) want to hide sensitive
> columns data. In both cases, it should be fine if we follow (1) and
> from Peter E.'s latest email [1] he also seems to be indicating the
> same. If we want to be slightly more relaxed then we can probably (2).
> We can decide on something else as well but I feel it should be such
> that it is easy to explain.

I also think it makes sense to add a restriction like (1). I am planning to
implement the restriction if no one objects.
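
To make sure I understand (1), here is the kind of case it would forbid (t1, p1
and p2 are made-up names; the per-table column list is the PG 15 syntax):

CREATE PUBLICATION p1 FOR TABLE t1 (a, b);
CREATE PUBLICATION p2 FOR TABLE t1 (a, c);
-- a subscription combining p1 and p2 would then be rejected, because the
-- column lists for t1 differ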

Best regards,
Hou zj


Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Andres Freund
Hi,

On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Attached is a fix for the test that I think should avoid the problem. 
> > Couldn't
> > repro it with it applied, under both rr and valgrind.
> 
> May I ask where we're at on this?  Next week's back-branch release is
> getting uncomfortably close, and I'm still seeing various buildfarm
> animals erratically failing on 031_recovery_conflict.pl.

Yea, sorry. I had crappy internet connectivity / device access for the last
few days, making it hard to make progress.

Looks like the problems are gone on HEAD at least.


> Should we just remove that test from the back branches for now?

That might be the best course, marking the test as TODO perhaps?

Unfortunately a pg_ctl reload isn't processed reliably by the time the next
test steps execute (saw that once when running in a loop), and a restart
causes other problems (throws stats away).


> Also, it appears that the back-patch of pump_until failed to remove
> some pre-existing copies, eg check-world in v14 now reports

> Subroutine pump_until redefined at t/013_crash_restart.pl line 248.
> Subroutine pump_until redefined at t/022_crash_temp_files.pl line 272.
> 
> I didn't check whether these are exact duplicates.

They're not quite identical copies, which is why I left them in-place. But the
warnings clearly make that a bad idea. I somehow mis-extrapolated from
Cluster.pm, where it's not a problem (accessed via object).  I'll remove them.

Greetings,

Andres Freund




Re: Skipping schema changes in publication

2022-05-05 Thread Peter Smith
On Thu, Apr 28, 2022 at 9:32 PM Amit Kapila  wrote:
>
...
>
> Another idea that occurred to me today for tables this is as follows:
> 1. Allow to mention except during create publication ... For All Tables.
> CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2;
> 2. Allow to Reset it. This new syntax will reset all objects in the
> publications.
> Alter Publication ... RESET;
> 3. Allow to add it to an existing publication
> Alter Publication ... Add ALL TABLES [EXCEPT TABLE t1,t2];
>
> I think it can be extended in a similar way for schema syntax as well.
>

If the proposed syntax ALTER PUBLICATION ... RESET will reset all the
objects in the publication, then there still seems to be no simple way to
remove only the EXCEPT list while leaving everything else intact. IIUC,
clearing just the EXCEPT list would require a 2-step process: 1) ALTER ...
RESET, then 2) ALTER ... ADD ALL TABLES again.
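
For example (a sketch using the syntax proposed above; none of this exists in
PostgreSQL yet, and pub1 is a made-up name):

ALTER PUBLICATION pub1 RESET;
ALTER PUBLICATION pub1 ADD ALL TABLES;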

I was wondering if it might be useful to have a variation that *only*
resets the EXCEPT list, but still leaves everything else as-is?

So, instead of:
ALTER PUBLICATION pubname RESET

use a syntax something like:
ALTER PUBLICATION pubname RESET {ALL | EXCEPT}
or
ALTER PUBLICATION pubname RESET [EXCEPT]

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [EXTERNAL] Re: pg_stat_statements

2022-05-05 Thread Julien Rouhaud
On Thu, May 05, 2022 at 12:21:41PM +, Godfrin, Philippe E wrote:
>
> Thanks very much for looking closely at this. To answer your questions:
> I misspoke; the query file at the time of the queries above was around 1GB.

Ah, that's clearly big enough to lead to some slowdown.

> I don't believe I am short on RAM, although I will re-examine that aspect. I'm
> running 32GB with a 22GB shared pool, which seems OK to me. The disks are SSD
> (AWS EBS) and the disk volumes are the same as the data volumes. If a regular
> file on disk at 1GB took 6 seconds to read, the rest of the system would be in
> serious degradation.

I don't know what you mean by shared pool, and you also didn't give any kind of
information about your postgres usage, workload, number of connections or
anything so it's impossible to know.  Note that if your system is quite busy
you could definitely have some IO saturation, especially if that file is
discarded from OS cache, so I wouldn't blindly rule that possibility out.  I
suggested multiple ways to try to figure out if that's the problem though, so
having such answer would be better than guessing if IO or the "AWS EBS" (which
I also don't know anything about) is a problem or not.

> The impact on running queries was observed when the max was set at 1000. I
> don't quite understand what you keep saying about evictions and other things
> relative to the pgss file. Can you refer me to some detailed documentation or
> a good article which describes the processes you're alluding to?

I don't think there's any thorough documentation or article explaining how
pg_stat_statements works internally.  But you have a maximum number of
different (identified by userid, dbid, queryid) entries stored, so if your
workload leads to more entries than the max then pg_stat_statements will have
to evict the least used ones to store the new one, and that process is costly
and done using some exclusive lwlock.  You didn't say which version of postgres
you're using, but one thing you can do to see if you probably have evictions is
to check the number of rows in pg_stat_statements view.  If the number changes
very often and is always close to pg_stat_statements.max then you probably have
frequent evictions.
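
For example, something like this (just a sketch) run every now and then will
show whether the entry count keeps hovering near the limit:

SELECT count(*) AS entries,
       current_setting('pg_stat_statements.max') AS max_entries
FROM pg_stat_statements(showtext := false);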

> Insofar as querying the stats table every 10 seconds, I think that's not 
> aggressive enough as
> I want to have statement monitoring as close to realtime as possible.

What problem are you trying to solve?  Why aren't you using pg_stat_activity if
you want realtime overview of what is happening?

> You are indeed correct insofar as unknowns - the biggest one for me is I don't
> know enough about how the stats extension works - as I asked before, more
> detail on the internals of the extension would be useful. Is my only choice in
> that regard to browse the source code?

I think so.

> Short of running the profile that should deal with the unknowns, any other 
> ideas?

Do you mean using perf, per the "profiling with perf" wiki article?  Other than
that I suggested other ways to try to narrow down the problem, what was the
outcome for those?




Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund  writes:
> Attached is a fix for the test that I think should avoid the problem. Couldn't
> repro it with it applied, under both rr and valgrind.

May I ask where we're at on this?  Next week's back-branch release is
getting uncomfortably close, and I'm still seeing various buildfarm
animals erratically failing on 031_recovery_conflict.pl.  Should we
just remove that test from the back branches for now?

Also, it appears that the back-patch of pump_until failed to remove
some pre-existing copies, eg check-world in v14 now reports

Subroutine pump_until redefined at t/013_crash_restart.pl line 248.
Subroutine pump_until redefined at t/022_crash_temp_files.pl line 272.

I didn't check whether these are exact duplicates.

regards, tom lane




Re: Logical replication timeout problem

2022-05-05 Thread Masahiko Sawada
On Wed, May 4, 2022 at 7:18 PM Amit Kapila  wrote:
>
> On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada  wrote:
> >
> > On Mon, May 2, 2022 at 11:32 AM Amit Kapila  wrote:
> > >
> > >
> > > So, shall we go back to the previous approach of using a separate
> > > function update_replication_progress?
> >
> > Ok, agreed.
> >
>
> Attached, please find the updated patch accordingly. Currently, I have
> prepared it for HEAD, if you don't see any problem with this, we can
> prepare the back-branch patches based on this.

Thank you for updating the patch. Looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread David G. Johnston
On Mon, May 2, 2022 at 10:02 PM Pavel Stehule 
wrote:

>
>
> út 3. 5. 2022 v 6:57 odesílatel Tom Lane  napsal:
>
>> Pavel Stehule  writes:
>> > there is really something strange (see attached file). Looks like this
>> > issue is much more related to planning time than execution time
>>
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>>
>
> I am almost sure. I can see this issue only when I set a higher
> work_mem; I don't see this issue in other cases.
>
>>
>>
What are the values of work_mem and hash_mem_multiplier for the two cases?

David J.


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread David Rowley
On Thu, 5 May 2022 at 19:26, Pavel Stehule  wrote:
>
> čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak  napsal:
>> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > (gdb) bt
>> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
>> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
>> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
>> > #4  0x00932134 in AllocSetAlloc ()
>> > #5  0x009376cf in MemoryContextAllocExtended ()
>> > #6  0x006ad915 in ExecInitMemoize ()
>>
>> Well, the PGDG repo has the debuginfo packages (e.g. postgresql14-debuginfo) rpms /
>> dpkgs(?), so I hope you are basically one command away from being able to debug
>> further what happens in ExecInitMemoize()
>> Those packages seem to be safe as they modify only /usr/lib/debug, so they should
>> not have any impact on production workload.
>
> I just have to wait for admin action - I have no root rights for the server.

Looking at ExecInitMemoize() it's hard to see what could take such a
long time other than the build_hash_table().  Tom did mention this,
but I can't quite see how the size given to that function could be
larger than 91 in your case.

If you get the debug symbols installed, can you use gdb to

break nodeMemoize.c:268
p size

maybe there's something I'm missing following the code and maybe there
is some way that est_entries is not set to what I thought it was.

It would also be good to see the same perf report again after the
debug symbols are installed in order to resolve those unresolved
function names.

David




First-draft release notes for next week's minor releases

2022-05-05 Thread Tom Lane
I've completed the first draft of the 14.3 release notes, see

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=66ca1427a4963012fd565b922d0a67a8a8930d1f

As usual, please send comments/corrections by Sunday.

regards, tom lane




Re: Re: Add --{no-,}bypassrls flags to createuser

2022-05-05 Thread Przemysław Sztoch

Dear Shinya,

Too bad there's no --comment parameter to do COMMENT ON ROLE name IS 
'Comment';


As you are already making such changes in createuser, I would like to ask for
an additional --comment parameter
that will allow sysadmins to set a comment with additional information
about the new DB user.

psql is scary for some. :-)

Overall a very useful patch. I needed bypassrls several times recently.

Shinya Kato wrote on 4/28/2022 8:06 AM:

Thank you for the reviews!

On 2022-04-26 05:19, Nathan Bossart wrote:

-    printf(_("  -g, --role=ROLE   new role will be a member of this role\n"));
+    printf(_("  -g, --role=ROLE    new role will be a member of this role\n"));

This looks like an unexpected change.


I fixed it.



I'm ok with -m/--member as well (like with --role only one role can be
specified per switch instance so member, not membership, the latter meaning,
at least for me, the collective).

That -m doesn't match --role-to is no worse than -g not matching --role, a
short option seems worthwhile, and the -m (membership) mnemonic should be
simple to pick up.

I don't see the addition of "-name" to the option name being beneficial.

Yes, the standard doesn't use the "TO" prefix for "ROLE" - but taking that
liberty for consistency here is very appealing and there isn't another SQL
clause that it would be confused with.


+1 for "member".  It might not be perfect, but IMO it's the clearest
option.


Thanks! I changed the option "--membership" to "--member".


For now, I also think "-m / --member" is the best choice, although it 
is ambiguous:(

I'd like to hear others' opinions.
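
To spell out the SQL that the two short options map to, as I understand it
(x and y are placeholder role names):

CREATE ROLE x IN ROLE y;  -- today's -g/--role=y: the new role x becomes a member of y
CREATE ROLE x ROLE y;     -- the proposed -m/--member=y: y becomes a member of the new role x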

regards


--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-05-05 Thread Przemysław Sztoch

Tom Lane wrote on 5/4/2022 5:32 PM:

Peter Eisentraut  writes:

On 28.04.22 18:50, Przemysław Sztoch wrote:

Current unaccent dictionary does not include many popular numeric symbols,
for example: "m²" -> "m2"

Seems reasonable.

It kinda feels like this is outside the charter of an "unaccent"
dictionary.  I don't object to having these conversions available
but it seems like it ought to be a separate feature.

regards, tom lane
Tom, I disagree with you because many similar numerical conversions are 
already taking place, e.g. 1/2, 1/4...


Today Unicode is ubiquitous and we use a lot more weird characters.
I just completed these less common characters.

Therefore, the problem of missing characters in unaccent.rules affects 
the correct operation of the FTS mechanisms.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-05-05 Thread Przemysław Sztoch

Peter Eisentraut wrote on 5/4/2022 5:17 PM:

On 28.04.22 18:50, Przemysław Sztoch wrote:
Current unaccent dictionary does not include many popular numeric symbols,
for example: "m²" -> "m2"

Seems reasonable.

Can you explain what your patch does to achieve this?

I used an existing Python implementation of the generator.
It is based on the ready-made Unicode dictionary
src/common/unicode/UnicodeData.txt.

The current generator was filtering UnicodeData.txt too much.
I relaxed these conditions, because the previous implementation focused 
only on selected character types.


Browsing the unaccent.rules file is the easiest way to see how many and 
what missing characters have been completed.


For FTS, the addition of these characters is very much needed.
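
A minimal example of the behaviour this is after (illustrative only; the exact
output depends on the patched unaccent.rules):

SELECT unaccent('100 m²');
-- expected with the patch: '100 m2'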

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Add index scan progress to pg_stat_progress_vacuum

2022-05-05 Thread Imseih (AWS), Sami
Thank you for the feedback!

>I think we can pass the progress update function to
>   WaitForParallelWorkersToFinish(), which seems simpler. And we can call

Directly passing the callback to WaitForParallelWorkersToFinish
will require us to modify the function signature.

To me, it seemed simpler and touches less code to have
the caller set the callback in the ParallelContext.

>the function after updating the index status to
>PARALLEL_INDVAC_STATUS_COMPLETED.

I also like this better. Will make the change.

>BTW, currently we don't need a lock for touching index status since
>each worker touches different indexes. But after this patch, the
>leader will touch all index status, do we need a lock for that?

I do not think locking is needed here. The leader and workers
will continue to touch different indexes to update the status.

However, if the process is the leader, it will call the function
which will go through indstats and count how many
indexes have a status of PARALLEL_INDVAC_STATUS_COMPLETED.
This value is then reported to the leader's backend only.


Regards,

Sami Imseih
Amazon Web Services



[PATCH] Add initial xid/mxid/mxoff to initdb

2022-05-05 Thread Maxim Orlov
Hi!

During work on the 64-bit XID patch [1] we found it handy to have initdb options
to set the initial xid/mxid/mxoff values to arbitrary non-default values. It
helps test different scenarios related to wraparound, pg_upgrade from
32-bit XID to 64-bit XID, etc.

We realized this patch can be singled out as an independent patch from the
whole patchset in [1] and be useful irrespective of 64-bit XID for
testing of wraparound and so on.

In particular, we employed this patch to test recent changes in logical
replication of subxacts [2] and found no problems in it near the point of
publisher wraparound.

Please share your opinions and reviews are always welcome.

[1]
https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
[2]
https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df...@enterprisedb.com

-- 
Best regards,
Maxim Orlov.
From c752471dab910b3ddfed5b0f3b59e7314165a193 Mon Sep 17 00:00:00 2001
From: Maxim Orlov 
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v1] Add initdb option to initialize cluster with non-standard
 xid/mxid/mxoff.

To date, testing database cluster wraparound was not easy, as initdb has always
initialized it with the default xid/mxid/mxoff. The option to specify any valid
xid/mxid/mxoff at cluster startup will make these things easier.

Author: Maxim Orlov 
Author: Pavel Borisov 
---
 src/backend/access/transam/clog.c  |   9 +++
 src/backend/access/transam/multixact.c |  35 
 src/backend/access/transam/xlog.c  |  15 ++--
 src/backend/bootstrap/bootstrap.c  |  56 -
 src/backend/main/main.c|   6 ++
 src/backend/postmaster/postmaster.c|  15 +++-
 src/backend/tcop/postgres.c|  59 +-
 src/bin/initdb/initdb.c| 107 -
 src/bin/initdb/t/001_initdb.pl |  86 
 src/include/access/xlog.h  |   3 +
 src/include/catalog/pg_class.h |   2 +-
 11 files changed, 380 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d973a2159d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -713,6 +713,7 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	int			pageno;
 
 	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
 
@@ -723,6 +724,14 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
+	pageno = TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextXid));
+	if (pageno != 0)
+	{
+		slotno = ZeroCLOGPage(pageno, false);
+		SimpleLruWritePage(XactCtl, slotno);
+		Assert(!XactCtl->shared->page_dirty[slotno]);
+	}
+
 	LWLockRelease(XactSLRULock);
 }
 
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8f7d12950e..00acb955e0 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1894,6 +1894,7 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	int			pageno;
 
 	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
@@ -1904,6 +1905,17 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
+	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	if (pageno != 0)
+	{
+		/* Create and zero the first page of the offsets log */
+		slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+		/* Make sure it's written out */
+		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+	}
+
 	LWLockRelease(MultiXactOffsetSLRULock);
 
 	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
@@ -1915,7 +1927,30 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
+	pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+	if (pageno != 0)
+	{
+		/* Create and zero the first page of the members log */
+		slotno = ZeroMultiXactMemberPage(pageno, false);
+
+		/* Make sure it's written out */
+		SimpleLruWritePage(MultiXactMemberCtl, slotno);
+		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+	}
+
 	LWLockRelease(MultiXactMemberSLRULock);
+
+	/*
+	 * If we're not starting from a zero offset, initialize a dummy multixact to
+	 * avoid a too-long loop in PerformMembersTruncation().
+	 */
+	if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+	{
+		RecordNewMultiXact(FirstMultiXactId,
+		   MultiXactState->nextOffset, 0, NULL);
+		RecordNewMultiXact(MultiXactState->nextMXact,
+		   MultiXactState->nextOffset, 0, NULL);
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61cda56c6f..7768a3835d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -137,6 +137,10 @@ int			max_slot_wal_keep_size_mb = -1;
 

Re: [PATCH] Log details for client certificate failures

2022-05-05 Thread Jacob Champion
On Wed, 2022-05-04 at 15:53 +0200, Peter Eisentraut wrote:
> Just saying that cutting it off appears to be acceptable.  A bit more
> than 63 bytes should be okay for the log.

Gotcha.

> In terms of aligning what is printed, I meant that pg_stat_ssl uses the
> issuer plus serial number to identify the certificate unambiguously.

Oh, that's a great idea. I'll do that too.

Thanks!
--Jacob


RE: [EXTERNAL] Re: pg_stat_statements

2022-05-05 Thread Godfrin, Philippe E


From: Julien Rouhaud 
Sent: Wednesday, May 4, 2022 10:24 PM
To: Godfrin, Philippe E 
Cc: pgsql-hackers@lists.postgresql.org
Subject: [EXTERNAL] Re: pg_stat_statements

Hi,

On Tue, May 03, 2022 at 01:30:32PM +, Godfrin, Philippe E wrote:
>
> I wasn't exactly clear about the queries. The values clauses themselves are 
> not long -
> We are using repeated values clauses:
>
> INSERT INTO timeseries.dvc_104 (tag_id, event_ts, bool_val, float_val, 
> int_val, string_val, last_updt_id)
> VALUES 
> ($1,$2,$3,$4,$5,$6,$7),($8,$9,$10,$11,$12,$13,$14),($15,$16,$17,$18,$19,$20,$21),
> ($22,$23,$24,$25,$26,$27,$28),($29,$30,$31,$32,$33,$34,$35),($36,$37,$38,$39,$40,$41,$42),
> ($43,$44,$45,$46,$47,$48,$49),($50,$51,$52,$53,$54,$55,$56),($57,$58,$59,$60,$61,$62,$63),
> ($64,$65,$66,$67,$68,$69,$70),($71,$72,$73,$74,$75,$76,$77),($78,$79,$80,$81,$82,$83,$84),
> ($85,$86,$87,$88,$89,$90,$91),($92,$93,$94,$95,$96,$97,$98)
>
> This one's not long, but some 'load statements' have 10,000 values clauses, 
> others add up to 10,000 more
> in an ON CONFLICT clause. I've checked the external Query file and it's 
> currently not large
> at all. But I will keep an eye on that. When I had The settings at 1000 
> statements
> the file was indeed over 1GB. For the record, development is reducing those 
> statement
> lengths.
> [...]
> The first observation is how long a simple query took:
>
> # select count(*) from pg_stat_statements;
> count
> ---
> 971
> Time: 6457.985 ms (00:06.458)
>
> MORE than six seconds for a mere 971 rows! Furthermore, when removing the 
> long queries:
> # select count(*) from pg_stat_statements(showtext:=false);
> count
> ---
> 970
> Time: 10.644 ms
>
> Only 10ms...

Well, 10ms is still quite slow.

You're not removing the long query texts, you're removing all query texts.
I don't know if the overhead comes from processing at least some long
statements or is mostly due to having to retrieve the query file. Do you get
the same times if you run the query twice? Maybe you're short on RAM and have
somewhat slow disks, and the text file has to be read from disk rather than OS
cache?

Also I don't know what you mean by "not large at all", so it's hard to compare
or try to reproduce. FWIW on some instance I have around, I have a 140kB file
and querying pg_stat_statements *with* the query text file only takes a few ms.

You could try to query that view with some unprivileged user. This way you
will still retrieve the query text file but will only emit "<insufficient
privilege>" rather than processing the query texts; this may narrow down the
problem. Or better, if you could run perf [1] to see where the overhead really
is.
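
A quick way to try that from an existing superuser session (pgss_probe is a
made-up role name):

CREATE ROLE pgss_probe;
SET ROLE pgss_probe;
SELECT count(*) FROM pg_stat_statements;  -- texts show up as <insufficient privilege>
RESET ROLE;
DROP ROLE pgss_probe;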

> Second, we have Datadog installed. Datadog queries the pg_stat_statements
> table every 10 seconds. The real pain point is that querying pg_stat_statements
> seems to have an impact on running queries, specifically inserts in my case.

I think this is a side effect of having a very low pg_stat_statements.max, if
of course you have more queries than the current value.

If the extra time is due to loading the query text file and if it's loaded
after acquiring the lightweight lock, then you will prevent evicting or
creating new entries for a long time, which means that the query execution for
those queries will be blocked until the query on pg_stat_statements ends.

There are unfortunately *a lot* of unknowns here, so I can't do anything apart
from guessing.

> I believe this is an actual impact that needs a solution.

First, if you have an OLTP workload you have to make sure that
pg_stat_statements.max is high enough so that you don't have to evict entries,
or at least not often. Then, I think that querying pg_stat_statements every
10s is *really* aggressive, that's always going to have some noticeable
overhead. For the rest, we need more information to understand where the
slowdown is coming from.
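
If it does turn out that you need more entries, raising the limit is
straightforward (the value below is only an example, and the parameter only
takes effect after a restart):

ALTER SYSTEM SET pg_stat_statements.max = 10000;
-- then restart the server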

[1] 
https://wiki.postgresql.org/wiki/Profiling_with_perf

Hello Julien,
Thanks very much for looking closely at this. To answer your questions:
I misspoke; the query file at the time of the queries above was around 1GB.

I don't believe I am short on RAM, although I will re-examine that aspect. I'm
running 32GB with a 22GB shared pool, which seems OK to me. The disks are SSD
(AWS EBS) and the disk volumes are the same as the data volumes. If a regular
file on disk at 1GB took 6 seconds to read, the rest of the system would be in
serious degradation.

The impact on running queries was observed when the max was set at 1000. I
don't quite understand what you keep saying about evictions and other things
relative to the pgss file. Can you refer me to some detailed documentation or
a good article which describes the processes you're alluding to?

Insofar as querying the stats table every 10 seconds, I think that's not
aggressive enough as I want to have statement monitoring as close to realtime
as possible.

You are indeed correct insofar as unknowns - the biggest one for me is I don't
know enough about how the stats extension works - as I asked before, more
detail on the internals of the extension would be useful. Is my only choice in
that regard to browse the source code?

Short of running the profile that should deal with the unknowns, any other
ideas?

Re: Handle infinite recursion in logical replication setup

2022-05-05 Thread vignesh C
On Thu, May 5, 2022 at 2:41 PM Alvaro Herrera  wrote:
>
> On 2022-May-04, vignesh C wrote:
>
> > On Mon, May 2, 2022 at 5:49 AM Peter Smith  wrote:
>
> > > 1. Commit message
> > >
> > > In another thread using the terminology "multi-master" seems to be
> > > causing some problems. Maybe you should choose different wording in
> > > this commit message to avoid having the same problems?
> >
> > I felt, we can leave this to committer
>
> Please don't.  See
> https://postgr.es/m/20200615182235.x7lch5n6kcjq4...@alap3.anarazel.de

In this thread, changing multimaster to multi-source was not agreed upon
and the changes did not get committed. The code still uses multimaster,
as in [1]. I think we should keep it as "multimaster" as that is
already being used in [1].

[1] - 
https://www.postgresql.org/docs/current/different-replication-solutions.html

Regards,
Vignesh




Re: avoid multiple hard links to same WAL file after a crash

2022-05-05 Thread Michael Paquier
On Mon, May 02, 2022 at 04:06:13PM -0700, Nathan Bossart wrote:
> Here is a new patch set.  For now, I've only removed the file existence
> check in writeTimeLineHistoryFile().  I don't know if I'm totally convinced
> that there isn't a problem here (e.g., due to concurrent .ready file
> creation), but since some platforms have been using rename() for some time,
> I don't know how worried we should be.

That's only about Windows these days, meaning that there is much less
coverage in this code path.

> I thought about adding some kind of
> locking between the WAL receiver and startup processes, but that seems
> excessive.

Agreed.

> Alternatively, we could just fix xlog.c as proposed earlier
> [0].  AFAICT that is the only caller that can experience problems due to
> the multiple-hard-link issue.  All other callers are simply renaming a
> temporary file into place, and the temporary file can be discarded if left
> behind after a crash.

I'd agree with removing all the callers at the end.  pgrename() is
quite robust on Windows, but I'd keep the two checks in
writeTimeLineHistory(), as the logic around findNewestTimeLine() would
consider a past TLI history file as in-use even if we have a crash
just after the file got created in the same path by the same standby,
and the WAL segment init part.  Your patch does that.
--
Michael




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2022-05-05 Thread Bharath Rupireddy
On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera  wrote:
>
> If I read the code right, this patch emits logs when
> pg_logical_slot_get_changes and pg_replication_slot_advance SQL
> functions are called.  Is this desirable/useful, for the use case you
> stated at start of thread?  I think it is most likely pointless.  If you
> get rid of those, then the only acquisitions that would log messages are
> those in StartReplication and StartLogicalReplication.  So I wonder if
> it would be better to leave the API of ReplicationSlotAcquire() alone,
> and instead make StartReplication and StartLogicalReplication
> responsible for those messages.
>
> I didn't look at the release-side messages you're adding, but I suppose
> it should be symmetrical.

Adding the messages right after ReplicationSlotAcquire in
StartReplication and StartLogicalReplication looks okay, but if we
just add "released replication slot \"%s\" for logical/physical
replication" in StartReplication and StartLogicalReplication right
after ReplicationSlotRelease, what about the ReplicationSlotRelease
calls in WalSndErrorCleanup and ReplicationSlotShmemExit?

The whole idea is to get to know when and how much time a slot was
inactive/unused when log_replication_commands is set to true.

We can think of adding a timestamp column to on-disk replication slot
data but that will be too much.

Regards,
Bharath Rupireddy.




Re: Handle infinite recursion in logical replication setup

2022-05-05 Thread Alvaro Herrera
On 2022-May-04, vignesh C wrote:

> On Mon, May 2, 2022 at 5:49 AM Peter Smith  wrote:

> > 1. Commit message
> >
> > In another thread using the terminology "multi-master" seems to be
> > causing some problems. Maybe you should choose different wording in
> > this commit message to avoid having the same problems?
> 
> I felt, we can leave this to committer

Please don't.  See
https://postgr.es/m/20200615182235.x7lch5n6kcjq4...@alap3.anarazel.de

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

2022-05-05 Thread Alvaro Herrera
On 2022-May-05, Bharath Rupireddy wrote:

> On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera  
> wrote:
> >
> > Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> > view or something like that, using the backend_progress.c functionality?
> > I don't see it mentioned in the thread.
> 
> IMO, progress reporting works well on a running server and at the
> moment. The WAL recovery/replay can happen even before the server
> opens up for connections

It's definitely true that you wouldn't be able to use it for when the
server is not accepting connections.

> and the progress report view can't be used
> for later analysis like how much time the restoring WAL files from
> archive location took

This is true too -- progress report doesn't store historical data, only
current status.

> and also the WAL file names can't be reported in progress reporting
> mechanism

Also true.

> (only integers columns, of course if required we can add text columns
> to pg_stat_get_progress_info).

Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.

> Having the recovery info in server logs might help.

I suppose it might.

> I think reporting a long-running file processing operation (removing
> or syncing) within postgres is a generic problem (for snapshot,
> mapping, temporary (pgsql_tmp), temp relation files, old WAL file
> processing, WAL file processing during recovery etc.) and needs to be
> solved

I agree up to here.

> in two ways: 1) logging progress into server logs (which helps
> for analysis and report when the server isn't available for
> connections, crash recovery), a generic GUC
> log_file_processing_traffic = {none, medium, high} might help here
> (also proposed in [1]) and 2) pg_stat_file_processing_progress
> (extending progress reporting pg_stat_get_progress_info to have few
> text columns for current file name and directory path).

I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server.  Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.

We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there several drawbacks to
that approach that we should at some point solve.  My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.

That said, I'm not opposed to having a patch somewhat as posted.  I just
think that we should look into a new mechanism going forward.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

2022-05-05 Thread Bharath Rupireddy
On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera  wrote:
>
> Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> view or something like that, using the backend_progress.c functionality?
> I don't see it mentioned in the thread.

IMO, progress reporting works well on a running server. At the moment,
the WAL recovery/replay can happen even before the server opens up for
connections, and the progress report view can't be used for later
analysis, like how much time restoring WAL files from the archive
location took; also, the WAL file names can't be reported in the
progress reporting mechanism (only integer columns; of course, if
required we can add text columns to pg_stat_get_progress_info). Having
the recovery info in server logs might help.

> It's not an *exact* fit, since this is not about one "command" being
> executed by a "backend", but it seems a near enough fit, and it would
> unobtrusive enough that we don't need to worry about the progress-update
> rate or have a GUC to determine how frequently to update (with a gauge
> that's heavily workload-dependant and is likely to make little sense to
> some users -- consider differently-sized WAL segments, for one thing).

I think reporting a long-running file processing operation (removing
or syncing) within postgres is a generic problem (for snapshot,
mapping, temporary (pgsql_tmp), temp relation files, old WAL file
processing, WAL file processing during recovery etc.) and needs to be
solved in two ways: 1) logging progress into server logs (which helps
for analysis and report when the server isn't available for
connections, crash recovery), a generic GUC
log_file_processing_traffic = {none, medium, high} might help here
(also proposed in [1]) and 2) pg_stat_file_processing_progress
(extending progress reporting pg_stat_get_progress_info to have few
text columns for current file name and directory path).

Thoughts?

Regards,
Bharath Rupireddy.




Re: Costing elided SubqueryScans more nearly correctly

2022-05-05 Thread Richard Guo
On Thu, May 5, 2022 at 7:03 AM Tom Lane  wrote:

> I wrote:
> > I instrumented the code in setrefs.c, and found that during the
> > core regression tests this patch estimates correctly in 2103
> > places while guessing wrongly in 54, so that seems like a pretty
> > good step forward.
>
> On second thought, that's not a terribly helpful summary.  Breaking
> things down to the next level, there were
>
>1088 places where we correctly guessed a subquery isn't trivial
> (so no change from current behavior, which is correct)
>
>1015 places where we correctly guessed a subquery is trivial
> (hence, improving the cost estimate from before)
>
>  40 places where we incorrectly guessed a subquery isn't trivial
> (so no change from current behavior, although that's wrong)
>
>  14 places where we incorrectly guessed a subquery is trivial
> (hence, incorrectly charging zero for the SubqueryScan)
>
> 1015 improvements to 14 disimprovements isn't a bad score.  I'm
> a bit surprised there are that many removable SubqueryScans TBH;
> maybe that's an artifact of all the "SELECT *" queries.
>

The patch looks sane to me. 1015 vs 14 is a good win.

Thanks
Richard


Re: strange slow query - lost lot of time somewhere

2022-05-05 Thread Pavel Stehule
čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak 
napsal:

> Hi Pavel,
>
> > I have not debug symbols, so I have not more details now
> > Breakpoint 1 at 0x7f557f0c16c0
> > (gdb) c
> > Continuing.
>
> > Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> > #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> > #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> > #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> > #4  0x00932134 in AllocSetAlloc ()
> > #5  0x009376cf in MemoryContextAllocExtended ()
> > #6  0x006ad915 in ExecInitMemoize ()
>
> Well, the PGDG repo has the debuginfo packages (e.g. postgresql14-debuginfo) rpms
> / dpkgs(?), so I hope you are basically one command away from being able to
> debug further what happens in ExecInitMemoize()
> Those packages seem to be safe as they modify only /usr/lib/debug, so they
> should not have any impact on production workload.
>

I just have to wait for admin action - I have no root rights for the server.



>
> -J.
>
>
>


Re: Handle infinite recursion in logical replication setup

2022-05-05 Thread Peter Smith
Thanks for making changes from all my past reviews.

1. I checked the cfbot result is OK

2. Both patches apply cleanly in my VM

3. Patch changes
V13-0001 - No diffs; it is identical to v12-0001
V13-0002 - only changed for pg docs. So I rebuilt v13 docs and checked
the HTML rendering. Looked OK.

So I have no more review comments. LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: strange slow query - lost lot of time somewhere

2022-05-05 Thread Jakub Wartak
Hi Pavel,

> I have not debug symbols, so I have not more details now
> Breakpoint 1 at 0x7f557f0c16c0
> (gdb) c
> Continuing.

> Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> #1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> #2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> #3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
> #4  0x00932134 in AllocSetAlloc ()
> #5  0x009376cf in MemoryContextAllocExtended ()
> #6  0x006ad915 in ExecInitMemoize ()

Well, the PGDG repo has the debuginfo packages (e.g. postgresql14-debuginfo) rpms /
dpkgs(?), so I hope you are basically one command away from being able to debug
further what happens in ExecInitMemoize().
Those packages seem to be safe as they modify only /usr/lib/debug, so they should
not have any impact on production workload.

-J.






Re: Progress report removal of temp files and temp relation files using ereport_startup_progress

2022-05-05 Thread Bharath Rupireddy
On Mon, May 2, 2022 at 6:26 PM Ashutosh Bapat
 wrote:
>
> Hi Bharath,
>
>
> On Sat, Apr 30, 2022 at 11:08 AM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > At times, there can be many temp files (under pgsql_tmp) and temp
> > relation files, whose removal after a crash may take longer, during
> > which users have no clue about what's going on in the server before it
> > comes up online.
> >
> > Here's a proposal to use ereport_startup_progress to report the
> > progress of the file removal.
> >
> > Thoughts?
>
> The patch looks good to me.
>
> With this patch, the user would at least know which directory is being
> scanned and how much time has elapsed.

There's a problem with the patch: the timeout mechanism isn't being
used by the postmaster process. Postmaster doesn't call
InitializeTimeouts() and doesn't register STARTUP_PROGRESS_TIMEOUT. I
tried to make postmaster do that (attached a v2 patch) but make check
fails.

Now, I'm thinking if it's a good idea to let postmaster use timeouts at all?

> It would be better to know how
> much work is remaining. I could not find a way to estimate the number
> of files in the directory so that we can extrapolate elapsed time and
> estimate the remaining time. Well, we could loop the output of
> opendir() twice, first to estimate and then for the actual work. This
> might actually work, if the time to delete all the files is very high
> compared to the time it takes to scan all the files/directories.
>
> Another possibility is to scan the sorted output of opendir() thus
> using the current file name to estimate remaining files in a very
> crude and inaccurate way. That doesn't look attractive either. I can't
> think of any better way to estimate the remaining time.

I think 'how much work/how many files remaining to process' is a
generic problem, for instance, snapshot, mapping files, old WAL file
processing and so on. I don't think we can do much about it.

> But at least with this patch, a user knows which files have been
> deleted, guessing how far, in the directory structure, the process has
> reached. S/he can then take a look at the remaining contents of the
> directory to estimate how much it should wait.

Not sure we will be able to use the timeout mechanism within
postmaster. Another idea is to have a generic GUC, something like
log_file_processing_traffic = {none, medium, high} (a similar idea is
proposed for WAL file processing while replaying/recovering at [1]),
with the default being none. When set to medium, a log message gets
emitted for every, say, 128 or 256 (just a random number) files
processed; when set to high, log messages get emitted for every file
processed (too verbose). I think this generic GUC
log_file_processing_traffic can be used in many other file processing
areas.

Thoughts?

[1] 
https://www.postgresql.org/message-id/CALj2ACVnhbx4pLZepvdqOfeOekvZXJ2F%3DwJeConGzok%2B6kgCVA%40mail.gmail.com

Regards,
Bharath Rupireddy.


v2-0001-pgsql_tmp-ereport_startup_progress.patch
Description: Binary data