Re: BufferAlloc: don't take two simultaneous locks

2022-04-13 Thread Yura Sokolov
В Пт, 08/04/2022 в 16:46 +0900, Kyotaro Horiguchi пишет:
> At Thu, 07 Apr 2022 14:14:59 +0300, Yura Sokolov wrote in
> > В Чт, 07/04/2022 в 16:55 +0900, Kyotaro Horiguchi пишет:
> > > Hi, Yura.
> > > 
> > > At Wed, 06 Apr 2022 16:17:28 +0300, Yura Sokolov wrote in
> > > > Ok, I got access to stronger server, did the benchmark, found weird
> > > > things, and so here is new version :-)
> > > 
> > > Thanks for the new version and benchmarking.
> > > 
> > > > First I found if table size is strictly limited to NBuffers and FIXED,
> > > > then under high concurrency get_hash_entry may not find free entry
> > > > despite it must be there. It seems while process scans free lists, other
> > > > concurrent processes "moves etry around", ie one concurrent process
> > > > fetched it from one free list, other process put new entry in other
> > > > freelist, and unfortunate process missed it since it tests freelists
> > > > only once.
> > > 
> > > StrategyGetBuffer believes that entries don't move across freelists
> > > and it was true before this patch.
> > 
> > StrategyGetBuffer knows nothing about dynahash's freelist.
> > It knows about buffer manager's freelist, which is not partitioned.
> 
> Yeah, right. I meant get_hash_entry.

But entries don't move between freelists.
One backend takes an entry from one freelist, while another backend
puts a different entry onto another freelist.

> > > I don't think that causes significant performance hit, but I don't
> > > understand how it improves freelist hit ratio other than by accident.
> > > Could you have some reasoning for it?
> > 
> > Since free_reused_entry returns entry into random free_list, this
> > probability is quite high. In tests, I see stabilisa
> 
> Maybe.  Doesn't it improve the efficiency if we prioritize emptied
> freelist on returning an element?  I tried it with an atomic_u32 to
> remember empty freelist. On the uin32, each bit represents a freelist
> index.  I saw it eliminated calls to element_alloc.  I tried to
> remember a single freelist index in an atomic but there was a case
> where two freelists are emptied at once and that lead to element_alloc
> call.

I thought about a bitmask too.
But doesn't it bring back the contention that having many freelists
was meant to avoid?
Well, if there are enough entries to keep it almost always "all
set", it would be effectively immutable.

> > > By the way the change of get_hash_entry looks something wrong.
> > > 
> > > If I understand it correctly, it visits num_freelists/4 freelists at
> > > once, then tries element_alloc. If element_alloc() fails (that must
> > > happen), it only tries freeList[freelist_idx] and gives up, even
> > > though there must be an element in other 3/4 freelists.
> > 
> > No. If element_alloc fails, it tries all NUM_FREELISTS again.
> > - condition: `ntries || !allocFailed`. After a failed allocation,
> >   `allocFailed` becomes true, so only `ntries` keeps the loop going.
> > - `ntries = num_freelists;` regardless of `allocFailed`.
> > Therefore, all `NUM_FREELISTS` are retried for partitioned table.
> 
> Ah, okay. ntries is set to num_freelists after calling element_alloc.
> I think we (I?) need more comments.
> 
> By the way, why it is num_freelists / 4 + 1?

Well, num_freelists could be 1 or 32.
If num_freelists is 1 then num_freelists / 4 == 0 - not good :-) 
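
To make the control flow concrete, here is a compressed sketch of the
retry logic described above. It is a fragment, not the patch text;
pop_from_freelist is a hypothetical stand-in for the real freelist
manipulation:

/* Simplified sketch of get_hash_entry's retry loop; names approximate. */
int			ntries = num_freelists / 4 + 1; /* never 0, even for 1 list */
bool		allocFailed = false;
HASHELEMENT *newElement = NULL;

while (ntries || !allocFailed)
{
	newElement = pop_from_freelist(hctl, freelist_idx);
	if (newElement != NULL)
		break;

	if (--ntries == 0 && !allocFailed)
	{
		/* Sampled lists were all empty: allocate, then rescan every list. */
		allocFailed = !element_alloc(hashp, hctl->nelem_alloc, freelist_idx);
		ntries = num_freelists;
	}
	freelist_idx = (freelist_idx + 1) % num_freelists;
}

The `|| !allocFailed` in the loop condition is the point discussed
above: after element_alloc fails, the refreshed ntries still keeps the
search alive for one full pass over all the freelists before giving up.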

--

regards

Yura Sokolov





Re: make MaxBackends available in _PG_init

2022-04-13 Thread Julien Rouhaud
On Wed, Apr 13, 2022 at 11:30:40AM -0700, Nathan Bossart wrote:
> On Wed, Apr 13, 2022 at 12:05:08PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> >> It may be too much to hope that we're going to completely get rid of
> >> process_shared_preload_libraries_in_progress tests.
> > 
> > Perhaps, but this is definitely an area that could stand to have some
> > serious design thought put into it.  You're quite right that it's
> > largely a mess today, but I think proposals like "forbid setting GUCs
> > during _PG_init" would add to the mess rather than clean things up.
> 
> +1.  The simplest design might be to just make a separate preload hook.
> _PG_init() would install hooks, and then this preload hook would do
> anything that needs to happen when the library is loaded via s_p_l.
> However, I think we still want a new shmem request hook where MaxBackends
> is initialized.

Yeah this one seems needed no matter what.

> If we do move forward with the shmem request hook, do we want to disallow
> shmem requests anywhere else, or should we just leave it up to extension
> authors to do the right thing?  Many shmem requests in _PG_init() are
> probably okay unless they depend on MaxBackends or another GUC that someone
> might change.  Given that, I think I currently prefer the latter (option B
> from upthread).

I'd be in favor of a hard break.  There are already multiple extensions that
rely on non-final values of GUCs to size their shmem requests.  And as an
extension author it can be hard to realize that, as those extensions work just
fine until someone tries them with some other extension that changes some
GUC.  Forcing shmem requests into a new hook will make sure that they're
*always* correct, and that only requires very minimal work on the extension
side.
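
To illustrate, an extension under such a hook could look like the
sketch below. The hook name and its declaration are assumptions based
on this discussion (the hook is only proposed at this point), modeled
on existing PostgreSQL hook conventions:

/* Sketch only: shmem_request_hook is the *proposed* hook, not in core yet. */
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

/* Assumed to be provided by core once the hook exists. */
typedef void (*shmem_request_hook_type) (void);
extern shmem_request_hook_type shmem_request_hook;

static shmem_request_hook_type prev_shmem_request_hook = NULL;

static void
my_shmem_request(void)
{
	if (prev_shmem_request_hook)
		prev_shmem_request_hook();

	/* Values like MaxBackends are final by now, so this is always correct. */
	RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(int)));
}

void
_PG_init(void)
{
	/* _PG_init() only installs hooks; no shmem sizing happens here. */
	prev_shmem_request_hook = shmem_request_hook;
	shmem_request_hook = my_shmem_request;
}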




Re: Printing backtrace of postgres processes

2022-04-13 Thread vignesh C
On Wed, Apr 6, 2022 at 12:29 PM vignesh C  wrote:
>
> On Tue, Apr 5, 2022 at 9:18 PM Robert Haas  wrote:
> >
> > On Wed, Mar 30, 2022 at 12:03 PM Justin Pryzby  wrote:
> > > On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> > > > Sadly the cfbot is showing a patch conflict again. It's just a trivial
> > > > conflict in the regression test schedule so I'm not going to update
> > > > the status but it would be good to rebase it so we continue to get
> > > > cfbot testing.
> > >
> > > Done.  No changes.
> >
> > + if (chk_auxiliary_proc)
> > + ereport(WARNING,
> > + errmsg("PID %d is not a PostgreSQL server process", pid));
> > + else
> > + ereport(WARNING,
> > + errmsg("PID %d is not a PostgreSQL backend process", pid));
> >
> > This doesn't look right to me. I don't think that PostgreSQL server
> > processes are one kind of thing and PostgreSQL backend processes are
> > another kind of thing. I think they're two terms that are pretty
> > nearly interchangeable, or maybe at best you want to argue that
> > backend processes are some subset of server processes. I don't see
> > this sort of thing adding any clarity.
>
> I have changed it to "PID %d is not a PostgreSQL server process" which
> is similar to the existing warning message in
> pg_log_backend_memory_contexts.
>
> > -static void
> > +void
> >  set_backtrace(ErrorData *edata, int num_skip)
> >  {
> >   StringInfoData errtrace;
> > @@ -978,7 +978,18 @@ set_backtrace(ErrorData *edata, int num_skip)
> >  "backtrace generation is not supported by this installation");
> >  #endif
> >
> > - edata->backtrace = errtrace.data;
> > + if (edata)
> > + edata->backtrace = errtrace.data;
> > + else
> > + {
> > + /*
> > + * LOG_SERVER_ONLY is used to make sure the backtrace is never
> > + * sent to client. We want to avoid messing up the other client
> > + * session.
> > + */
> > + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> > + pfree(errtrace.data);
> > + }
> >  }
> >
> > This looks like a grotty hack.
>
> I have changed it so that the backtrace is set and returned to the
> caller. The caller will take care of logging or setting it to the
> error data context. Thoughts?
>
> > - PGPROC*proc = BackendPidGetProc(pid);
> > -
> > - /*
> > - * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> > - * we reach kill(), a process for which we get a valid proc here might
> > - * have terminated on its own.  There's no way to acquire a lock on an
> > - * arbitrary process to prevent that. But since so far all the callers of
> > - * this mechanism involve some request for ending the process anyway, that
> > - * it might end on its own first is not a problem.
> > - *
> > - * Note that proc will also be NULL if the pid refers to an auxiliary
> > - * process or the postmaster (neither of which can be signaled via
> > - * pg_signal_backend()).
> > - */
> > - if (proc == NULL)
> > - {
> > - /*
> > - * This is just a warning so a loop-through-resultset will not abort
> > - * if one backend terminated on its own during the run.
> > - */
> > - ereport(WARNING,
> > - (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> > + PGPROC  *proc;
> >
> > + /* Users can only signal valid backend or an auxiliary process. */
> > + proc = CheckPostgresProcessId(pid, false, NULL);
> > + if (!proc)
> >   return SIGNAL_BACKEND_ERROR;
> > - }
> >
> > Incidentally changing the behavior of pg_signal_backend() doesn't seem
> > like a great idea. We can do that as a separate commit, after
> > considering whether documentation changes are needed. But it's not
> > something that should get folded into a commit on another topic.
>
> Agreed. I have kept the logic of pg_signal_backend as it is.
> pg_log_backtrace and pg_log_backend_memory_contexts use the same
> functionality to check and send signal. I have added a new function
> "CheckProcSendDebugSignal" which has the common code implementation
> and will be called by both pg_log_backtrace and
> pg_log_backend_memory_contexts.
>
> > + /*
> > + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> > + * isn't valid; but by the time we reach kill(), a process for which we
> > + * get a valid proc here might have terminated on its own.  There's no way
> > + * to acquire a lock on an arbitrary process to prevent that. But since
> > + * this mechanism is usually used to debug a backend or an auxiliary
> > + * process running and consuming lots of memory, that it might end on its
> > + * own first without logging the requested info is not a problem.
> > + */
> >
> > This comment made a lot more sense where it used to be than it does
> > where it is now. I think more work is required here than just cutting
> > and pasting.
>
> This function was called from pg_signal_backend earlier; this logic is
> now moved to CheckProcSendDebugSignal, which will be called only by
> pg_log_backtrace and pg_log_backend_memory_contexts. This looks
> appropriate in CheckProcSendDebugSignal with 

RE: Perform streaming logical transactions by background workers and parallel apply

2022-04-13 Thread houzj.f...@fujitsu.com
On Friday, April 8, 2022 5:14 PM houzj.f...@fujitsu.com wrote:
> On Wednesday, April 6, 2022 1:20 PM Amit Kapila 
> wrote:
> 
> > In this email, I would like to discuss allowing streaming logical
> > transactions (large in-progress transactions) by background workers
> > and parallel apply in general. The goal of this work is to improve the
> > performance of the apply work in logical replication.
> >
> > Currently, for large transactions, the publisher sends the data in
> > multiple streams (changes divided into chunks depending upon
> > logical_decoding_work_mem), and then on the subscriber-side, the apply
> > worker writes the changes into temporary files and once it receives
> > the commit, it reads from the file and applies the entire transaction. To
> > improve the performance of such transactions, we can instead allow
> > them to be applied via background workers. There could be multiple
> > ways to achieve this:
> >
> > Approach-1: Assign a new bgworker (if available) as soon as the xact's
> > first stream arrives, and the main apply worker will send changes to this
> > new worker via shared memory. We keep this worker assigned till the
> > transaction commit arrives and also wait for the worker to finish at
> > commit. This preserves commit ordering and avoids writing to and
> > reading from file in most cases. We still need to spill if there is no
> > worker available. We also need to wait for the background worker to
> > finish stream_stop, to avoid deadlocks because T-1's
> > current stream of changes can update rows in conflicting order with
> > T-2's next stream of changes.
> >
> 
> Attach the POC patch for Approach-1 of "Perform streaming logical
> transactions by background workers". The patch is still a WIP as
> there are several TODO items left, including:
> 
> * error handling for bgworker
> * support for SKIP the transaction in bgworker
> * handle the case when there is no more worker available
>   (might need spill the data to the temp file in this case)
> * some potential bugs
> 
> The original patch is borrowed from an old thread[1] and was rebased and
> extended/cleaned by me. Comments and suggestions are welcome.

Attach a new version patch which improves the error handling and handles
the case when there is no more worker available (it will spill the data
to a temp file in this case).

Currently, it still doesn't support skipping the streamed transaction in
the bgworker, because in this approach we don't know the last lsn of the
streamed transaction being applied, so we cannot get the lsn to SKIP. I
will think more about it and keep testing the patch.

Best regards,
Hou zj





v2-0001-Perform-streaming-logical-transactions-by-background.patch
Description:  v2-0001-Perform-streaming-logical-transactions-by-background.patch


Re: Column Filtering in Logical Replication

2022-04-13 Thread Amit Kapila
On Thu, Apr 14, 2022 at 8:32 AM Masahiko Sawada  wrote:
>
> On Wed, Apr 13, 2022 at 6:45 PM Amit Kapila  wrote:
> >
> On Wed, Apr 13, 2022 at 6:45 PM Amit Kapila wrote:
> > >
> > > I've looked at this issue and had the same analysis. Also, I could
> > > reproduce this issue with the steps shared by Amit.
> > >
> > > As I mentioned in another thread[1], the fact that the tablesync
> > > worker doesn't check the return value from
> > > wait_for_worker_state_change() seems a bug to me. So my initial
> > > thought of the solution is that we can have the tablesync worker check
> > > the return value and exit if it's false. That way, the apply worker
> > > can restart and request to launch the tablesync worker again. What do
> > > you think?
> > >
> >
> > I think that will fix this symptom but I am not sure if that would be
> > the best way to deal with this because we have a mechanism where the
> > sync worker can continue even if we don't do anything as a result of
> > wait_for_worker_state_change() provided apply worker restarts.
>
> I think we can treat this as a separate issue. That is, if tablesync
> worker can start streaming changes even without waiting for the apply
> worker to set SUBREL_STATE_CATCHUP, do we really need the wait? I'm
> not sure it's really safe. If it's safe, the tablesync worker will no
> longer need to wait there.
>

As per my understanding, it is safe; whatever is streamed by the
tablesync worker will be skipped later by the apply worker. The wait
here avoids streaming the same data by both the apply worker and the
tablesync worker, which I think is good even if it is not a must.

> >
> > The other part of the puzzle is the below check in the code:
> > /*
> > * If we reached the sync worker limit per subscription, just exit
> > * silently as we might get here because of an otherwise harmless race
> > * condition.
> > */
> > if (nsyncworkers >= max_sync_workers_per_subscription)
> >
> > It is not clear to me why this check is there, if this wouldn't be
> > there, the user would have got either a WARNING to increase the
> > max_logical_replication_workers or the apply worker would have been
> > restarted. Do you have any idea about this?
>
> Yeah, I'm also puzzled by this check. It seems that this function
> doesn't work well when the apply worker is not running and some
> tablesync workers are running. I initially thought that the apply
> worker calls this function as many times as there are tables that
> need to be synced, but it checks the max_sync_workers_per_subscription
> limit before calling logicalrep_worker_launch(). So I'm not really
> sure we need this check.
>

I hope the original author, Petr J., responds to this point; I have
added him to this email. That will help us find the best solution for
this problem.

Note: I'll be away for the remaining week, so I will join the discussion
next week unless we have reached a conclusion by then.

-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-04-13 Thread Amit Kapila
On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira  wrote:
>
> On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:
>
> PSA patch v10 which addresses the remaining review comments from Euler [1]
>
> Looks good to me.
>

Thanks, this looks good to me as well. I'll check this again early
next week and push unless I find something or there are more comments.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-04-13 Thread Amit Kapila
On Wed, Apr 13, 2022 at 10:01 PM Alvaro Herrera  wrote:
>
> BTW I just noticed that AlterPublicationOptions acquires only
> ShareAccessLock on the publication object.  I think this is too lax ...
> what if two of them run concurrently? (say to specify different
> published actions)  Do they overwrite the other's update?
>

No, they won't overwrite each other. Firstly, the AccessShareLock on the
publication object is not related to concurrent changes of the
publication object. Those are protected by the normal row-update rules
(i.e., until the first transaction finishes, the other will wait). See
the example below:

Session-1
postgres=# Begin;
BEGIN
postgres=*# Alter publication pub1 set (publish = 'insert');
ALTER PUBLICATION

Session-2:
postgres=# Begin;
BEGIN
postgres=*# Alter publication pub1 set (publish = 'update');

The Alter in Session-2 will wait till we end the transaction in Session-1.


-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-04-13 Thread Masahiko Sawada

On Wed, Apr 13, 2022 at 6:45 PM Amit Kapila  wrote:
>
> On Wed, Apr 13, 2022 at 1:41 PM Masahiko Sawada  wrote:
> >
> > I've looked at this issue and had the same analysis. Also, I could
> > reproduce this issue with the steps shared by Amit.
> >
> > As I mentioned in another thread[1], the fact that the tablesync
> > worker doesn't check the return value from
> > wait_for_worker_state_change() seems a bug to me. So my initial
> > thought of the solution is that we can have the tablesync worker check
> > the return value and exit if it's false. That way, the apply worker
> > can restart and request to launch the tablesync worker again. What do
> > you think?
> >
>
> I think that will fix this symptom but I am not sure if that would be
> the best way to deal with this because we have a mechanism where the
> sync worker can continue even if we don't do anything as a result of
> wait_for_worker_state_change() provided apply worker restarts.

I think we can treat this as a separate issue. That is, if tablesync
worker can start streaming changes even without waiting for the apply
worker to set SUBREL_STATE_CATCHUP, do we really need the wait? I'm
not sure it's really safe. If it's safe, the tablesync worker will no
longer need to wait there.

>
> The other part of the puzzle is the below check in the code:
> /*
> * If we reached the sync worker limit per subscription, just exit
> * silently as we might get here because of an otherwise harmless race
> * condition.
> */
> if (nsyncworkers >= max_sync_workers_per_subscription)
>
> It is not clear to me why this check is there, if this wouldn't be
> there, the user would have got either a WARNING to increase the
> max_logical_replication_workers or the apply worker would have been
> restarted. Do you have any idea about this?

Yeah, I'm also puzzled by this check. It seems that this function
doesn't work well when the apply worker is not running and some
tablesync workers are running. I initially thought that the apply
worker calls this function as many times as there are tables that
need to be synced, but it checks the max_sync_workers_per_subscription
limit before calling logicalrep_worker_launch(). So I'm not really
sure we need this check.

>
> Yet another option is that we ensure that before launching sync
> workers (say in process_syncing_tables_for_apply->FetchTableStates,
> when we have to start a new transaction) we again call
> maybe_reread_subscription(), which should also fix this symptom. But
> again, I am not sure why it should be compulsory to call
> maybe_reread_subscription() in such a situation; there are no comments
> which suggest it,

Yes, it will fix this issue.

>
> Now, the reason why it appeared recently in commit c91f71b9dc is that
> I think we have increased the number of initial table syncs in that
> test, and probably increasing
> max_sync_workers_per_subscription/max_logical_replication_workers
> should fix that test.

I think so too.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Noah Misch
On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote:
> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> wrasse?

Done.  Also forced hourly builds.




Re: allow specifying action when standby encounters incompatible parameter settings

2022-04-13 Thread Kyotaro Horiguchi
At Wed, 13 Apr 2022 14:35:21 -0700, Nathan Bossart wrote in
> Hi hackers,
> 
> As of 15251c0, when a standby encounters an incompatible parameter change,
> it pauses replay so that read traffic can continue while the administrator
> fixes the parameters.  Once the server is restarted, replay can continue.
> Before this change, such incompatible parameter changes caused the standby
> to immediately shut down.
> 
> I noticed that there was some suggestion in the thread associated with
> 15251c0 [0] for making this behavior configurable, but there didn't seem to
> be much interest at the time.  I am interested in allowing administrators
> to specify the behavior before 15251c0 (i.e., immediately shut down the
> standby when an incompatible parameter change is detected).  The use-case I
> have in mind is when an administrator has automation in place for adjusting
> these parameters and would like to avoid stopping replay any longer than
> necessary.  FWIW this is what we do in RDS.
> 
> I've attached a patch that adds a new GUC where users can specify the
> action to take when an incompatible parameter change is detected on a
> standby.  For now, there are just two options: 'pause' and 'shutdown'.
> This new GUC is largely modeled after recovery_target_action.

The overall direction of going to shutdown without needing user
interaction seems fine.  I think the same could be done with a
timeout. That is, we provide a GUC named something like
insufficient_standby_setting_shutdown_timeout (mmm, too long..), and
recovery waits for that duration and then shuts down. -1 means the
current behavior, and 0 means what this patch is going to
introduce. However, I don't see a concrete use case for the timeout.
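
For illustration, such a GUC might be registered as a ConfigureNamesInt[]
entry in guc.c along these lines; the (shortened) name and all wording
here are invented, and this is only a fragment:

/* Hypothetical guc.c entry for the timeout idea. */
static int	standby_setting_shutdown_timeout = -1;

{
	{"standby_setting_shutdown_timeout", PGC_SIGHUP, REPLICATION_STANDBY,
		gettext_noop("Sets how long recovery stays paused after an "
					 "incompatible parameter change before shutting down."),
		gettext_noop("-1 pauses indefinitely (current behavior); "
					 "0 shuts down immediately."),
		GUC_UNIT_MS
	},
	&standby_setting_shutdown_timeout,
	-1, -1, INT_MAX,
	NULL, NULL, NULL
},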

> I initially set out to see if it was possible to automatically adjust these
> parameters on a standby, but that is considerably more difficult.  It isn't
> enough to just hook into the restart_after_crash functionality since it
> doesn't go back far enough in the postmaster logic.  IIUC we'd need to
> reload preloaded libraries (which there is presently no support for),
> recalculate MaxBackends, etc.  Another option I considered was to

Sure.

> automatically adjust the parameters during startup so that you just need to
> restart the server.  However, we need to know for sure that the server is
> going to be a hot standby, and I don't believe we have that information
> where such GUC changes would need to occur (I could be wrong about this).

Couldn't we use AlterSystemSetConfigFile for this purpose in
CheckRequiredParameterValues?

> Anyway, for now I'm just proposing the modest change described above, but
> I'd welcome any discussion about improving matters further in this area.
> 
> [0] https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com

Is the reason for the enum the extensibility to add a new choice like
"auto-adjust"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Andres Freund  writes:
> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> wrasse?

+1

> Does sparc have wider alignment rules for some types? Perhaps that'd be
> enough to make some tables sufficiently large to trigger parallel
> vacuum?

No, the configure results on wrasse look pretty ordinary:

checking size of void *... 8
checking size of size_t... 8
checking size of long... 8
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 8
checking alignment of double... 8

I wondered for a moment about force_parallel_mode, but wrasse doesn't
appear to be setting that, and in any case I'm pretty sure it only
affects plannable statements.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi,

Noah, any chance you could enable log_autovacuum_min_duration=0 on
wrasse?


On 2022-04-13 21:23:12 -0400, Tom Lane wrote:
> I'm still suspicious of the pgstat changes, though.  I checked into
> things here by doing
> 
>   initdb
>   edit postgresql.conf to set log_autovacuum_min_duration = 0
>   pg_ctl start && make installcheck-parallel
> 
> and what I see is that the first reported autovacuum activity begins
> exactly one minute after the postmaster starts, which is what I'd
> expect given the autovacuum naptime rules.

It'd not necessarily have to be autovacuum. A CREATE INDEX or VACUUM
using parallelism could also cause this, I think. It'd be a narrow
window, of course...

Does sparc have wider alignment rules for some types? Perhaps that'd be
enough to make some tables sufficiently large to trigger parallel
vacuum?

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
>> It seems like a SQL-accessible function could be written
>> and then called before any problematic VACUUM.  I like this better
>> for something we're thinking of jamming in post-feature-freeze;
>> we'd not be committing to the feature quite as much as if we
>> added a VACUUM option.

> We could otherwise just disable IOS for that query, for now.

The entire point of that test case is to verify the shape of the
IOS plan, so no that's not an acceptable answer.  But if we're
looking for quick hacks, we could do

update pg_class set relallvisible = relpages where relname = 'tenk1';

just before that test.

I'm still suspicious of the pgstat changes, though.  I checked into
things here by doing

initdb
edit postgresql.conf to set log_autovacuum_min_duration = 0
pg_ctl start && make installcheck-parallel

and what I see is that the first reported autovacuum activity begins
exactly one minute after the postmaster starts, which is what I'd
expect given the autovacuum naptime rules.  On my machine, of course,
the installcheck-parallel run is long gone by then.  But even on the
much slower wrasse, we should be well past create_index by the time any
autovac worker launches --- you can see from wrasse's reported test
runtimes that only about 10 seconds have elapsed when it get to the end
of create_index.

This suggests to me that what is holding the (presumed) conflicting
snapshot must be the autovac launcher, because what else could it be?
So I'm suspicious that we broke something in that area, though I am
baffled why only wrasse would be telling us so.

regards, tom lane




Re: VPath Build Errors

2022-04-13 Thread Julien Rouhaud
On Wed, Apr 13, 2022 at 05:48:49PM -0700, David G. Johnston wrote:
> On Wed, Apr 13, 2022 at 5:44 PM Tom Lane  wrote:
> 
> > "David G. Johnston"  writes:
> > > The attached log is result of (while in the versioned directory, a
> > sibling
> > > of the git repo)
> > > `../postgresql/configure`
> > > `make`
> > > `tree`
> >
> > The VPATH buildfarm members haven't been complaining, and I can't
> > reproduce a failure here, so I'm inclined to suspect pilot error.
> >
> > One point that's not very clearly documented is that your source
> > directory needs to be clean; no build products in it, except
> > possibly those that are included in tarballs.
> >
> >
> I'll double-check but that would explain it.  I know it was not clean when
> I tried this.

Note that if what you want is building multiple major versions at the same
time, the easiest way to do that is probably to use git worktrees [1] and
check out multiple branches of the same repo at the same time, and then build
each one as you'd normally do.

[1] https://git-scm.com/docs/git-worktree




Re: shared-memory based stats collector - v70

2022-04-13 Thread Michael Paquier
On Wed, Apr 13, 2022 at 04:56:45PM -0700, David G. Johnston wrote:
> Sorry, apparently this "2000-01-01" behavior only manifests after crash
> recovery on v15 (didn't check v14); after a clean initdb on v15 I got the
> same initdb timestamp.
> 
> Feels like we should still report the "end of crash recovery timestamp" for
> these instead of 2000-01-01 (which I guess is derived from 0) if we are not
> willing to produce null (and it seems other parts of the system using these
> stats assumes non-null).

I can see this timestamp as well after crash recovery.  This seems
rather misleading to me.  I have added an open item.
--
Michael




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 6:05 PM Andres Freund  wrote:
> I think most of those we've ended up replacing by using temp tables in
> those tests instead, since they're not affected by the global horizon
> anymore.

Maybe, but it's a pain to have to work that way. You can't do it in
cases like this, because a temp table is not workable. So that's not
an ideal long term solution.

> > We'd not necessarily have to embed wait-for-horizon into VACUUM
> > itself.
>
> I'm not sure it'd be quite reliable outside of vacuum though, due to the
> horizon potentially going backwards (in otherwise harmless ways)?

I agree, since vacuumlazy.c would need to either be given its own
OldestXmin, or knowledge of a wait-up-to XID. Either way we have to
make non-trivial changes to vacuumlazy.c.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan  wrote:
> I think that it's more likely that FREEZE will correct problems, out of the 
> two:
>
> * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
> cutoff value as possible (FreezeLimit will be equal to OldestXmin).

The reason why that might have helped (at least in the past) is that
it's enough to force us to wait for a cleanup lock to prune and
freeze, if necessary. Which was never something that
DISABLE_PAGE_SKIPPING could do.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane  wrote:
> >> Yeah, we have band-aided around this type of problem repeatedly.
> >> Making a fix that's readily accessible from any test script
> >> seems like a good idea.
> 
> > We might even be able to consistently rely on this new option, given
> > *any* problem involving test stability and VACUUM. Having a
> > one-size-fits-all solution to these kinds of stability problems would
> > be nice -- no more DISABLE_PAGE_SKIPPING bandaids.
> 
> My guess is that you'd need both this new wait-for-horizon behavior
> *and* DISABLE_PAGE_SKIPPING.  But the two together ought to make
> for pretty reproducible behavior.  I noticed while scanning the
> commit log that some patches have tried adding a FREEZE option,
> which seems more like waving a dead chicken than anything that'd
> improve stability.

I think most of those we've ended up replacing by using temp tables in
those tests instead, since they're not affected by the global horizon
anymore.


> We'd not necessarily have to embed wait-for-horizon into VACUUM
> itself.

I'm not sure it'd be quite reliable outside of vacuum though, due to the
horizon potentially going backwards (in otherwise harmless ways)?


> It seems like a SQL-accessible function could be written
> and then called before any problematic VACUUM.  I like this better
> for something we're thinking of jamming in post-feature-freeze;
> we'd not be committing to the feature quite as much as if we
> added a VACUUM option.

We could otherwise just disable IOS for that query, for now.

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 5:35 PM Tom Lane  wrote:
> My guess is that you'd need both this new wait-for-horizon behavior
> *and* DISABLE_PAGE_SKIPPING.  But the two together ought to make
> for pretty reproducible behavior.  I noticed while scanning the
> commit log that some patches have tried adding a FREEZE option,
> which seems more like waving a dead chicken than anything that'd
> improve stability.

I think that it's more likely that FREEZE will correct problems, out of the two:

* FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
cutoff value as possible (FreezeLimit will be equal to OldestXmin).

* DISABLE_PAGE_SKIPPING also forces an aggressive VACUUM. But unlike
FREEZE it makes VACUUM not use the visibility map, even in the case of
all-frozen pages. And it changes nothing about FreezeLimit.

It's also a certainty that VACUUM(FREEZE, DISABLE_PAGE_SKIPPING) is
not a sensible remedy for any problem with test stability, but there
are still some examples of that combination in the regression tests.
The only way it could make sense is if the visibility map might be
corrupt, but surely we're not expecting that anyway (and if we were
we'd be testing it more directly).

I recently argued that DISABLE_PAGE_SKIPPING should have nothing to do
with aggressive vacuuming -- that should all be left up to VACUUM
FREEZE. It seems more logical to make DISABLE_PAGE_SKIPPING mean
"don't use the visibility map to skip anything", without bringing
aggressiveness into it at all. That would be less confusing.

> We'd not necessarily have to embed wait-for-horizon into VACUUM
> itself.  It seems like a SQL-accessible function could be written
> and then called before any problematic VACUUM.  I like this better
> for something we're thinking of jamming in post-feature-freeze;
> we'd not be committing to the feature quite as much as if we
> added a VACUUM option.

Hmm. I would say that the feature has zero appeal to users anyway.
Maybe it can and should be done through an SQL function for other
reasons, though. Users already think that there are several different
flavors of VACUUM, which isn't really true.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 18:54:06 -0400, Tom Lane wrote:
> We used to have a rate limit on how often stats reports would be sent
> to the collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.

Just for posterity: That's not actually gone. But what is gone is the
rate limiting in autovacuum about requesting recent stats for a table /
autovac seeing slightly older stats.

Greetings,

Andres Freund




RE: PG DOCS - logical replication filtering

2022-04-13 Thread houzj.f...@fujitsu.com
On Wednesday, April 13, 2022 11:25 AM Peter Smith  wrote:
> 
> PSA patch v10 which addresses the remaining review comments from Euler [1]

Thanks for the patch, it looks good to me.

Best regards,
Hou zj


Re: shared-memory based stats collector - v70

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 16:56:45 -0700, David G. Johnston wrote:
> On Wed, Apr 13, 2022 at 4:34 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> Sorry, apparently this "2000-01-01" behavior only manifests after crash
> recovery on v15 (didn't check v14); after a clean initdb on v15 I got the
> same initdb timestamp.

> Feels like we should still report the "end of crash recovery timestamp" for
> these instead of 2000-01-01 (which I guess is derived from 0) if we are not
> willing to produce null (and it seems other parts of the system using these
> stats assumes non-null).

Yes, that's definitely not correct. I see the bug (need to call
pgstat_reset_after_failure(); in pgstat_discard_stats()). Stupid, but
easy to fix - too fried to write a test tonight, but will commit the fix
tomorrow.

Thanks for catching!

Greetings,

Andres Freund




Re: VPath Build Errors

2022-04-13 Thread David G. Johnston
On Wed, Apr 13, 2022 at 5:44 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > The attached log is result of (while in the versioned directory, a
> sibling
> > of the git repo)
> > `../postgresql/configure`
> > `make`
> > `tree`
>
> The VPATH buildfarm members haven't been complaining, and I can't
> reproduce a failure here, so I'm inclined to suspect pilot error.
>
> One point that's not very clearly documented is that your source
> directory needs to be clean; no build products in it, except
> possibly those that are included in tarballs.
>
>
I'll double-check but that would explain it.  I know it was not clean when
I tried this.

David J.


Re: VPath Build Errors

2022-04-13 Thread Tom Lane
"David G. Johnston"  writes:
> The attached log is result of (while in the versioned directory, a sibling
> of the git repo)
> `../postgresql/configure`
> `make`
> `tree`

The VPATH buildfarm members haven't been complaining, and I can't
reproduce a failure here, so I'm inclined to suspect pilot error.

One point that's not very clearly documented is that your source
directory needs to be clean; no build products in it, except
possibly those that are included in tarballs.

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 15:28:19 -0500, Justin Pryzby wrote:
> On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> > I'm not still confident on this, but it should be better than the v1.
> 
> +Andres as this seems to be related to 277692220.

FWIW, that'd just mean it's an old bug that wasn't easily noticeable
before, not that it's the fault of 277692220.


> I added it as an Open Item.

IOW, it'd belong in "Older bugs affecting stable branches".

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Apr 13, 2022 at 4:51 PM Tom Lane  wrote:
>> Yeah, we have band-aided around this type of problem repeatedly.
>> Making a fix that's readily accessible from any test script
>> seems like a good idea.

> We might even be able to consistently rely on this new option, given
> *any* problem involving test stability and VACUUM. Having a
> one-size-fits-all solution to these kinds of stability problems would
> be nice -- no more DISABLE_PAGE_SKIPPING bandaids.

My guess is that you'd need both this new wait-for-horizon behavior
*and* DISABLE_PAGE_SKIPPING.  But the two together ought to make
for pretty reproducible behavior.  I noticed while scanning the
commit log that some patches have tried adding a FREEZE option,
which seems more like waving a dead chicken than anything that'd
improve stability.

We'd not necessarily have to embed wait-for-horizon into VACUUM
itself.  It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM.  I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote:
> On Wed, Apr 13, 2022 at 4:38 PM Tom Lane  wrote:
> > So what seems to be happening on wrasse is that a background
> > autovacuum (or really autoanalyze?) is preventing pages from
> > being marked all-visible not only during test_setup.sql but
> > also create_index.sql; but it's gone by the time sanity_check.sql
> > runs.
> 
> I agree that it would need to be an autoanalyze (due to the
> PROC_IN_VACUUM optimization).

That's not a reliable protection - the snapshot is established normally
at first; PROC_IN_VACUUM is only set a while later...

Greetings,

Andres Freund




Re: typos

2022-04-13 Thread Justin Pryzby
On Thu, Apr 14, 2022 at 09:39:42AM +1200, David Rowley wrote:
> On Thu, 14 Apr 2022 at 05:40, Justin Pryzby  wrote:
> > There's (only) a few remaining.
> 
> I've pushed 0001 and 0002 of the 3rd batch of patches. I left 0003 as

Thanks

> I just didn't feel it was a meaningful enough improvement.
> 
> From docs/, if I do:
> 
> $ git grep ", which is the default" | wc -l
> 9
> 
> $ git grep ", the default" | wc -l
> 64
> 
> You're proposing to make the score 10, 63.  I'm not sure if that's a
> good direction to go in.

Well, I'm proposing to change the only instance of this:

$ git grep -F ", the default)"
doc/src/sgml/ref/create_publication.sgml:   partition's row filter (if the parameter is false, the default) or the root

Maybe what's needed is more like this.

   If the publication contains a partitioned table, and the publication
   parameter publish_via_partition_root is false (the default), then the
   row filter is taken from the partition; otherwise, the row filter is
   taken from the root partitioned table.

I'll plan to keep this around and may come back to it later.

On Thu, Apr 14, 2022 at 08:56:22AM +1200, David Rowley wrote:
> I've left out the following change as it does not seem to be bringing
> any sort of consistency to the docs overall. It only brings
> consistency to a single source file in the docs.
> 
> -  You need zstd, if you want to support
> +  You need ZSTD, if you want to support
> 
> See: git grep -i ">zstd<"

It may not be worth changing just this one line, but the reason I included it
here is for consistency:

$ git grep 'zstd.*product' doc
doc/src/sgml/config.sgml:  zstd (if PostgreSQL
$ git grep 'ZSTD.*product' doc
doc/src/sgml/install-windows.sgml:  ZSTD
doc/src/sgml/install-windows.sgml:  Required for supporting ZSTD compression
doc/src/sgml/installation.sgml:  You need ZSTD, if you want to support
doc/src/sgml/installation.sgml:  Build with ZSTD compression support.

If we were to change it, maybe they should all say "Zstandard (zstd)".  ZSTD
looks like an acronym, which I don't think it is, while Zstandard indicates
how to pronounce it.




VPath Build Errors

2022-04-13 Thread David G. Johnston
Hey,

I've been building in the git repo just fine but wanted to use vpath builds
so I could keep both "maked" v14 and v15 binaries around, ready to be
installed.

The attached log is result of (while in the versioned directory, a sibling
of the git repo)
`../postgresql/configure`
`make`
`tree`

stdout and stderr output tee'd to a file.

Per the instructions here:

https://www.postgresql.org/docs/current/install-procedure.html

The last handful of lines for make are below:

Thanks!

David J.

cat: ../../src/timezone/objfiles.txt: No such file or directory
cat: jit/objfiles.txt: No such file or directory
gcc -Wall -Wmissing-prototypes [...see file...] -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  -Wl,-E -lz -lpthread
-lrt -ldl -lm -o postgres
gcc: error: replication/backup_manifest.o: No such file or directory
gcc: error: replication/basebackup.o: No such file or directory
gcc: error: replication/basebackup_copy.o: No such file or directory
gcc: error: replication/basebackup_gzip.o: No such file or directory
gcc: error: replication/basebackup_lz4.o: No such file or directory
gcc: error: replication/basebackup_zstd.o: No such file or directory
gcc: error: replication/basebackup_progress.o: No such file or directory
gcc: error: replication/basebackup_server.o: No such file or directory
gcc: error: replication/basebackup_sink.o: No such file or directory
gcc: error: replication/basebackup_target.o: No such file or directory
gcc: error: replication/basebackup_throttle.o: No such file or directory
gcc: error: replication/repl_gram.o: No such file or directory
gcc: error: replication/slot.o: No such file or directory
gcc: error: replication/slotfuncs.o: No such file or directory
gcc: error: replication/syncrep.o: No such file or directory
gcc: error: replication/syncrep_gram.o: No such file or directory
gcc: error: replication/walreceiver.o: No such file or directory
gcc: error: replication/walreceiverfuncs.o: No such file or directory
gcc: error: replication/walsender.o: No such file or directory
gcc: error: utils/fmgrtab.o: No such file or directory
make[2]: *** [Makefile:66: postgres] Error 1
make[2]: Leaving directory '/home/vagrant/pgsql15/src/backend'
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make[1]: Leaving directory '/home/vagrant/pgsql15/src'
make: *** [GNUmakefile:11: all-src-recurse] Error 2


Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane  wrote:
> Yeah, we have band-aided around this type of problem repeatedly.
> Making a fix that's readily accessible from any test script
> seems like a good idea.

We might even be able to consistently rely on this new option, given
*any* problem involving test stability and VACUUM. Having a
one-size-fits-all solution to these kinds of stability problems would
be nice -- no more DISABLE_PAGE_SKIPPING bandaids.

That general approach will be possible provided an inability to
acquire a cleanup lock during VACUUM (which can more or less occur at
random in most environments) doesn't ever lead to unexpected test
results. There is good reason to think that it might work out that
way. Simulating problems with acquiring cleanup locks during VACUUM
left me with the impression that that could really work:

https://postgr.es/m/cah2-wzkib-qcsbmwrpzp0nxvrqexouts1d7tyshg_drkohe...@mail.gmail.com

--
Peter Geoghegan




Re: shared-memory based stats collector - v70

2022-04-13 Thread David G. Johnston
On Wed, Apr 13, 2022 at 4:34 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Apr 5, 2022 at 8:00 PM Andres Freund  wrote:
>
>> Here comes v70:
>>
>>
> One thing I just noticed while peeking at pg_stat_slru:
>
> The stats_reset column for my newly initdb'd cluster is showing me
> "2000-01-01 00:00:00" (v15).  I was expecting null, though a non-null value
> restriction does make sense.  Neither choice is documented though.
>
> Based upon my expectation I checked to see if v14 reported null, and thus
> this was a behavior change.  v14 reports the initdb timestamp (e.g.,
> 2022-04-13 23:26:48.349115+00)
>
> Can we document the non-null aspect of this value (pg_stat_database is
> happy being null, this seems to be a "fixed" type behavior) but have it
> continue to report initdb as its initial value?
>
>
Sorry, apparently this "2000-01-01" behavior only manifests after crash
recovery on v15 (didn't check v14); after a clean initdb on v15 I got the
same initdb timestamp.

Feels like we should still report the "end of crash recovery timestamp" for
these instead of 2000-01-01 (which I guess is derived from 0) if we are not
willing to produce null (and it seems other parts of the system using these
stats assumes non-null).

David J.


Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Apr 13, 2022 at 4:38 PM Tom Lane  wrote:
>> It seems like a reliable fix might require test_setup to wait
>> for any background autovac to exit before it does its own
>> vacuums.  Ick.

> This is hardly a new problem, really. I wonder if it's worth inventing
> a comprehensive solution.

Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane  wrote:
> So what seems to be happening on wrasse is that a background
> autovacuum (or really autoanalyze?) is preventing pages from
> being marked all-visible not only during test_setup.sql but
> also create_index.sql; but it's gone by the time sanity_check.sql
> runs.

I agree that it would need to be an autoanalyze (due to the
PROC_IN_VACUUM optimization).

> It seems like a reliable fix might require test_setup to wait
> for any background autovac to exit before it does its own
> vacuums.  Ick.

This is hardly a new problem, really. I wonder if it's worth inventing
a comprehensive solution. Some kind of infrastructure that makes
VACUUM establish a next XID up-front (by calling
ReadNextTransactionId()), and then find a way to run with an
OldestXmin that's >= the earlier "next" XID value, waiting if
necessary.
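
A hedged sketch of such a helper, built only from existing primitives
(the function itself is hypothetical):

/* Hypothetical helper sketching the wait-for-horizon idea. */
#include "postgres.h"
#include "access/transam.h"
#include "miscadmin.h"
#include "storage/procarray.h"
#include "utils/rel.h"

static TransactionId
wait_for_removable_cutoff(Relation rel)
{
	TransactionId target = ReadNextTransactionId();
	TransactionId oldest;

	for (;;)
	{
		oldest = GetOldestNonRemovableTransactionId(rel);
		if (TransactionIdFollowsOrEquals(oldest, target))
			break;				/* horizon has passed our target XID */
		CHECK_FOR_INTERRUPTS();
		pg_usleep(10 * 1000L);	/* wait out older snapshots */
	}
	return oldest;				/* usable as a floor for OldestXmin */
}

Note the caveat raised elsewhere in this thread: the horizon can later
move backwards in otherwise harmless ways, so doing this outside VACUUM
itself may not be fully reliable.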

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Apr 13, 2022 at 4:13 PM Andres Freund  wrote:
>> IIRC the problem in question isn't skipped pages, but that the horizon simply
>> isn't new enough to mark pages as all visible.

> Sometimes OldestXmin can go backwards in VACUUM operations that are
> run in close succession against the same table, due to activity from
> other databases in the same cluster (perhaps other factors are
> involved at times).

I've been doing some testing locally by inserting commands to
manually set tenk1's relallvisible to zero.  I first did that
in test_setup.sql ... and it had no effect whatsoever.  Further
experimentation showed that the "CREATE INDEX ON tenk1" steps
in create_index.sql itself generally suffice to fix relallvisible;
although if you force it back to zero after the last such command,
you get the same plan diffs wrasse is showing.  And you don't
get any others, which I thought curious until I realized that
sanity_check.sql's database-wide VACUUM offers yet another
opportunity to heal the incorrect value.  If you force it back
to zero again after that, a bunch of later tests start to show
plan differences, which is what I'd been expecting.

So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs.  Which is odd in itself because not that much time elapses
between create_index and sanity_check, certainly less than the
time from test_setup to create_index.

It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums.  Ick.

And we still lack an explanation of why this only now broke.
I remain suspicious that pgstats is behaving unexpectedly.

regards, tom lane




Re: shared-memory based stats collector - v70

2022-04-13 Thread David G. Johnston
On Tue, Apr 5, 2022 at 8:00 PM Andres Freund  wrote:

> Here comes v70:
>
>
One thing I just noticed while peeking at pg_stat_slru:

The stats_reset column for my newly initdb'd cluster is showing me
"2000-01-01 00:00:00" (v15).  I was expecting null, though a non-null value
restriction does make sense.  Neither choice is documented though.

Based upon my expectation I checked to see if v14 reported null, and thus
this was a behavior change.  v14 reports the initdb timestamp (e.g.,
2022-04-13 23:26:48.349115+00)

Can we document the non-null aspect of this value (pg_stat_database is
happy being null, this seems to be a "fixed" type behavior) but have it
continue to report initdb as its initial value?

David J.


Re: Temporary file access API

2022-04-13 Thread Bruce Momjian
On Wed, Apr 13, 2022 at 06:54:13PM -0400, Robert Haas wrote:
> On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian  wrote:
> > I don't think we want to be encrypting pg_xact/, so they can get the
> > transaction commit rate from there.
> 
> I think it would be a good idea to eventually encrypt SLRU data,
> including pg_xact. Maybe not in the first version.

I assume that would be very hard or slow, and of limited value.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund  wrote:
> IIRC the problem in question isn't skipped pages, but that the horizon simply
> isn't new enough to mark pages as all visible.

Sometimes OldestXmin can go backwards in VACUUM operations that are
run in close succession against the same table, due to activity from
other databases in the same cluster (perhaps other factors are
involved at times).

That's why the following assertion that I recently added to
vacuumlazy.c will fail pretty quickly without the
"vacrel->NewRelfrozenXid == OldestXmin" part of its test:

Assert(vacrel->NewRelfrozenXid == OldestXmin ||
   TransactionIdPrecedesOrEquals(aggressive ? FreezeLimit :
 vacrel->relfrozenxid,
 vacrel->NewRelfrozenXid));

If you remove "vacrel->NewRelfrozenXid == OldestXmin", and run the
regression tests, the remaining assertion will fail quite easily.
Though perhaps not with a serial "make check".

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, 

On April 13, 2022 7:06:33 PM EDT, David Rowley  wrote:
>On Thu, 14 Apr 2022 at 10:54, Tom Lane  wrote:
>> After a bit more navel-contemplation I see a way that the pgstats
>> work could have changed timing in this area.  We used to have a
>> rate limit on how often stats reports would be sent to the
>> collector, which'd ensure half a second or so delay before a
>> transaction's change counts became visible to the autovac daemon.
>> I've not looked at the new code, but I'm betting that that's gone
>> and the autovac launcher might start a worker nearly immediately
>> after some foreground process finishes inserting some rows.
>> So that could result in autovac activity occurring concurrently
>> with test_setup where it didn't before.
>
>It's not quite clear to me why the manual vacuum wouldn't just cancel
>the autovacuum and complete the job.  I can't quite see how there's
>room for competing page locks here. Also, see [1].  One of the
>reported failing tests there is the same as one of the failing tests
>on wrasse. My investigation for the AIO branch found that
>relallvisible was not equal to relpages. I don't recall now why that
>was happening.
>
>> As to what to do about it ... maybe apply the FREEZE and
>> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
>> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
>> not sufficient.
>
>We should likely try and confirm it's due to relallvisible first.

We had this issue before, and not just on the aio branch. On my phone right 
now, so won't look up references.

IIRC the problem in question isn't skipped pages, but that the horizon simply
isn't new enough to mark pages as all visible.  An independent autovac worker 
starting is enough for that, for example. Previously the data load and vacuum 
were further apart, preventing this kind of issue.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Temporary file access API

2022-04-13 Thread Stephen Frost
Greetings,

On Wed, Apr 13, 2022 at 18:54 Robert Haas  wrote:

> On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian  wrote:
> > I don't think we want to be encrypting pg_xact/, so they can get the
> > transaction commit rate from there.
>
> I think it would be a good idea to eventually encrypt SLRU data,
> including pg_xact. Maybe not in the first version.


I agree with Robert, on both parts.

I do think getting checksums for that data may be the first sensible step.

Thanks,

Stephen



Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 3:54 PM Tom Lane  wrote:
> After a bit more navel-contemplation I see a way that the pgstats
> work could have changed timing in this area.  We used to have a
> rate limit on how often stats reports would be sent to the
> collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.
> I've not looked at the new code, but I'm betting that that's gone
> and the autovac launcher might start a worker nearly immediately
> after some foreground process finishes inserting some rows.
> So that could result in autovac activity occurring concurrently
> with test_setup where it didn't before.

But why should it matter? The test_setup.sql VACUUM of tenk1 should
leave relallvisible and relpages in the same state, either way (or
very close to it).

The only way that it seems like it could matter is if OldestXmin was
held back during test_setup.sql's execution of the VACUUM command.

> As to what to do about it ... maybe apply the FREEZE and
> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
> not sufficient.

BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test
flappiness issues less of a problem -- unless they're issues involving
something holding back OldestXmin when it shouldn't (in which case it
won't have any effect on test stability). I would expect that to be
the case, at least, since VACUUM now does almost all of the same work
for any individual page that it cannot get a cleanup lock on. There is
surprisingly little difference between a page that gets processed by
lazy_scan_prune and a page that gets processed by lazy_scan_noprune.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread David Rowley
On Thu, 14 Apr 2022 at 10:54, Tom Lane  wrote:
> After a bit more navel-contemplation I see a way that the pgstats
> work could have changed timing in this area.  We used to have a
> rate limit on how often stats reports would be sent to the
> collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.
> I've not looked at the new code, but I'm betting that that's gone
> and the autovac launcher might start a worker nearly immediately
> after some foreground process finishes inserting some rows.
> So that could result in autovac activity occurring concurrently
> with test_setup where it didn't before.

It's not quite clear to me why the manual vacuum wouldn't just cancel
the autovacuum and complete the job.  I can't quite see how there's
room for competing page locks here. Also, see [1].  One of the
reported failing tests there is the same as one of the failing tests
on wrasse. My investigation for the AIO branch found that
relallvisible was not equal to relpages. I don't recall the reason why
that was happening now.

> As to what to do about it ... maybe apply the FREEZE and
> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
> not sufficient.

We should likely try and confirm it's due to relallvisible first.

David

[1] 
https://www.postgresql.org/message-id/20220224153339.pqn64kseb5gpg...@alap3.anarazel.de




Re: PG DOCS - logical replication filtering

2022-04-13 Thread Peter Smith
I've changed the CF entry [1] status to "ready for committer".

--
[1] https://commitfest.postgresql.org/38/3605/

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Temporary file access API

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian  wrote:
> I don't think we want to be encrypting pg_xact/, so they can get the
> transaction commit rate from there.

I think it would be a good idea to eventually encrypt SLRU data,
including pg_xact. Maybe not in the first version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Apr 13, 2022 at 3:08 PM Tom Lane  wrote:
>> I'm tempted to add something like
>> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
>> so that we can confirm or refute the theory that relallvisible is
>> the driving factor.

> It would be fairly straightforward to commit a temporary debugging
> patch that has the autovacuum logging stuff report directly on how
> VACUUM set new_rel_allvisible in pg_class. We should probably be doing
> that already, just because it's useful information that is already
> close at hand.

Doesn't look like wrasse has autovacuum logging enabled, though.

After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area.  We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.

As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
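
Concretely, that would turn test_setup's command into something like
(a sketch):

VACUUM (ANALYZE, FREEZE, DISABLE_PAGE_SKIPPING) tenk1;

though DISABLE_PAGE_SKIPPING only forces every page to be visited; it
can't help if the horizon itself is being held back.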

regards, tom lane




Re: Temporary file access API

2022-04-13 Thread Bruce Momjian
On Mon, Apr 11, 2022 at 04:34:18PM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> > There are't really that many kinds of files to encrypt:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> >
> > (And pg_stat/* files should be removed from the list.)
> 
> This kind of gets into some theoretical questions. Like, do we think
> that it's an information leak if people can look at how many
> transactions are committing and aborting in pg_xact_status? In theory
> it could be, but I know it's been argued that that's too much of a
> side channel. I'm not sure I believe that, but it's arguable.
> Similarly, the argument that global/pg_internal.init doesn't contain
> user data relies on the theory that the only table data that will make
> its way into the file is for system catalogs. I guess that's not user
> data *exactly* but ... are we sure that's how we want to roll here?

I don't think we want to be encrypting pg_xact/, so they can get the
transaction commit rate from there.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane  wrote:
> I'm tempted to add something like
>
> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
>
> so that we can confirm or refute the theory that relallvisible is
> the driving factor.

It would be fairly straightforward to commit a temporary debugging
patch that has the autovacuum logging stuff report directly on how
VACUUM set new_rel_allvisible in pg_class. We should probably be doing
that already, just because it's useful information that is already
close at hand.

Might be a bit trickier to make sure that wrasse reliably reported on
all relevant VACUUMs, since that would have to include manual VACUUMs
(which would really have to use VACUUM VERBOSE), as well as
autovacuums.

-- 
Peter Geoghegan




Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
For the past five days or so, wrasse has been intermittently
failing due to unexpectedly not using an Index Only Scan plan
in the create_index test [1], eg

@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                      QUERY PLAN                       
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)

The most probable explanation for this seems to be that tenk1's
pg_class.relallvisible value hasn't been set high enough to make an IOS
look cheaper than the alternatives.  Where that ought to be getting set
is the "VACUUM ANALYZE tenk1" step in test_setup.sql.  It's plausible
I guess that a background autovacuum is preventing that command from
setting relallvisible as high as it ought to be --- but if so, why
are we only seeing two plans changing, on only one animal?

But what I'm really confused about is that this test arrangement has
been stable since early February.  Why has wrasse suddenly started
showing a 25% failure rate when it never failed this way before that?
Somebody has to have recently committed a change that affects this.
Checking the commit log up to the onset of the failures on 8 April,
I only see two plausible candidates:

* shared-memory pgstats
* Peter's recent VACUUM changes

Any connection to pgstats is, um, pretty obscure.  I'd finger the VACUUM
changes as a more likely trigger except that the last interesting-looking
one was f3c15cbe5 on 3 April, and wrasse got through "make check" 38 times
after that before its first failure of this kind.  That doesn't square with
the 25% failure rate since then, so I'm kind of forced to the conclusion
that the pgstats work changed some behavior that it should not have.

Any ideas?

I'm tempted to add something like

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';

so that we can confirm or refute the theory that relallvisible is
the driving factor.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-08%2003%3A48%3A30




Re: Improving the "Routine Vacuuming" docs

2022-04-13 Thread David G. Johnston
On Wed, Apr 13, 2022 at 2:19 PM Peter Geoghegan  wrote:

> On Wed, Apr 13, 2022 at 1:25 PM Robert Haas  wrote:
> > On Wed, Apr 13, 2022 at 12:34 PM Peter Geoghegan  wrote:
> > > What do you think of the idea of relating freezing to removing tuples
> > > by VACUUM at this point? This would be a basis for explaining how
> > > freezing and tuple removal are constrained by the same cutoff.
>
> > I think something like that could be useful, if we can find a way to
> > word it sufficiently clearly.
>
> What if the current "25.1.5. Preventing Transaction ID
> Wraparound Failures" section was split into two parts? The first part
> would cover freezing, the second part would cover
> relfrozenxid/relminmxid advancement.
>
> Freezing can sensibly be discussed before introducing relfrozenxid.
> Freezing is a maintenance task that makes tuples self-contained
> things, suitable for long term storage. Freezing makes tuples not rely
> on transient transaction metadata (mainly clog), and so is an overhead
> of storing data in Postgres long term.
>
> That's how I think of it, at least. That definition seems natural to me.
>

I was trying to do that with my second try, and partially failed.  I'm on
board with the idea though.


> > Those all sound pretty reasonable.
>
> Great.
>
> I have two more things that I see as problems. Would be good to get
> your thoughts here, too. They are:
>
> 1. We shouldn't really be discussing VACUUM FULL here at all, except
> to say that it's out of scope, and probably a bad idea.
>
> You once wrote about the problem of how VACUUM FULL is perceived by
> users (VACUUM FULL doesn't mean "VACUUM, but better"), expressing an
> opinion of VACUUM FULL that I agree with fully. The docs definitely
> contributed to that problem.
>

I agree.  I would remove VACUUM FULL from the "Vacuuming Basics" and
"Recovering Disk Space" section aside from adding a warning box saying it
exists, it is not part of routine maintenance (hence out-of-scope for the
"Routine Vacuuming" Chapter), and to look elsewhere for details.

>
> 2. We don't go far enough in emphasizing the central role of autovacuum.
>
> Technically the entire section assumes that its primary audience are
> those users that have opted to not use autovacuum. This seems entirely
> backwards to me.
>

I would be on board with having the language of the entire section written
with the assumption that autovacuum is enabled, with a single statement
upfront that this is the case.  Most of the content remains as-is but we
remove a non-trivial number of sentences and fragments of the form "The
autovacuum daemon, if enabled, will..." and "For those not using
autovacuum,..."

If the basic content is deemed worthy of preservation, relocating all of
those kinds of hints and whatnot to a single "supplementing or disabling
auto-vacuum" section.


> > There's a little bit of doubt in my
> > mind about the third one; I think it could possibly be useful to
> > explain that the XID space is circular and 0-2 are special, but maybe
> > not.
>
> I understand the concern. I'm not saying that this kind of information
> doesn't have any business being in the docs. Just that it has no
> business being in this particular chapter of the docs. In fact, it
> doesn't even belong in "III. Server Administration". If it belongs
> anywhere, it should be in some chapter from "VII. Internals".
>

I think we do want to relocate some of this material elsewhere, and
Internals seems probable, and we'd want to have a brief sentence or two
here before pointing the reader to more information.  I'm sure we'll come
to some conclusion on the level of detail that lead-in should include.
Less is more to start with.  Unless the rest of the revised chapter is
going to lean heavily into it.


>
> Why shouldn't single-user mode also refuse to allocate new XIDs when
> we reach xidWrapLimit (as opposed to when we reach xidStopLimit)?
>
>
I lack the familiarity with the details here to comment on this last major
point.

I do think the "Server Administration" section is missing a chapter though
- "Error Handling and Recovery".

With that chapter in place I would mention the warning threshold in the
routine maintenance chapter as something that might be seen if routine
maintenance is misconfigured or encounters problems.  Then direct the user
to "Error Handling and Recovery" for discussion about the warning and
whatever else may happen if it is ignored; and how to go about fixing the
problem(s) that caused the warning.

David J.


Re: typos

2022-04-13 Thread David Rowley
(For the future, just to make discussions easier, it would be good if
you could use git format-patch -v N to give a unique version number
to these patches.)
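
For example, to post the fifth revision of a series (a sketch):

$ git format-patch -v5 master

which names the output files v5-0001-..., v5-0002-..., and so on.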

On Thu, 14 Apr 2022 at 05:40, Justin Pryzby  wrote:
> There's (only) a few remaining.

I've pushed 0001 and 0002 of the 3rd batch of patches. I left 0003 as
I just didn't feel it was a meaningful enough improvement.

From docs/, if I do:

$ git grep ", which is the default" | wc -l
9

$ git grep ", the default" | wc -l
64

You're proposing to make the score 10, 63.  I'm not sure if that's a
good direction to go in.

David




allow specifying action when standby encounters incompatible parameter settings

2022-04-13 Thread Nathan Bossart
Hi hackers,

As of 15251c0, when a standby encounters an incompatible parameter change,
it pauses replay so that read traffic can continue while the administrator
fixes the parameters.  Once the server is restarted, replay can continue.
Before this change, such incompatible parameter changes caused the standby
to immediately shut down.

I noticed that there was some suggestion in the thread associated with
15251c0 [0] for making this behavior configurable, but there didn't seem to
be much interest at the time.  I am interested in allowing administrators
to specify the behavior before 15251c0 (i.e., immediately shut down the
standby when an incompatible parameter change is detected).  The use-case I
have in mind is when an administrator has automation in place for adjusting
these parameters and would like to avoid stopping replay any longer than
necessary.  FWIW this is what we do in RDS.

I've attached a patch that adds a new GUC where users can specify the
action to take when an incompatible parameter change is detected on a
standby.  For now, there are just two options: 'pause' and 'shutdown'.
This new GUC is largely modeled after recovery_target_action.
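
With the patch applied, opting back into the pre-15251c0 behavior would
be a one-line change in postgresql.conf (a sketch of the proposed
setting):

insufficient_standby_setting_action = 'shutdown'	# default is 'pause'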

I initially set out to see if it was possible to automatically adjust these
parameters on a standby, but that is considerably more difficult.  It isn't
enough to just hook into the restart_after_crash functionality since it
doesn't go back far enough in the postmaster logic.  IIUC we'd need to
reload preloaded libraries (which there is presently no support for),
recalculate MaxBackends, etc.  Another option I considered was to
automatically adjust the parameters during startup so that you just need to
restart the server.  However, we need to know for sure that the server is
going to be a hot standby, and I don't believe we have that information
where such GUC changes would need to occur (I could be wrong about this).
Anyway, for now I'm just proposing the modest change described above, but
I'd welcome any discussion about improving matters further in this area.

[0] https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 754429b5ad4c9c8b40b66c9c0ede0a7572f0e071 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 4 Apr 2022 11:56:21 -0700
Subject: [PATCH v1 1/1] Introduce insufficient_standby_setting_action.

As of 15251c0, when a standby encounters an incompatible parameter
change when replaying WAL, it will pause replay to allow read
traffic to continue while the administrator figures out the next
steps.  After fixing the parameters, the server must be restarted.

This change introduces a new GUC to allow users to indicate that
the server should immediately shut down when it encounters such an
incompatible parameter change (i.e., the behavior before 15251c0).
This may be desirable when the administrator has automation for
adjusting incompatible parameter settings and wants to avoid
stopping replay any longer than necessary.
---
 doc/src/sgml/config.sgml  | 21 +
 doc/src/sgml/high-availability.sgml   | 14 +++---
 src/backend/access/transam/xlogrecovery.c | 12 -
 src/backend/utils/misc/guc.c  | 12 +
 src/backend/utils/misc/postgresql.conf.sample | 47 ++-
 src/include/access/xlog_internal.h|  9 
 src/include/access/xlogrecovery.h |  1 +
 7 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a1682f6d4d..54095e56e6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4901,6 +4901,27 @@ ANY num_sync ( 
+  insufficient_standby_setting_action (enum)
+  
+   insufficient_standby_setting_action configuration parameter
+  
+  
+  
+   
+Specifies what action a hot standby server should take when it
+encounters an incompatible parameter change (see
+).  The default is
+'pause', which means recovery will be paused.  After
+recovery is paused due to an incompatible parameter change, unpausing
+will cause the server to shut down.  'shutdown' means
+that the server should immediately shut down without pausing.  This
+parameter can only be set in the postgresql.conf
+file or on the server command line.
+   
+  
+ 
+
  
 
 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b0a653373d..39baf07adf 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2043,9 +2043,11 @@ LOG:  database system is ready to accept read-only connections
 

 The WAL tracks changes to these parameters on the
-primary.  If a hot standby processes WAL that indicates that the current
-value on the primary is higher than its own value, it will log a warning
-and pause recovery, for example:
+  

Re: Improving the "Routine Vacuuming" docs

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 1:25 PM Robert Haas  wrote:
> On Wed, Apr 13, 2022 at 12:34 PM Peter Geoghegan  wrote:
> > What do you think of the idea of relating freezing to removing tuples
> > by VACUUM at this point? This would be a basis for explaining how
> > freezing and tuple removal are constrained by the same cutoff.

> I think something like that could be useful, if we can find a way to
> word it sufficiently clearly.

What if the current "25.1.5. Preventing Transaction ID
Wraparound Failures" section was split into two parts? The first part
would cover freezing, the second part would cover
relfrozenxid/relminmxid advancement.

Freezing can sensibly be discussed before introducing relfrozenxid.
Freezing is a maintenance task that makes tuples self-contained
things, suitable for long term storage. Freezing makes tuples not rely
on transient transaction metadata (mainly clog), and so is an overhead
of storing data in Postgres long term.

That's how I think of it, at least. That definition seems natural to me.
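
A sketch of how one could see the effect at the page level with
contrib/pageinspect ('some_table' is a stand-in; the 0x0300 mask is
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID, which together mean "xmin
frozen"):

CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT lp, t_xmin,
       (t_infomask & x'0300'::int) = x'0300'::int AS xmin_frozen
FROM heap_page_items(get_raw_page('some_table', 0));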

> Those all sound pretty reasonable.

Great.

I have two more things that I see as problems. Would be good to get
your thoughts here, too. They are:

1. We shouldn't really be discussing VACUUM FULL here at all, except
to say that it's out of scope, and probably a bad idea.

You once wrote about the problem of how VACUUM FULL is perceived by
users (VACUUM FULL doesn't mean "VACUUM, but better"), expressing an
opinion of VACUUM FULL that I agree with fully. The docs definitely
contributed to that problem.

2. We don't go far enough in emphasizing the central role of autovacuum.

Technically the entire section assumes that its primary audience are
those users that have opted to not use autovacuum. This seems entirely
backwards to me.

We should make it clear that technically autovacuum isn't all that
different from running your own VACUUM commands, because that's an
important part of understanding autovacuum. But that's all. ISTM that
anybody that *entirely* opts out of using autovacuum is just doing it
wrong (besides, it's kind of impossible to do it anyway, what with
anti-wraparound autovacuum being impossible to disable).

There is definitely a role for using tools like cron to schedule
off-hours VACUUM operations, and that's still worth pointing out
prominently. But that should be a totally supplementary thing, used
when the DBA understands that running VACUUM off-hours is less
disruptive.
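
The typical shape of that is a plain crontab entry (a sketch; the
schedule and options are illustrative):

0 3 * * 0  vacuumdb --all --analyze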

> There's a little bit of doubt in my
> mind about the third one; I think it could possibly be useful to
> explain that the XID space is circular and 0-2 are special, but maybe
> not.

I understand the concern. I'm not saying that this kind of information
doesn't have any business being in the docs. Just that it has no
business being in this particular chapter of the docs. In fact, it
doesn't even belong in "III. Server Administration". If it belongs
anywhere, it should be in some chapter from "VII. Internals".

Discussing it here just seems inappropriate (and would be even if it
wasn't how we introduce discussion of wraparound). It's really only
tangentially related to VACUUM anyway. It seems like it should be
covered when discussing the heapam on-disk representation.

> I think it is probably important to discuss this, but along the lines
> of: it is possible to bypass all of these safeguards and cause a true
> wraparound by running in single-user mode. Don't do that. There's no
> wraparound situation that can't be addressed just fine in multi-user
> mode, and here's how to do that. In previous releases, we used to
> sometimes recommend single user mode, but that's no longer necessary
> and not a good idea, so steer clear.

Yeah, that should probably happen somewhere.

On the other hand...why do we even need to tolerate wraparound in
single-user mode? I do see some value in reserving extra XIDs that can
be used in single-user mode (apparently single-user mode can be used
in scenarios where you have buggy event triggers, things like that).
But that in itself does not justify allowing single-user mode to
exceed xidWrapLimit.

Why shouldn't single-user mode also refuse to allocate new XIDs when
we reach xidWrapLimit (as opposed to when we reach xidStopLimit)?

Maybe there is a good reason to believe that allowing single-user mode
to corrupt the database is the lesser evil, but if there is then I'd
like to know the reason.


--
Peter Geoghegan




Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-13 Thread Michael Paquier
On Wed, Apr 13, 2022 at 02:58:28PM +0000, gkokola...@pm.me wrote:
> It's really not hard to add compression level. However we had briefly
> discussed it in the original thread [1] and decided against. That is why
> I did not write that code. If the community thinks differently now, let
> me know if you would like me to offer a patch for it.

The issue back then was how to design the option set to handle all
that (right?  My memories may be short on that), and pg_basebackup
takes care of that with its option design.

This is roughly what has been done here, except that this was for the
contentSize:
https://www.postgresql.org/message-id/rYyZ3Fj9qayyY9-egNsV_kkLbL_BSWcOEdi3Mb6M9eQRTkcA2jrqFEHglLUEYnzWR_wttCqn7VI94MZ2p7mwNje51lHTvWYnJ1jHdOceen4=@pm.me

Do you think that the extra test coverage is going to be too much of a
burden?  I was thinking about just adding a level to the main lz4
command, with an extra negative test in the SKIP block using a level
out of range.
--
Michael




Re: timezones BCE

2022-04-13 Thread chap

On 2022-04-13 14:13, Dave Cramer wrote:


> Oh please don't do something bespoke. I'm trying to make this work with
> the JDBC driver.
> So it has to be at least compatible with other libraries.


Looks like Java agrees with the offset, prior to Toronto's 1895 adoption
of the hour-wide zone:

jshell> java.time.ZoneId.of("America/Toronto").
   ...> getRules().
   ...> nextTransition(java.time.Instant.parse("0101-01-01T00:00:00Z"))
$1 ==> Transition[Gap at 1895-01-01T00:00-05:17:32 to -05:00]

Regards,
-Chap




Re: CLUSTER on partitioned index

2022-04-13 Thread Michael Paquier
On Wed, Apr 13, 2022 at 05:52:14AM -0500, Justin Pryzby wrote:
> Are you sure?  The ownership re-check in cluster_rel() occurs after acquiring
> locks.

Yep, you are right.  However, the SQL test does not check for this
blocking scenario.  In short, removing the new ACL check in
get_tables_to_cluster_partitioned() makes the test behave the same
way.  Could you implement an isolation check to make sure that the
difference is visible?  The SQL check looks useful in itself, either
way.
--
Michael




Re: typos

2022-04-13 Thread David Rowley
On Mon, 11 Apr 2022 at 22:10, Justin Pryzby  wrote:
> Thanks for amending and pushing those.  There's some more less obvious ones
> attached.

Here are my notes from yesterday that I made when reviewing and
pushing many of the 2nd batch of patches.

0001: Pushed and back patched to v12

0002: Didn't push. Compression method/algorithm.

0003: Pushed and backpatched to v13

0004: Pushed (reviewed by Robert)

0005: Alvaro Pushed
0006: Alvaro Pushed

0007: Not pushed. No space after comment and closing */  pgindent
fixed one of these but not the other 2.  I've not looked into why
pgindent does 1 and not the other 2.

0008: Pushed

I've left out the following change as it does not seem to be bringing
any sort of consistency to the docs overall. It only brings
consistency to a single source file in the docs.

-  You need zstd, if you want to support
+  You need ZSTD, if you want to support

See: git grep -i ">zstd<"

0009:

This contains a few fixes that look correct. Not sure if the following
has any use as a change:

-See the description of the respective commands and programs for the
-respective details.  Note that you can mix locale providers on different
+See the description of the respective commands and programs for
+details.  Note that you can mix locale providers at different

0010: Pushed

0011: Not pushed. Not sure if this is worth the change.

0012: Amit Pushed

0013: Not pushed. Adds a missing comma.

David




SQL JSON compliance

2022-04-13 Thread Andrew Dunstan


Simon has just pointed out to me that as a result of recent commits, a
number of things now should move from the unsupported table to the
supported table in features.sgml. In particular, it looks to me like all
of these should move:

T811    Basic SQL/JSON constructor functions
T812    SQL/JSON: JSON_OBJECTAGG
T813    SQL/JSON: JSON_ARRAYAGG with ORDER BY
T814    Colon in JSON_OBJECT or JSON_OBJECTAGG
T821    Basic SQL/JSON query operators
T822    SQL/JSON: IS JSON WITH UNIQUE KEYS predicate
T823    SQL/JSON: PASSING clause
T824    JSON_TABLE: specific PLAN clause
T825    SQL/JSON: ON EMPTY and ON ERROR clauses
T826    General value expression in ON ERROR or ON EMPTY clauses
T827    JSON_TABLE: sibling NESTED COLUMNS clauses
T828    JSON_QUERY
T829    JSON_QUERY: array wrapper options
T830    Enforcing unique keys in SQL/JSON constructor functions
T838    JSON_TABLE: PLAN DEFAULT clause


If there's no objection I'll make it so.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-13 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> I'm not still confident on this, but it should be better than the v1.

+Andres as this seems to be related to 277692220.
I added it as an Open Item.




Re: Improving the "Routine Vacuuming" docs

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 12:34 PM Peter Geoghegan  wrote:
> What do you think of the idea of relating freezing to removing tuples
> by VACUUM at this point? This would be a basis for explaining how
> freezing and tuple removal are constrained by the same cutoff. A very
> old snapshot can hold up cleanup, but it can also hold up freezing to
> the same degree (it's just not as obvious because we are less eager
> about freezing by default).

I think something like that could be useful, if we can find a way to
word it sufficiently clearly.

> Perhaps we can agree on some (or even all) of the following specific points:
>
> * We shouldn't mention "4 billion XIDs" at all.
>
> * We should say that the issue is an issue of distances between
> unfrozen XIDs. The maximum distance that can ever be allowed to emerge
> between any two unfrozen XIDs in a cluster is about 2 billion XIDs.
>
> * We don't need to say anything about how XIDs are compared, normal vs
> permanent XIDs, etc.
>
> * The system takes drastic intervention to prevent this implementation
> restriction from becoming a problem, starting with anti-wraparound
> autovacuums. Then there's the failsafe. Finally, there's the
> xidStopLimit mechanism, our last line of defense.

Those all sound pretty reasonable. There's a little bit of doubt in my
mind about the third one; I think it could possibly be useful to
explain that the XID space is circular and 0-2 are special, but maybe
not.

> > I think it is wrong to conflate wraparound with xidStopLimit.
> > xidStopLimit is the final defense against an actual wraparound, and
> > like I say, an actual wraparound is quite possible if you put the
> > system in single user mode and then do something like this:
>
> I forgot to emphasize one aspect of the problem that seems quite
> important: the document itself seems to conflate the xidStopLimit
> mechanism with true wraparound. At least I thought so. Last year's
> thread on this subject ('What is "wraparound failure", really?') was
> mostly about that confusion. I personally found that very confusing,
> and I doubt that I'm the only one.

OK.

> There is no good reason to use single user mode anymore (a related
> problem with the docs is that we still haven't made that point). And

Agreed.

> the pg_upgrade bug that led to invalid relfrozenxid values was
> flagrantly just a bug (adding a WARNING for this recently, in commit
> e83ebfe6). So while I accept that the distinction you're making here
> is valid, maybe we can fix the single user mode doc bug too, removing
> the need to discuss "true wraparound" as a general phenomenon. You
> shouldn't ever see it in practice anymore. If you do then either
> you've done something that "invalidated the warranty", or you've run
> into a legitimate bug.

I think it is probably important to discuss this, but along the lines
of: it is possible to bypass all of these safeguards and cause a true
wraparound by running in single-user mode. Don't do that. There's no
wraparound situation that can't be addressed just fine in multi-user
mode, and here's how to do that. In previous releases, we used to
sometimes recommend single user mode, but that's no longer necessary
and not a good idea, so steer clear.
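
To make the "here's how" concrete, the multi-user recipe is roughly
(a sketch): find the databases whose datfrozenxid is oldest, connect to
each, and run something like

VACUUM (FREEZE, INDEX_CLEANUP off, VERBOSE);

which freezes everything it can while skipping index vacuuming, so it
advances relfrozenxid as quickly as possible.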

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PG DOCS - logical replication filtering

2022-04-13 Thread Euler Taveira
On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:
> PSA patch v10 which addresses the remaining review comments from Euler [1]
Looks good to me.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Pluggable toaster

2022-04-13 Thread Nikita Malakhov
Hi,
For the pluggable toaster - in the previous patch set, the part 7 patch file
contains an invalid string.
Fixup patch (the v2 file should be used instead of the previous one):
7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast
pointer's
alignment required by bytea toaster by Nikita Glukhov;


On Wed, Apr 13, 2022 at 9:55 PM Nikita Malakhov  wrote:

> Hi,
> I reworked the previous patch set according to the recommendations. Patches
> are generated by format-patch and applied by git am. Patches are based on
> master from 03.11. Also, we now have a clean branch with incremental
> commits which can be easily rebased onto a fresh master.
>
> Currently, there are 8 patches:
>
> 1) 0001_create_table_storage_v3.patch - SET STORAGE option for CREATE
> TABLE command by Teodor Sigaev which is required by all the following
> functionality;
>
> 2) 0002_toaster_interface_v6.patch - Toaster API (SQL syntax for toasters
> + API)
> with Dummy toaster as an example of how this API should be used, but with
> default
> toaster left 'as-is';
>
> 3) 0003_toaster_default_v5.patch - default (regular) toaster is implemented
> via new API;
>
> 4) 0004_toaster_snapshot_v5.patch - refactoring of default toaster and
> support
> of versioned toasted rows;
>
> 5) 0005_bytea_appendable_toaster_v5.patch - bytea toaster by Nikita Glukhov
> Custom toaster for bytea data with support of appending (instead of
> rewriting)
> stored data;
>
> 6) 0006_toasterapi_docs_v1.patch - brief documentation on Toaster API in
> Pg docs;
>
> 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast
> pointer's
> alignment required by bytea toaster by Nikita Glukhov;
>
> 8) 0008_fix_toast_tuple_externalize.patch - fixes toast_tuple_externalize
> function
> not to call toast if old data is the same as new one.
>
> I would be grateful for feedback on the reworked patch set.
>
> On Mon, Apr 4, 2022 at 11:18 PM Robert Haas  wrote:
>
>> On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov  wrote:
>> > - Is 'git apply' not a valid way to apply such patches?
>>
>> I have found that it never works. This case is no exception:
>>
>> [rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
>> /Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing
>> whitespace.
>> toasterapi.o
>> /Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing
>> whitespace.
>> {
>> /Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing
>> whitespace.
>>  * CREATE TOASTER name HANDLER handler_name
>> /Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing
>> whitespace.
>>  * va_toasterdata could contain varatt_external structure for old Toast
>> /Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing
>> whitespace.
>> SELECT attnum, attname, atttypid, attstorage, tsrname
>> error: patch failed: src/backend/commands/tablecmds.c:42
>> error: src/backend/commands/tablecmds.c: patch does not apply
>> error: patch failed: src/backend/commands/tablecmds.c:943
>> error: src/backend/commands/tablecmds.c: patch does not apply
>> error: patch failed: src/backend/commands/tablecmds.c:973
>> error: src/backend/commands/tablecmds.c: patch does not apply
>> error: patch failed: src/backend/commands/tablecmds.c:44
>> error: src/backend/commands/tablecmds.c: patch does not apply
>>
>> I would really encourage you to use 'git format-patch' to generate a
>> stack of patches. But there is no point in reposting 30+ patches that
>> haven't been properly refactored into separate chunks. You need to
>> maintain a branch, periodically rebased over master, with some
>> probably-small number of patches on it, each of which is a logically
>> independent patch with its own commit message, its own clear purpose,
>> etc. And then generate patches to post from there using 'git
>> format-patch'. Look into using 'git rebase -i --autosquash' and 'git
>> commit --fixup' to maintain the branch, if you're not already familiar
>> with those things.
>>
>> Also, it is a really good idea when you post the patch set to include
>> in the email a clear description of the overall purpose of the patch
>> set and what each patch does toward that goal. e.g. "The overall goal
>> of this patch set is to support faster-than-light travel. Currently,
>> PostgreSQL does not know anything about the speed of light, so 0001
>> adds some code for speed-of-light detection. Building on this, 0002
>> adds general support for disabling physical laws of the universe.
>> Then, 0003 makes use of this support to disable specifically the speed
>> of light." Perhaps you want a little more text than that for each
>> patch, depending on the situation, but this gives you the idea, I
>> hope.
>>
>> --
>> Robert Haas
>> EDB: http://www.enterprisedb.com
>>
>
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> https://postgrespro.ru/
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


0007_fix_alignment_of_custom_toast_pointers_v2.patch.gz
Description: GNU Zip 

Re: How about a psql backslash command to show GUCs?

2022-04-13 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 4/12/22 1:00 PM, Tom Lane wrote:
>> Yeah, most of what shows up in a minimally-configured installation is
>> postmaster-computed settings like config_file, rather than things
>> that were actually set by the DBA.  Personally I'd rather hide the
>> ones that have source = 'override', but that didn't seem to be the
>> consensus.

> The list seems more reasonable now, though now that I'm fully in the 
> "less is more" camp based on the "non-defaults" description, I think 
> anything we can do to further prune is good.

Hearing no further comments, I pushed this version.  There didn't seem
to be a need to adjust the docs.

> We may be at a point where it's "good enough" and let more people chime 
> in during beta.

Right, lots of time yet for bikeshedding in beta.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Nathan Bossart
On Wed, Apr 13, 2022 at 12:05:08PM -0400, Tom Lane wrote:
> Robert Haas  writes:
>> It may be too much to hope that we're going to completely get rid of
>> process_shared_preload_libraries_in_progress tests.
> 
> Perhaps, but this is definitely an area that could stand to have some
> serious design thought put into it.  You're quite right that it's
> largely a mess today, but I think proposals like "forbid setting GUCs
> during _PG_init" would add to the mess rather than clean things up.

+1.  The simplest design might be to just make a separate preload hook.
_PG_init() would install hooks, and then this preload hook would do
anything that needs to happen when the library is loaded via s_p_l.
However, I think we still want a new shmem request hook where MaxBackends
is initialized.

If we do move forward with the shmem request hook, do we want to disallow
shmem requests anywhere else, or should we just leave it up to extension
authors to do the right thing?  Many shmem requests in _PG_init() are
probably okay unless they depend on MaxBackends or another GUC that someone
might change.  Given that, I think I currently prefer the latter (option B
from upthread).
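
To sketch what that might look like for an extension author (the hook
name and signature here are assumptions, since nothing is committed
yet, and MyEntry is a stand-in for the extension's real struct):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

typedef struct MyEntry
{
	int			counter;	/* stand-in payload */
} MyEntry;

static shmem_request_hook_type prev_shmem_request_hook = NULL;

static void myext_shmem_request(void);

void
_PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		return;

	/* GUC definitions, other hooks, etc. would go here */

	prev_shmem_request_hook = shmem_request_hook;
	shmem_request_hook = myext_shmem_request;
}

/* called late enough that MaxBackends has its final value */
static void
myext_shmem_request(void)
{
	if (prev_shmem_request_hook)
		prev_shmem_request_hook();

	RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(MyEntry)));
	RequestNamedLWLockTranche("myext", 1);
}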

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: timezones BCE

2022-04-13 Thread Dave Cramer
On Wed, 13 Apr 2022 at 14:10, Tom Lane  wrote:

> c...@anastigmatix.net writes:
> > On 2022-04-13 12:33, Dave Cramer wrote:
> >> Specifically why the -05:17:32
>
> > Timezones were regularized into their (typically hour-wide) chunks
> > during a period around the late nineteenth century IIRC.
>
> > If you decompile the zoneinfo database to look at America/Toronto,
> > you will probably find an entry for dates earlier than when the
> > regularized zones were established there, and that entry will have
> > an offset reflecting Toronto's actual longitude.
>
> Yeah, you'll see these weird offsets in just about every zone for dates
> earlier than the late 1800s.  I've got my doubts about how useful it is
> to do that, but that's the policy the tzdb guys have.
>
> At one point I was considering whether we could project the oldest
> recorded "standard time" offset backwards instead of believing the LMT
> offsets.  This would confuse many fewer people, and it's no less
> logically defensible than applying the Gregorian calendar to years
> centuries before Pope Gregory was born.  But I fear that horse may
> have left the barn already --- changing this behavior would have
> its own downsides, and I do not think any other tzdb consumers do it.
>

Oh please don't do something bespoke. I'm trying to make this work with the
JDBC driver.
So it has to be at least compatible with other libraries.

Dave


Re: timezones BCE

2022-04-13 Thread Tom Lane
c...@anastigmatix.net writes:
> On 2022-04-13 12:33, Dave Cramer wrote:
>> Specifically why the -05:17:32

> Timezones were regularized into their (typically hour-wide) chunks
> during a period around the late nineteenth century IIRC.

> If you decompile the zoneinfo database to look at America/Toronto,
> you will probably find an entry for dates earlier than when the
> regularized zones were established there, and that entry will have
> an offset reflecting Toronto's actual longitude.

Yeah, you'll see these weird offsets in just about every zone for dates
earlier than the late 1800s.  I've got my doubts about how useful it is
to do that, but that's the policy the tzdb guys have.

At one point I was considering whether we could project the oldest
recorded "standard time" offset backwards instead of believing the LMT
offsets.  This would confuse many fewer people, and it's no less
logically defensible than applying the Gregorian calendar to years
centuries before Pope Gregory was born.  But I fear that horse may
have left the barn already --- changing this behavior would have
its own downsides, and I do not think any other tzdb consumers do it.

regards, tom lane




Re: typos

2022-04-13 Thread Justin Pryzby
On Wed, Apr 13, 2022 at 07:29:34PM +0200, Alvaro Herrera wrote:
> On 2022-Apr-11, David Rowley wrote:
> 
> > and also skipped:
> > 0016 (unsure if we should change these if pgindent is not touching it)
> > 0017 (unsure if we should change these if pgindent is not touching it)
> 
> I verified that pgindent will indeed not touch these changes by running
> before and after.  (I accepted one comment placement from that run that
> touched a neighboring line.)
> 
> I think pgindent is right not to modify vertical space very much, since
> in many cases it amounts to a subjective decision.  The patch seemed a
> (small) improvement, and it seems hard to make too much of a fuss about
> such things.  Pushed them as a single commit.
> 
> I hadn't noticed that Justin had posted a refreshed patch series, so I
> don't know if the new ones match what I pushed.

There were no changes - I had resent the patches that removed blank lines so it
was apparent that they were "outstanding" / under discussion.

There's (only) a few remaining.
From 543b9d77763da814cf1a99938252eb16a0a7d131 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 12 Mar 2022 14:55:18 -0600
Subject: [PATCH 1/4] comment spaces

---
 src/backend/storage/file/fd.c | 2 +-
 src/include/replication/message.h | 2 +-
 src/include/tsearch/ts_type.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 14b77f28617..24704b6a023 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -912,7 +912,7 @@ InitFileAccess(void)
 void
 InitTemporaryFileAccess(void)
 {
-	Assert(SizeVfdCache != 0);	/* InitFileAccess() needs to have run*/
+	Assert(SizeVfdCache != 0);	/* InitFileAccess() needs to have run */
 	Assert(!temporary_files_allowed);	/* call me only once */
 
 	/*
diff --git a/src/include/replication/message.h b/src/include/replication/message.h
index 7d7785292f1..b9686c28550 100644
--- a/src/include/replication/message.h
+++ b/src/include/replication/message.h
@@ -32,7 +32,7 @@ typedef struct xl_logical_message
 extern XLogRecPtr LogLogicalMessage(const char *prefix, const char *message,
 	size_t size, bool transactional);
 
-/* RMGR API*/
+/* RMGR API */
 #define XLOG_LOGICAL_MESSAGE	0x00
 void		logicalmsg_redo(XLogReaderState *record);
 void		logicalmsg_desc(StringInfo buf, XLogReaderState *record);
diff --git a/src/include/tsearch/ts_type.h b/src/include/tsearch/ts_type.h
index a2008f5504b..689b2d1cfb6 100644
--- a/src/include/tsearch/ts_type.h
+++ b/src/include/tsearch/ts_type.h
@@ -171,7 +171,7 @@ typedef struct
 
 extern PGDLLIMPORT const int tsearch_op_priority[OP_COUNT];
 
-/* get operation priority  by its code*/
+/* get operation priority by its code */
 #define OP_PRIORITY(x)	( tsearch_op_priority[(x) - 1] )
 /* get QueryOperator priority */
 #define QO_PRIORITY(x)	OP_PRIORITY(((QueryOperator *) (x))->oper)
-- 
2.17.1

From e84373bd78d6634781536a96a083cf26c87aa53e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 25 Mar 2022 13:04:48 -0500
Subject: [PATCH 3/4] doc review: locales

f2553d43060edb210b36c63187d52a632448e1d2
---
 doc/src/sgml/charset.sgml| 12 ++--
 doc/src/sgml/ref/initdb.sgml |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index d60d3207fd4..b95303fb893 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -314,7 +314,7 @@ initdb --locale=sv_SE
   A locale can be selected separately for each database.  The SQL command
   CREATE DATABASE and its command-line equivalent
   createdb have options for that.  Use this for example
-  if a database cluster houses databases for multiple tennants with
+  if a database cluster houses databases for multiple tenants with
   different requirements.
  
 
@@ -346,7 +346,7 @@ initdb --locale=sv_SE
 providers.  This specifies which library supplies the locale
 data.  One standard provider name is libc, which uses
 the locales provided by the operating system C library.  These are the
-locales that most tools provided by the operating system use.  Another
+locales used by most tools provided by the operating system.  Another
 provider is icu, which uses the external
 ICUICU library.  ICU locales can
 only be used if support for ICU was configured when PostgreSQL was built.
@@ -361,8 +361,8 @@ initdb --locale=sv_SE
 
 initdb --locale-provider=icu --icu-locale=en
 
-See the description of the respective commands and programs for the
-respective details.  Note that you can mix locale providers on different
+See the description of the respective commands and programs for
+details.  Note that you can mix locale providers at different
 granularities, for example use libc by default for the
 cluster but have one database that uses the icu
 provider, and then have collation objects 

Re: typos

2022-04-13 Thread Alvaro Herrera
On 2022-Apr-11, David Rowley wrote:

> and also skipped:
> 0016 (unsure if we should change these if pgindent is not touching it)
> 0017 (unsure if we should change these if pgindent is not touching it)

I verified that pgindent will indeed not touch these changes by running
before and after.  (I accepted one comment placement from that run that
touched a neighboring line.)

I think pgindent is right not to modify vertical space very much, since
in many cases it amounts to a subjective decision.  The patch seemed a
(small) improvement, and it seems hard to make too much of a fuss about
such things.  Pushed them as a single commit.

I hadn't noticed that Justin had posted a refreshed patch series, so I
don't know if the new ones match what I pushed.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house
 PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)




Re: timezones BCE

2022-04-13 Thread chap

On 2022-04-13 12:33, Dave Cramer wrote:

> test=# set timezone to 'America/Toronto';
> SET
> test=# select '0101-01-01'::timestamptz;
>           timestamptz
> ------------------------------
>  0101-01-01 00:00:00-05:17:32
>
> Specifically why the -05:17:32


Timezones were regularized into their (typically hour-wide) chunks
during a period around the late nineteenth century IIRC.

If you decompile the zoneinfo database to look at America/Toronto,
you will probably find an entry for dates earlier than when the
regularized zones were established there, and that entry will have
an offset reflecting Toronto's actual longitude.
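
The server agrees if you probe either side of that transition (a
sketch; the first value should come back with the -05:17:32 LMT offset,
the second with plain -05):

SET timezone TO 'America/Toronto';
SELECT '1894-06-01'::timestamptz, '1896-06-01'::timestamptz;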

Regards,
-Chap




Re: Improving the "Routine Vacuuming" docs

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 8:40 AM Robert Haas  wrote:
> > Something along the lines of the following seems more useful: "A tuple
> > whose xmin is frozen (and xmax is unset) is considered visible to
> > every possible MVCC snapshot. In other words, the transaction that
> > inserted the tuple is treated as if it ran and committed at some point
> > that is now *infinitely* far in the past."
>
> I agree with this idea.

Cool. Maybe I should write a doc patch just for this part, then.

What do you think of the idea of relating freezing to removing tuples
by VACUUM at this point? This would be a basis for explaining how
freezing and tuple removal are constrained by the same cutoff. A very
old snapshot can hold up cleanup, but it can also hold up freezing to
the same degree (it's just not as obvious because we are less eager
about freezing by default).

> > The alarming language isn't proportionate to the true danger
> > (something I complained about in a dedicated thread last year [1]).
>
> I mostly agree with this, but not entirely. The section needs some
> rephrasing, but xidStopLimit doesn't apply in single-user mode, and
> relfrozenxid and datfrozenxid values can and do get corrupted. So it's
> not a purely academic concern.

I accept the distinction that you want to make is valid. More on that below.

> > * XID space isn't really a precious resource -- it isn't even a
> > resource at all IMV.
>
> I disagree with this. Usable XID space is definitely a resource, and
> if you're in the situation where you care deeply about this section of
> the documentation, it's probably one in short supply. Being careful
> not to expend too many XIDs while fixing the problems that have cause
> you to be short of safe XIDs is *definitely* a real thing.

I may have gone too far with this metaphor. My point was mostly that
XID space has a highly unpredictable cost (paid in freezing).

Perhaps we can agree on some (or even all) of the following specific points:

* We shouldn't mention "4 billion XIDs" at all.

* We should say that the issue is an issue of distances between
unfrozen XIDs. The maximum distance that can ever be allowed to emerge
between any two unfrozen XIDs in a cluster is about 2 billion XIDs.

* We don't need to say anything about how XIDs are compared, normal vs
permanent XIDs, etc.

* The system takes drastic intervention to prevent this implementation
restriction from becoming a problem, starting with anti-wraparound
autovacuums. Then there's the failsafe. Finally, there's the
xidStopLimit mechanism, our last line of defense.
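
(For reference, the usual way to keep an eye on that distance is age(),
e.g.

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

which reports how far each database's oldest unfrozen XID trails the
next XID to be assigned.)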

> I think it is wrong to conflate wraparound with xidStopLimit.
> xidStopLimit is the final defense against an actual wraparound, and
> like I say, an actual wraparound is quite possible if you put the
> system in single user mode and then do something like this:

I forgot to emphasize one aspect of the problem that seems quite
important: the document itself seems to conflate the xidStopLimit
mechanism with true wraparound. At least I thought so. Last year's
thread on this subject ('What is "wraparound failure", really?') was
mostly about that confusion. I personally found that very confusing,
and I doubt that I'm the only one.

There is no good reason to use single user mode anymore (a related
problem with the docs is that we still haven't made that point). And
the pg_upgrade bug that led to invalid relfrozenxid values was
flagrantly just a bug (adding a WARNING for this recently, in commit
e83ebfe6). So while I accept that the distinction you're making here
is valid, maybe we can fix the single user mode doc bug too, removing
the need to discuss "true wraparound" as a general phenomenon. You
shouldn't ever see it in practice anymore. If you do then either
you've done something that "invalidated the warranty", or you've run
into a legitimate bug.

-- 
Peter Geoghegan




timezones BCE

2022-04-13 Thread Dave Cramer
Can someone help me understand

select '0101-01-01'::timestamptz;
      timestamptz
------------------------
 0101-01-01 00:00:00+00
(1 row)

test=# set timezone to 'America/Toronto';
SET
test=# select '0101-01-01'::timestamptz;
          timestamptz
------------------------------
 0101-01-01 00:00:00-05:17:32
(1 row)

select 'now()'::timestamptz;
          timestamptz
-------------------------------
 2022-04-13 12:31:57.271967-04
(1 row)

Specifically why the -05:17:32


Dave Cramer


Re: row filtering for logical replication

2022-04-13 Thread Alvaro Herrera
On 2022-Apr-13, Amit Kapila wrote:

> Thanks, this will work and fix the issue. I think this looks better
> than the current code, 

Thanks for looking!  Pushed.

> however, I am not sure if the handling for the
> concurrently dropped tables is better (both get_rel_relkind() and
> get_rel_name() can fail due to those reasons). I understand this won't
> fail because of the protection you have in the patch,

Well, the point is that these routines return NULL if the relation
cannot be found in the cache, so just doing "continue" (without raising
any error) if any of those happens is sufficient for correct behavior.
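
That is, the loop body can treat a cache miss as "relation is gone",
along these lines (a sketch of the pattern, not the committed code
verbatim):

	char	   *relname = get_rel_name(relid);

	if (relname == NULL)
		continue;			/* concurrently dropped, nothing to do */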

BTW I just noticed that AlterPublicationOptions acquires only
AccessShareLock on the publication object.  I think this is too lax ...
what if two of them run concurrently? (say to specify different
published actions)  Do they overwrite the other's update?  I think it'd
be better to acquire ShareUpdateExclusive to ensure only one is running
at a time.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"This is a foot just waiting to be shot"(Andrew Dunstan)




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Tom Lane
Robert Haas  writes:
> What's a little wonky right now is that it's fairly common for
> extensions to just return straight-off without doing *anything at all*
> if !process_shared_preload_libraries_in_progress. See
> pg_stat_statements for example. It seems kind of strange to me to make
> registering GUCs dependent on
> process_shared_preload_libraries_in_progress; why shouldn't that just
> happen always? It's understandable that we don't want to install the
> hook functions if we're not being loaded from shared_preload_libraries,
> though.

Yeah, I was just investigating that.  The problem that pg_stat_statements
has is that it wants to create PGC_POSTMASTER GUCs, which is something
that guc.c specifically forbids:

/*
 * Only allow custom PGC_POSTMASTER variables to be created during shared
 * library preload; any later than that, we can't ensure that the value
 * doesn't change after startup.  This is a fatal elog if it happens; just
 * erroring out isn't safe because we don't know what the calling loadable
 * module might already have hooked into.
 */
if (context == PGC_POSTMASTER &&
!process_shared_preload_libraries_in_progress)
elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");

The key reason guc.c does that is that if it allowed the case, then there
would be no guarantee that a "PGC_POSTMASTER" GUC has the same value in
every backend of the cluster, which'd likely break most use-cases for
such a GUC (it'd certainly break pg_stat_statements, which assumes that
the local setting of that GUC reflects the size of its shmem area).
Perhaps we can improve on that situation with some more thought, but
I'm not very clear on how.
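
For concreteness, the pattern in question looks roughly like this in an
extension's _PG_init() (a sketch; the GUC name and values are
illustrative, not taken from any real module):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int  my_max_entries;

void
_PG_init(void)
{
    /* The early return Robert mentions: do nothing at all unless
     * loaded via shared_preload_libraries. */
    if (!process_shared_preload_libraries_in_progress)
        return;

    /* A custom PGC_POSTMASTER GUC; guc.c would elog(FATAL) on this call
     * if it ran any later than library preload, per the check quoted
     * above. */
    DefineCustomIntVariable("my_ext.max_entries",
                            "Sizes this extension's shared memory area.",
                            NULL,
                            &my_max_entries,
                            1000, 100, INT_MAX,
                            PGC_POSTMASTER,
                            0,
                            NULL, NULL, NULL);
}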

> It may be too much to hope that we're going to completely get rid of
> process_shared_preload_libraries_in_progress tests.

Perhaps, but this is definitely an area that could stand to have some
serious design thought put into it.  You're quite right that it's
largely a mess today, but I think proposals like "forbid setting GUCs
during _PG_init" would add to the mess rather than clean things up.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 11:39 AM Tom Lane  wrote:
> Is there anything else typically done in _PG_init that has to be
> conditional on process_shared_preload_libraries_in_progress?  I recall
> something about reserving LWLocks, which probably should get the same
> treatment.  If we can get to a point where _PG_init shouldn't need to
> care about process_shared_preload_libraries_in_progress, because all
> the stuff that would care is moved to this new hook, then that would
> be very clear cleanup.

What's a little wonky right now is that it's fairly common for
extensions to just return straight-off without doing *anything at all*
if !process_shared_preload_libraries_in_progress. See
pg_stat_statements for example. It seems kind of strange to me to make
registering GUCs dependent on
process_shared_preload_libraries_in_progress; why shouldn't that just
happen always? It's understandable that we don't want to install the
hook functions if we're not being loaded from shared_preload_libraries,
though.

It may be too much to hope that we're going to completely get rid of
process_shared_preload_libraries_in_progress tests. But even given
that, I think having a request-shared-mem hook makes sense and would
make things cleaner than they are today. Maybe in the future we'll end
up with other hooks as well, like a "this is the place to register
GUCs" hook and a perhaps a "this is the place to register you custom
rmgr" hook. I'm not really sure that we need those and I don't want to
make things complicated just for the heck of it, but it's not shocking
that different operations need to happen at different times in the
startup sequence, and I don't think we should be afraid to split up
the monolithic _PG_init() into as many separate things as are required
to make it work right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Temporary file access API

2022-04-13 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska  wrote:
> > Robert Haas  wrote:
> > > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> > > > There are't really that many kinds of files to encrypt:
> > > >
> > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > > >
> > > > (And pg_stat/* files should be removed from the list.)
> > >
> > > This kind of gets into some theoretical questions. Like, do we think
> > > that it's an information leak if people can look at how many
> > > transactions are committing and aborting in pg_xact_status? In theory
> > > it could be, but I know it's been argued that that's too much of a
> > > side channel. I'm not sure I believe that, but it's arguable.
> >
> > I was referring to the fact that the statistics are no longer stored in 
> > files:
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd
> 
> Oh, yeah, I agree with that. I was saying that I'm not altogether a
> believer in the idea that we need not encrypt SLRU files.
> 
> TBH, I think that no matter what we do there are going to be side
> channel leaks, where some security researcher demonstrates that they
> can infer something based on file length or file creation rate or
> something. It seems inevitable. But the more stuff we don't encrypt,
> the easier those attacks are going to be. On the other hand,
> encrypting more stuff also makes the project harder. It's hard for me
> to judge what the right balance is here. Maybe it's OK to exclude that
> stuff for an initial version and just disclaim the heck out of it, but
> I don't think that should be our long term strategy.

I agree that there's undoubtably going to be side-channel attacks, some
of which we may never be able to address, but we should be looking to
try and do as much as we can while also disclaiming that which we can't.

I'd lay this out in steps along these lines:

- Primary data is encrypted (and, ideally, optionally authenticated)

This is basically what the current state of things is with the
patches that have been posted in the past and would be a fantastic first
step that would get us past a lot of the auditors and others who are
unhappy that they can simply 'grep' a PG data directory for user data.
This also generally addresses off-line attacks, backups, etc.  This also
puts us on a similar level as other vendors who offer TDE, as I
understand it.

- Encryption / authentication of metadata (SLRUs, maybe more)

This would be a great next step and would move us in the direction of
having a system that's resilient against online storage-level attacks.
As for how to get there, I would think we'd start with coming up with a
way to at least have good checksums for SLRUs that are just generally
available and users are encouraged to enable, and then have that done in
a way that we could store an authentication tag instead in the future.

> > > I really don't know how you can argue that pg_dynshmem/mmap.NNN
> > > doesn't contain user data - surely it can.
> >
> > Good point. Since postgres does not control writing into this file, it's a
> > special case though. (Maybe TDE will have to reject to start if
> > dynamic_shared_memory_type is set to mmap and the instance is encrypted.)
> 
> Interesting point.

Yeah, this is an interesting case and it really depends on what attack
vector is being considered and how the system is configured.  I don't
think it's too much of an issue to say that you shouldn't use TDE with
mmap (I'm not huge on the idea of explicitly preventing someone from
doing it if they really want though..).

Thanks,

Stephen




Re: Improving the "Routine Vacuuming" docs

2022-04-13 Thread Robert Haas
On Tue, Apr 12, 2022 at 5:53 PM Peter Geoghegan  wrote:
> My high level concerns are:
>
> * Instead of discussing FrozenTransactionId (and then explaining how
> that particular magic value is not really used anymore anyway), why
> not describe freezing in terms of the high level rules?
>
> Something along the lines of the following seems more useful: "A tuple
> whose xmin is frozen (and xmax is unset) is considered visible to
> every possible MVCC snapshot. In other words, the transaction that
> inserted the tuple is treated as if it ran and committed at some point
> that is now *infinitely* far in the past."

I agree with this idea.

> * The description of wraparound sounds terrifying, implying that data
> corruption can result.
>
> The alarming language isn't proportionate to the true danger
> (something I complained about in a dedicated thread last year [1]).

I mostly agree with this, but not entirely. The section needs some
rephrasing, but xidStopLimit doesn't apply in single-user mode, and
relfrozenxid and datfrozenxid values can and do get corrupted. So it's
not a purely academic concern.

> * XID space isn't really a precious resource -- it isn't even a
> resource at all IMV.

I disagree with this. Usable XID space is definitely a resource, and
if you're in the situation where you care deeply about this section of
the documentation, it's probably one in short supply. Being careful
not to expend too many XIDs while fixing the problems that have caused
you to be short of safe XIDs is *definitely* a real thing.

> * We don't cleanly separate discussion of anti-wraparound autovacuums,
> and aggressive vacuums, and the general danger of wraparound (by which
> I actually mean the danger of having the xidStopLimit stop limit kick
> in).

I think it is wrong to conflate wraparound with xidStopLimit.
xidStopLimit is the final defense against an actual wraparound, and
like I say, an actual wraparound is quite possible if you put the
system in single user mode and then do something like this:

backend> VACUUM FULL;

Big ouch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Tom Lane
Robert Haas  writes:
> I guess my feeling about this is that _PG_init() is sort of a grab
> bag, and that's not very good. You're supposed to register new GUCs
> there, and request shared memory space, and install any hook functions
> you want to use, and maybe some other stuff. But why is it appropriate
> for all of those things to happen at the same time? I think it pretty
> clearly isn't, and that's why you see _PG_init() functions testing
> process_shared_preload_libraries_in_progress, and why
> RequestAddinShmemSpace() is a no-op if it's not the right time for
> such things. To me, all of that is just a sign that the system is
> badly designed.

Hmm, that's a good point.  If we can replace "RequestAddinShmemSpace
does nothing if called at the wrong time" with "RequestAddinShmemSpace
throws error if called at the wrong time", that seems like a pretty
clear improvement in robustness.

Is there anything else typically done in _PG_init that has to be
conditional on process_shared_preload_libraries_in_progress?  I recall
something about reserving LWLocks, which probably should get the same
treatment.  If we can get to a point where _PG_init shouldn't need to
care about process_shared_preload_libraries_in_progress, because all
the stuff that would care is moved to this new hook, then that would
be very clear cleanup.

regards, tom lane




Re: Atomic rename feature for Windows.

2022-04-13 Thread Andres Freund
Hi, 

On April 13, 2022 8:30:33 AM PDT, Robert Haas  wrote:
>On Wed, Apr 13, 2022 at 11:03 AM Andres Freund  wrote:
>> > Next decade's hot new processor design might do things
>> > differently enough that it matters that we use SpinLockInit()
>> > not memset-to-zero.  This is not academic either, as we've had
>> > exactly such bugs in the past.
>>
>> FWIW, I'd like to make spinlocks and atomics assert out if they've not
>> been initialized (which'd include preventing uninitialized use of
>> lwlocks). It's easy to accidentally zero out the state or start out
>> uninitialized. Right now nothing will complain on platforms created
>> after 1700 or using --disable-spinlocks --disable-atomics. That should
>> be caught well before running on the buildfarm...
>
>I don't understand this bit about platforms created after 1700. Before
>1700, they didn't even have computers.
>
>Am I being really dense here?

It was a sarcastic reference to the age of pa-risc (the only platform detecting 
zeroed out spinlocks).

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Atomic rename feature for Windows.

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 11:03 AM Andres Freund  wrote:
> > Next decade's hot new processor design might do things
> > differently enough that it matters that we use SpinLockInit()
> > not memset-to-zero.  This is not academic either, as we've had
> > exactly such bugs in the past.
>
> FWIW, I'd like to make spinlocks and atomics assert out if they've not
> been initialized (which'd include preventing uninitialized use of
> lwlocks). It's easy to accidentally zero out the state or start out
> uninitialized. Right now nothing will complain on platforms created
> after 1700 or using --disable-spinlocks --disable-atomics. That should
> be caught well before running on the buildfarm...

I don't understand this bit about platforms created after 1700. Before
1700, they didn't even have computers.

Am I being really dense here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 10:43 AM Tom Lane  wrote:
> Yeah, there is something to be said for preventing subtle interactions
> between extensions.
>
> > So in the end I see basically four possibilities here:
>
> > A. No hard compatibility break, but invent a set-a-GUC hook that runs 
> > earlier.
> > B. No hard compatibility break, but invent a request-shmem hook that runs 
> > later.
> > C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in 
> > _PG_init.
> > D. Invent a request-shmem hook that runs later AND refuse to accept
> > shmem requests in _PG_init.
>
> I dislike A for the reason I already stated: _PG_init should be the
> first code we run in an extension.  Not doing that is just too hacky
> for words.

That seems like a fair position, even if I don't really understand why
it would be as bad as "too hacky for words."

> While B breaks the least stuff in the short run, I agree
> that it leaves extension authors on the hook to avoid unpleasant
> interactions, and that the only way they can be sure to do so is
> to move their shmem requests to the new hook.  So if we're willing
> to accept a hard compatibility break to prevent such bugs, then
> I too prefer D to C.  What I'm not quite convinced about is whether
> the problem is big enough to warrant a compatibility break.

It's sort of a philosophical question. How do you measure the size of
such a problem? What units do you even use for such a size
measurement? How big does it have to be to justify a compatibility
break? Presumably it depends on the size of the compatibility break,
which is also not subject to any sort of objective measurement.

I guess my feeling about this is that _PG_init() is sort of a grab
bag, and that's not very good. You're supposed to register new GUCs
there, and request shared memory space, and install any hook functions
you want to use, and maybe some other stuff. But why is it appropriate
for all of those things to happen at the same time? I think it pretty
clearly isn't, and that's why you see _PG_init() functions testing
process_shared_preload_libraries_in_progress, and why
RequestAddinShmemSpace() is a no-op if it's not the right time for
such things. To me, all of that is just a sign that the system is
badly designed. Imagine if someone proposed to make XLogInsert() or
SetConfigOption() or LockBuffer() sometimes just return without doing
anything. We would just call those functions in places where those
actions weren't appropriate, and the function would just do nothing
silently without signalling an error. Surely such a proposal would be
shot down as an awful idea, and the only reason the _PG_init() case is
any different is because it's not new. But it doesn't seem to me that
it's any better of an idea here than it would be there.

And under proposal D we'd actually be fixing that, because we'd have a
hook that is the right place to request shared memory and we'd
complain if those functions were called from anywhere else.
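
To make proposal D concrete, an extension would move its requests into
the new hook along these lines (a sketch, assuming an API shaped like
the shmem_request_hook that was ultimately committed for PostgreSQL 15;
the sizing math here is illustrative):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

static shmem_request_hook_type prev_shmem_request_hook = NULL;

static void
my_shmem_request(void)
{
    if (prev_shmem_request_hook)
        prev_shmem_request_hook();

    /* MaxBackends and all GUCs are final by the time this runs. */
    RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(int)));
    RequestNamedLWLockTranche("my_ext", 1);
}

void
_PG_init(void)
{
    if (!process_shared_preload_libraries_in_progress)
        return;

    prev_shmem_request_hook = shmem_request_hook;
    shmem_request_hook = my_shmem_request;
}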

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Atomic rename feature for Windows.

2022-04-13 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-13 10:19:44 -0400, Tom Lane wrote:
>> Next decade's hot new processor design might do things
>> differently enough that it matters that we use SpinLockInit()
>> not memset-to-zero.  This is not academic either, as we've had
>> exactly such bugs in the past.

> FWIW, I'd like to make spinlocks and atomics assert out if they've not
> been initialized (which'd include preventing uninitialized use of
> lwlocks). It's easy to accidentally zero out the state or start out
> uninitialized. Right now nothing will complain on platforms created
> after 1700 or using --disable-spinlocks --disable-atomics. That should
> be caught well before running on the buildfarm...

Yeah, even just doing that in --disable-spinlocks builds would be
enough for the purpose, and be much more accessible to Joe Developer.

> Then the zero-state assumption wouldn't require continuing to support
> HPPA.

I wouldn't mind retiring that machine once v11 is EOL.  (It's also one
of very few animals testing pre-C99 compilers, so not before then.)

regards, tom lane




Re: GSoC 2022: Proposal of pgmoneta on-disk encryption

2022-04-13 Thread Jesper Pedersen

Hi Jichen,

On 4/13/22 10:40, Solo Kyo wrote:

To whom it may concern:

Hi, I am Jichen Xu. I am a first-year master's student in computer science at
Waseda University.
Here is a link to my proposal:
https://docs.google.com/document/d/1vdgPY5wvhjhrX9aSUw5mTDOnXwEQNcr4cxfzuSHMWlg
Looking forward to working with you in the next few months.



Thanks for your proposal to Google Summer of Code 2022!


We'll follow up off-list to get this finalized.


Best regards,

 Jesper





Re: Atomic rename feature for Windows.

2022-04-13 Thread Andres Freund
Hi,

On 2022-04-13 10:19:44 -0400, Tom Lane wrote:
> Meh.  I agree that it seems unlikely that anyone will come out with a
> new processor design that lacks the ability to do spinlocks or atomics.
> It's substantially more likely though that someone would want those
> configure switches temporarily while in the process of porting
> Postgres to a new processor, so that they don't have to make
> absolutely everything work before they can test anything.

I still think we ought to provide a compiler intrinsics spinlock
implementation for that...


> Independently of that, I think that our interest in weird old
> processors is mostly about checking our assumptions about exactly
> what processor-dependent facilities look like.  For example,
> although I agree that spinlocks should be possible on everything
> we care about supporting, I missed the stone tablet on which it is
> graven that thou shalt use zero for the unlocked state of a spinlock.
> The main reason I keep my old HPPA dinosaur alive is because it is
> (I think) our only remaining architecture in which that isn't true,
> and I think we need to keep ourselves honest about that sort of
> detail.

The other thing it currently has is the weird wide spinlock state where
we don't know which byte is going to be modified ... I don't think
that's likely to be needed again though.


> Next decade's hot new processor design might do things
> differently enough that it matters that we use SpinLockInit()
> not memset-to-zero.  This is not academic either, as we've had
> exactly such bugs in the past.

FWIW, I'd like to make spinlocks and atomics assert out if they've not
been initialized (which'd include preventing uninitialized use of
lwlocks). It's easy to accidentally zero out the state or start out
uninitialized. Right now nothing will complain on platforms created
after 1700 or using --disable-spinlocks --disable-atomics. That should
be caught well before running on the buildfarm...

Then the zero-state assumption wouldn't require continuing to support
HPPA.


> The situation for OSes is a bit different, because IMV we generally
> prefer to restrict ourselves to POSIX-compliant system calls,
> and to the extent we can do that all OSes look alike.  The reason
> that Windows is such a grade-A pain in the rear is exactly that
> their POSIX compliance sucks, and yet we committed to supporting
> them anyway.  If some new OS that is not POSIX-compliant comes
> down the pike, I think we're far more likely to decline to support
> it than otherwise.

Our Windows support is not in a great state. Part of that is that we
just plaster random hacks over issues, which often are only needed on
Windows versions that nobody has access to. As you say, that's different
from most of the hackiness to support some random old unix platform,
which is most of the time much more localized (with the exception of not
relying on threads in some places due to old platforms).


> But to tie this back to the point of the thread --- anytime we
> can reasonably start to rely on POSIX behavior in newer versions
> of Windows, I'm for that.

Yea. The same imo is true for msvc-specific compiler oddities. If we can
simplify things by requiring a halfway modern msvc version, we shouldn't
hesitate.

I think it might be worth noting somewhere developer-oriented that we're
ok with dropping support in HEAD for Windows versions that aren't
"fully" supported anymore. Even if one can procure extended support for
a gazillion or two, they're not going to do that to run PG 19.

Greetings,

Andres Freund




Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-13 Thread gkokolatos



--- Original Message ---
On Wednesday, April 13th, 2022 at 7:25 AM, Michael Paquier 
 wrote:


>
>
> On Tue, Apr 12, 2022 at 06:22:48PM +0900, Michael Paquier wrote:
>
> > On Mon, Apr 11, 2022 at 12:46:02PM +, gkokola...@pm.me wrote:
> >
> > > It looks good. If you choose to discard the comment regarding the use of
> > > 'method' over 'algorithm' from above, can you please use the full word in 
> > > the
> > > variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I 
> > > can
> > > not really explain it, the latter reads a bit rude. Then again that may be 
> > > just
> > > me.
> >
> > Thanks. I have been able to do an extra pass on 0001 and 0002, fixing
> > those naming inconsistencies with "algo" vs "algorithm" that you and
> > Robert have reported, and applied them. For 0003, I'll look at it
> > later. Attached is a rebase with improvements about the variable
> > names.
>
> This has been done with the proper renames. With that in place, I see
> no reason now to not be able to set the compression level as it is
> possible to pass it down with the options available. This requires
> only a couple of lines, as of the attached. LZ4 has a dummy structure
> called LZ4F_INIT_PREFERENCES to initialize LZ4F_preferences_t, that
> holds the compression level before passing it down to
> LZ4F_compressBegin(), but that's available only in v1.8.3. Using it
> risks lowering down the minimal version of LZ4 we are able to use now,
> but replacing that with a set of memset()s is also a way to set up
> things as per its documentation.
>
> Thoughts?

It's really not hard to add a compression level option. However, we had briefly
discussed it in the original thread [1] and decided against it. That is why
I did not write that code. If the community thinks differently now, let
me know if you would like me to offer a patch for it.

Cheers,
//Georgios


[1] 
https://www.postgresql.org/message-id/flat/CABUevEwuq7XXyd4fA0W3jY9MsJu9B2WRbHumAA%2B3WzHrGAQjsg%40mail.gmail.com#b6456fa2adc1cdb049a57bf3587666b9
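
For reference, the memset() approach described above amounts to
something like this against the stock LZ4 frame API (a sketch; the
function name and level parameter are illustrative):

#include <string.h>
#include <lz4frame.h>

/* Start an LZ4 frame with an explicit compression level without using
 * LZ4F_INIT_PREFERENCES, which only exists in lz4 >= 1.8.3. */
static size_t
begin_frame(LZ4F_cctx *ctx, void *dst, size_t dst_capacity, int level)
{
    LZ4F_preferences_t prefs;

    /* Zero-filling is the documented substitute for the macro. */
    memset(&prefs, 0, sizeof(prefs));
    prefs.compressionLevel = level;

    return LZ4F_compressBegin(ctx, dst, dst_capacity, &prefs);
}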




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Tom Lane
Robert Haas  writes:
> I wouldn't say that approach is wrong. I think we're just prioritizing
> different things. I think that the pattern of wanting to request
> shared memory in _PG_init() is common, and there are lots of
> extensions that already do it that way, and if we pursue this plan,
> then all of those extensions really ought to be updated to use the new
> hook, including pg_prewarm and pg_stat_statements. They may continue
> to work if they don't, but they'll be doing something that is not
> really best practice any more but will happen to work until you make
> your request for shared memory dependent on the wrong thing, or until
> you load up another module at the same time that tweaks some GUC upon
> which your calculation depends (imagine an autotuner that adjusts
> pg_stat_statements.max at startup time). So I think we'll be
> inflicting subtle bugs on extension authors that will never really get
> fully sorted out.

Yeah, there is something to be said for preventing subtle interactions
between extensions.

> So in the end I see basically four possibilities here:

> A. No hard compatibility break, but invent a set-a-GUC hook that runs earlier.
> B. No hard compatibility break, but invent a request-shmem hook that runs 
> later.
> C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in 
> _PG_init.
> D. Invent a request-shmem hook that runs later AND refuse to accept
> shmem requests in _PG_init.

I dislike A for the reason I already stated: _PG_init should be the
first code we run in an extension.  Not doing that is just too hacky
for words.  While B breaks the least stuff in the short run, I agree
that it leaves extension authors on the hook to avoid unpleasant
interactions, and that the only way they can be sure to do so is
to move their shmem requests to the new hook.  So if we're willing
to accept a hard compatibility break to prevent such bugs, then
I too prefer D to C.  What I'm not quite convinced about is whether
the problem is big enough to warrant a compatibility break.

regards, tom lane




GSoC 2022: Proposal of pgmoneta on-disk encryption

2022-04-13 Thread Solo Kyo
To whom it may concern:

Hi, I am Jichen Xu. I am a first-year master's student in computer science at
Waseda University.
Here is a link to my proposal:
https://docs.google.com/document/d/1vdgPY5wvhjhrX9aSUw5mTDOnXwEQNcr4cxfzuSHMWlg
Looking forward to working with you in the next few months.

Best regards,
Jichen


Re: Atomic rename feature for Windows.

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 10:19 AM Tom Lane  wrote:
> Meh.  I agree that it seems unlikely that anyone will come out with a
> new processor design that lacks the ability to do spinlocks or atomics.
> It's substantially more likely though that someone would want those
> configure switches temporarily while in the process of porting
> Postgres to a new processor, so that they don't have to make
> absolutely everything work before they can test anything.

It's possible. I don't think it's super-likely. If someone is
introducing a new architecture, they're probably going to make getting
the Linux kernel and gcc working on it a pretty high priority, and
they'll probably make the gcc intrinsics work, too. But you never
know. Humans are unpredictable like that.

> Independently of that, I think that our interest in weird old
> processors is mostly about checking our assumptions about exactly
> what processor-dependent facilities look like.  For example,
> although I agree that spinlocks should be possible on everything
> we care about supporting, I missed the stone tablet on which it is
> graven that thou shalt use zero for the unlocked state of a spinlock.
> The main reason I keep my old HPPA dinosaur alive is because it is
> (I think) our only remaining architecture in which that isn't true,
> and I think we need to keep ourselves honest about that sort of
> detail.  Next decade's hot new processor design might do things
> differently enough that it matters that we use SpinLockInit()
> not memset-to-zero.  This is not academic either, as we've had
> exactly such bugs in the past.

Here again, I think it's definitely possible that that could happen,
but I don't think it's super-likely. Nobody's really implementing
spinlocks as a primitive any more; they implement atomics, and you can
decide for yourself how to build spinlocks on top of that and what
values you want to use. And if you did decide to provide spinlocks but
not atomics for some reason, you'd probably use 0 and 1 rather than 17
and 42 just because otherwise a lot of software wouldn't work on your
brand new hardware, which as a hardware manufacturer is a thing you
really don't want. We can be as rigorous as we like about this sort of
thing, but I bet that in 2022 there is a huge amount of code out there
that assumes memset(..., 0, ...) is good enough. And, like, nobody's going
to be that excited about building a machine where PostgreSQL works
because we've carefully avoided this assumption, but 5000 other
software packages that haven't been as careful all break.

> The situation for OSes is a bit different, because IMV we generally
> prefer to restrict ourselves to POSIX-compliant system calls,
> and to the extent we can do that all OSes look alike.  The reason
> that Windows is such a grade-A pain in the rear is exactly that
> their POSIX compliance sucks, and yet we committed to supporting
> them anyway.  If some new OS that is not POSIX-compliant comes
> down the pike, I think we're far more likely to decline to support
> it than otherwise.

Yeah.

> But to tie this back to the point of the thread --- anytime we
> can reasonably start to rely on POSIX behavior in newer versions
> of Windows, I'm for that.

Sure, makes sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 9:27 AM Tom Lane  wrote:
> Robert Haas  writes:
> > Hmm. I suppose I was thinking that we'd go the other way around: move
> > the call to InitializeMaxBackends() earlier, as proposed previously,
> > and add a hook to allow extensions to get control before that point.
> > The reason that I like that approach is that I suspect that it's more
> > common for extensions to want to size shared memory data structures
> > than it is for them to want to change GUCs, and therefore I thought
> > that approach would fix things for the most amount of people with the
> > least amount of code change. But it seems like maybe Tom thinks I'm
> > incorrect about the relative frequency of those things, so I don't
> > know.
>
> Maybe I'm missing something, but I figured we'd keep the _PG_init
> calls where they are to minimize side-effects, and then add an optional
> hook just before/after shared memory size is determined.  Cases that
> work well now continue to work well, and cases that don't work so
> well can be fixed by making use of the hook.  In particular you
> can still do RequestAddinShmemSpace() in _PG_init as long as the
> request size doesn't depend on factors that other extensions might
> change.  If you're doing something funny then you might need to
> postpone RequestAddinShmemSpace till the new hook call.

I wouldn't say that approach is wrong. I think we're just prioritizing
different things. I think that the pattern of wanting to request
shared memory in _PG_init() is common, and there are lots of
extensions that already do it that way, and if we pursue this plan,
then all of those extensions really ought to be updated to use the new
hook, including pg_prewarm and pg_stat_statements. They may continue
to work if they don't, but they'll be doing something that is not
really best practice any more but will happen to work until you make
your request for shared memory dependent on the wrong thing, or until
you load up another module at the same time that tweaks some GUC upon
which your calculation depends (imagine an autotuner that adjusts
pg_stat_statements.max at startup time). So I think we'll be
inflicting subtle bugs on extension authors that will never really get
fully sorted out.

Now, on the other hand, I think that the pattern of changing GUCs in
_PG_init() is comparatively uncommon. I don't believe that we have any
in-core code that does it, and the only out-of-core code that does to
my knowledge is Citus. So if we subtly change the best practices
around setting GUCs in _PG_init() hooks, I think that's going to
affect a vastly smaller number of extensions. Either way, we're
changing best practices without really creating a hard break, but I'd
rather move the wood for the small number of extensions that are
tweaking GUCs in _PG_init() than the larger number of extensions (or
so I suppose) that are requesting shared memory there.

Now, we could also decide to create a hard compatibility break to
force extension authors to update. I think that could be done in one
of two ways. One possibility is to refuse SetConfigOption() in
_PG_init(); anyone who wants to do that has to use the new set-a-GUC
hook we simultaneously add. The other is to refuse
RequestAddinShmemSpace() and RequestNamedLWLockTranche() in
_PG_init(); anyone who wants to do that has to use the new
request-shmem-resources hook that we simultaneously add. I tend to
think that the latter is a slightly more principled option, because I
doubt that refusing SetConfigOption() is a bullet-proof defense
against, well, I don't know, anything really. You can hackily modify
GUCs by other means, and you can do bad math with things that aren't
GUCs. I am however pretty sure that if we refuse to provide shmem
space unless you request it from a new add-shmem hook, that will be
pretty water-tight, and it doesn't seem like it should be a big issue
for extension authors to move that code to some new hook we invent.

So in the end I see basically four possibilities here:

A. No hard compatibility break, but invent a set-a-GUC hook that runs earlier.
B. No hard compatibility break, but invent a request-shmem hook that runs later.
C. Invent a set-a-GUC hook that runs earlier AND refuse to set GUCs in _PG_init.
D. Invent a request-shmem hook that runs later AND refuse to accept
shmem requests in _PG_init.

My preferred options are A or D for the reasons explained above. It
seems like you prefer B.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Building Postgres with lz4 on Visual Studio

2022-04-13 Thread Melih Mutlu
Hi all,

I tried to build Postgres from source using Visual Studio 19. It worked all
good.
Then I wanted to build it with some dependencies, started with the ones
listed here [1]. But I'm having some issues with lz4.

First I downloaded the latest release of lz4 from this link [2].
Modified the src\tools\msvc\config.pl file as follows:

use strict;
use warnings;

our $config;
$config->{"tap_tests"} = 1;
$config->{"asserts"} = 1;

$config->{"lz4"} = "c:/lz4/";
$config->{"openssl"} = "c:/openssl/1.1/";
$config->{"perl"} = "c:/strawberry/$ENV{DEFAULT_PERL_VERSION}/perl/";
$config->{"python"} = "c:/python/";

1;

based on /src/tools/ci/windows_build_config.pl

Then ran the following commands:

vcvarsall x64
perl src/tools/msvc/mkvcbuild.pl
msbuild -m -verbosity:minimal /p:IncludePath="C:\lz4" pgsql.sln


msbuild command fails with the following error:
"LINK : fatal error LNK1181: cannot open input file
'c:\lz4\\lib\liblz4.lib' [C:\postgres\libpgtypes.vcxproj]"

What I realized is that c:\lz4\lib\liblz4.lib actually does not exist.
The latest versions of lz4, downloaded from [2], do not contain lib\liblz4.lib
anymore, as opposed to what's written here [3]. There isn't a lib
folder at all.

After those changes on the lz4 side, AFAIU these lines in Solution.pm [4]
add the library from the wrong path:

$proj->AddIncludeDir($self->{options}->{lz4} . '\include');
$proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib');


Even though I spent some time on this problem and tried to fix some places,
I'm not able to run a successful build yet.
This is also the case for zstd too. Enabling zstd gives the same error.

Has anyone had this issue before? Is this something that anyone is aware of
and somehow made it work?
I would appreciate any comment/suggestion etc.

Thanks,
Melih


[1]
https://www.postgresql.org/docs/current/install-windows-full.html#id-1.6.5.8.8
[2] https://github.com/lz4/lz4/releases/tag/v1.9.3
[3] https://github.com/lz4/lz4/tree/dev/lib/dll/example
[4]
https://github.com/postgres/postgres/blob/c1932e542863f0f646f005b3492452acc57c7e66/src/tools/msvc/Solution.pm#L1092


Re: Atomic rename feature for Windows.

2022-04-13 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 13, 2022 at 3:04 AM Michael Paquier  wrote:
>> Agreed.  I think that these things can usually be helpful.  Now, I am not
>> really convinced that there is a strong need in running a VAX if you
>> are worrying about timing issues so this is a matter of balance.  You
>> could get down to the same level of coverage with something as cheap
>> as a Raspberry PI or such.  There are also configure switches that
>> emulate rather non-common behaviors, like --disable-spinlocks or
>> --disable-atomics that I'd rather never see gone.

> I don't really agree with you about any of this.

Meh.  I agree that it seems unlikely that anyone will come out with a
new processor design that lacks the ability to do spinlocks or atomics.
It's substantially more likely though that someone would want those
configure switches temporarily while in the process of porting
Postgres to a new processor, so that they don't have to make
absolutely everything work before they can test anything.

Independently of that, I think that our interest in weird old
processors is mostly about checking our assumptions about exactly
what processor-dependent facilities look like.  For example,
although I agree that spinlocks should be possible on everything
we care about supporting, I missed the stone tablet on which it is
graven that thou shalt use zero for the unlocked state of a spinlock.
The main reason I keep my old HPPA dinosaur alive is because it is
(I think) our only remaining architecture in which that isn't true,
and I think we need to keep ourselves honest about that sort of
detail.  Next decade's hot new processor design might do things
differently enough that it matters that we use SpinLockInit()
not memset-to-zero.  This is not academic either, as we've had
exactly such bugs in the past.
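
As a concrete illustration of that last point, the discipline is to
initialize explicitly, as in this sketch (struct and names are made up),
even on platforms where plain zeroing happens to work:

#include "postgres.h"
#include "storage/spin.h"

typedef struct SharedCounter
{
    slock_t     mutex;
    int         count;
} SharedCounter;

static void
shared_counter_init(SharedCounter *sc)
{
    SpinLockInit(&sc->mutex);   /* correct even where unlocked != 0 */
    sc->count = 0;

    /* memset(sc, 0, sizeof(*sc)) would only happen to work on platforms
     * whose unlocked spinlock state is all-zeroes */
}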

The situation for OSes is a bit different, because IMV we generally
prefer to restrict ourselves to POSIX-compliant system calls,
and to the extent we can do that all OSes look alike.  The reason
that Windows is such a grade-A pain in the rear is exactly that
their POSIX compliance sucks, and yet we committed to supporting
them anyway.  If some new OS that is not POSIX-compliant comes
down the pike, I think we're far more likely to decline to support
it than otherwise.

But to tie this back to the point of the thread --- anytime we
can reasonably start to rely on POSIX behavior in newer versions
of Windows, I'm for that.

regards, tom lane




Re: deparsing utility commands

2022-04-13 Thread Peter Eisentraut

On 13.04.22 06:29, Ajin Cherian wrote:

This patch-set has not been rebased for some time. I see that there is
interest in this patch from the logical
replication of DDL thread [1].

I will take a stab at rebasing this patch-set, I have already rebased
the first patch and will work on the other
patches in the coming days. Do review and give me feedback.


The patch you posted contains neither a detailed commit message nor 
documentation or test changes, so it's impossible to tell what it's 
supposed to do.






Re: JSON docs: RFC7159 is now superseded

2022-04-13 Thread Simon Riggs
On Wed, 13 Apr 2022 at 14:53, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > Minor doc patch to replace with latest RFC number
>
> Hmm, I'm a bit disinclined to claim compliance with a new RFC
> sight unseen.  What were the changes?

I checked... so I should have mentioned this before

https://datatracker.ietf.org/doc/html/rfc8259#appendix-A

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: JSON docs: RFC7159 is now superseded

2022-04-13 Thread Andrew Dunstan


On 2022-04-13 We 09:38, Simon Riggs wrote:
> Minor doc patch to replace with latest RFC number
>
> Intended for PG15



Idea is fine, but


-  data, as specified in <ulink url="https://tools.ietf.org/html/rfc7159">RFC
-  7159</ulink>. Such data can also be stored as text, but
+  data, as specified in <ulink url="https://tools.ietf.org/html/rfc8259">RFC
+  8259</ulink>, which supersedes the earlier RFC 7159.
+  Such data can also be stored as text, but


Do we need to mention the obsoleting of RFC7159? Anyone who cares enough
can see that by looking at the RFC - it mentions what it obsoletes.

I haven't checked that anything that changed in RFC8259 affects us. I
doubt it would but I guess we should double check.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: JSON docs: RFC7159 is now superseded

2022-04-13 Thread Tom Lane
Simon Riggs  writes:
> Minor doc patch to replace with latest RFC number

Hmm, I'm a bit disinclined to claim compliance with a new RFC
sight unseen.  What were the changes?

regards, tom lane




Re: Atomic rename feature for Windows.

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 3:04 AM Michael Paquier  wrote:
> Agreed.  I think that these things can usually be helpful.  Now, I am not
> really convinced that there is a strong need in running a VAX if you
> are worrying about timing issues so this is a matter of balance.  You
> could get down to the same level of coverage with something as cheap
> as a Raspberry PI or such.  There are also configure switches that
> emulate rather non-common behaviors, like --disable-spinlocks or
> --disable-atomics that I'd rather never see gone.

I don't really agree with you about any of this.

A Raspberry Pi running Linux is very similar to any other Linux
machine, except with less resources. A machine running a different
operating system is a totally different thing. I agree that there is
value in supporting other operating systems if those are things people
might actually use, because it's good for PostgreSQL to be portable,
but it isn't really useful for us to be portable to systems that
nobody runs any more. I am just old enough to have a bit of nostalgia
about the idea of surrendering support for VAX or a PDP-11 or an IRIX
workstation, but in practice efforts to port to those platforms are
efforts for the dedicated hobbyist rather than anything that will make
PostgreSQL a better piece of software.

Similarly for spinlocks and atomics. There have been in the past
systems that did not have these things, but it seems very unlikely
that there will be new systems in the future that don't have these
things. And continuing to support those configurations actually adds a
lot of complexity. The code is weirdly structured so that you can
emulate atomics with spinlocks, but spinlocks on most systems are
actually built using atomics, except when we emulate spinlocks with
semaphores. It's really kind of a mess, and we could clean things up
and make them more straightforward if we were willing to decide that
atomics and spinlocks are basic requirements for PostgreSQL to run.

Now I don't know if we should decide that today or at some point in
the future, but single-processor architectures are pretty dead.
Embedded devices like phones are shipping with multiple cores. Even a
single processor system is probably based on an underlying
architecture that is multiprocessor capable. Where, outside of a
museum, would you find a system that required --disable-spinlocks or
--disable-atomics?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




JSON docs: RFC7159 is now superseded

2022-04-13 Thread Simon Riggs
Minor doc patch to replace with latest RFC number

Intended for PG15

-- 
Simon Riggs                http://www.EnterpriseDB.com/


json_docs_rfc8259.v1.patch
Description: Binary data


Re: make MaxBackends available in _PG_init

2022-04-13 Thread Tom Lane
Robert Haas  writes:
> Hmm. I suppose I was thinking that we'd go the other way around: move
> the call to InitializeMaxBackends() earlier, as proposed previously,
> and add a hook to allow extensions to get control before that point.
> The reason that I like that approach is that I suspect that it's more
> common for extensions to want to size shared memory data structures
> than it is for them to want to change GUCs, and therefore I thought
> that approach would fix things for the most amount of people with the
> least amount of code change. But it seems like maybe Tom thinks I'm
> incorrect about the relative frequency of those things, so I don't
> know.

Maybe I'm missing something, but I figured we'd keep the _PG_init
calls where they are to minimize side-effects, and then add an optional
hook just before/after shared memory size is determined.  Cases that
work well now continue to work well, and cases that don't work so
well can be fixed by making use of the hook.  In particular you
can still do RequestAddinShmemSpace() in _PG_init as long as the
request size doesn't depend on factors that other extensions might
change.  If you're doing something funny then you might need to
postpone RequestAddinShmemSpace till the new hook call.

regards, tom lane




Re: Add --{no-,}bypassrls flags to createuser

2022-04-13 Thread Robert Haas
On Wed, Apr 13, 2022 at 4:35 AM Kyotaro Horiguchi
 wrote:
> I don't think there are definitive criteria (other than feasibility)
> for whether each CREATE ROLE option should have a corresponding
> option in the createuser command.  I don't see a clear reason why
> createuser command should not have the option.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-13 Thread Robert Haas
On Tue, Apr 12, 2022 at 6:26 PM Nathan Bossart  wrote:
> Okay.  So maybe we only need the attached patches.  0001 is just 5ecd018
> un-reverted, and 0002 is Julien's patch from a few weeks ago with some
> minor edits.

Hmm. I suppose I was thinking that we'd go the other way around: move
the call to InitializeMaxBackends() earlier, as proposed previously,
and add a hook to allow extensions to get control before that point.
The reason that I like that approach is that I suspect that it's more
common for extensions to want to size shared memory data structures
than it is for them to want to change GUCs, and therefore I thought
that approach would fix things for the most amount of people with the
least amount of code change. But it seems like maybe Tom thinks I'm
incorrect about the relative frequency of those things, so I don't
know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should frontend tools use syncfs() ?

2022-04-13 Thread Justin Pryzby
On Thu, Sep 30, 2021 at 12:49:36PM +0900, Michael Paquier wrote:
> On Wed, Sep 29, 2021 at 07:43:41PM -0500, Justin Pryzby wrote:
> > Forking this thread in which Thomas implemented syncfs for the startup 
> > process
> > (61752afb2).
> > https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BSG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A%40mail.gmail.com
> > 
> > Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind 
> > shouldn't
> > use syncfs()  ?
> 
> That makes sense.
> 
> > do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented 
> > in
> > common.
> 
> The fd handling in the backend makes things tricky if trying to plug
> in a common interface, so I'd rather do that as this is frontend-only
> code.
> 
> > They can't use the GUC, so need to add an cmdline option or look at an
> > environment variable.
> 
> fsync_pgdata() is going to manipulate many inodes anyway, because
> that's a code path designed to do so.  If we know that syncfs() is
> just going to be better, I'd rather just call it by default if
> available and not add new switches to all the frontend tools in need
> of flushing the data folder, switches that are not documented in your
> patch.

It is a draft/POC, after all.

The argument against using syncfs by default is that it could be worse than
recursive fsync if a tiny 200MB postgres instance lives on a shared filesystem
along with other, larger applications (maybe a larger postgres instance).

There's also an argument that syncfs might be unreliable in the case of a write
error.  (But I agreed with Thomas' earlier assessment: that claim carries little
weight since fsync() itself wasn't reliable for 20-some years.)

I didn't pursue this patch, as it's easier for me to use /bin/sync -f.  Someone
should adopt it if interested.
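
For anyone picking it up, the core frontend-side call is small (a
sketch, Linux-only; error reporting and the fallback to recursive fsync
are left out):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Flush every dirty file on the filesystem containing pgdata with a
 * single call, instead of walking and fsync()ing each file. */
static int
syncfs_pgdata(const char *pgdata)
{
    int     fd = open(pgdata, O_RDONLY);
    int     ret;

    if (fd < 0)
        return -1;
    ret = syncfs(fd);
    close(fd);
    return ret;
}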

-- 
Justin




Re: Unit tests for SLRU

2022-04-13 Thread Pavel Borisov
>
> Hi hackers,
>
> > Here is version 3 of the patch.
> > [...]
> > I think the tests are about as good as they will ever get.
>
> Here is version 4. Same as v3, but with resolved conflicts against the
> current `master` branch.
>
Hi, Alexander!
The test seems good enough to be pushed.

Only one thing to note. Maybe it would be good not to copy-paste the Assert
after every call of SimpleLruInit, putting it into a wrapper function
instead. Then the test can call the inner function (without the assert)
while all other callers use the wrapper. Not sure about naming though.
Maybe rename the current SimpleLruInit -> SimpleLruInitInner and add a new
wrapper under the old name (SimpleLruInit).
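
In other words, the suggested shape is roughly this generic sketch (the
real SLRU signatures are omitted; names are illustrative):

#include <assert.h>
#include <stdbool.h>

/* The existing initialization logic, minus the assertion. */
static bool
simple_lru_init_inner(int nslots)
{
    /* ... allocate or attach the shared structure here ... */
    return nslots > 0;
}

/* Wrapper used by all normal callers; tests call the inner function. */
static void
simple_lru_init(int nslots)
{
    bool    ok = simple_lru_init_inner(nslots);

    assert(ok);     /* the check currently pasted at each call site */
    (void) ok;      /* silence unused warning when assertions are off */
}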

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Error logging messages

2022-04-13 Thread Daniel Gustafsson
When reviewing the patch in "Frontend error logging style" [0] I noticed that
some messages could do with a little bit of touching up.  The original review
was posted and responded to in that thread, but to keep goalposts in place it
was put off until that patch had landed.  To avoid this getting buried in that
thread I decided to start a new one with the findings from there.  To make
reviewing easier I split the patch around the sorts of changes proposed.

0001: Makes sure that database and file names are printed quoted.  This patch
has hunks in contrib and backend as well.

0002: Capitalizes pg_log_error_detail and conversely starts pg_log_error with a
lowercase letter without punctuation.

0003: Extend a few messages with more information to be more consistent with
other messages (and more helpful).

0004: Add pg_log_error() calls on all calls to close in pg_basebackup.  Nearly
all had already, and while errors here are likely to be rare, when they do
happen something is really wrong and every bit of information can help
debugging.

0005: Put keywords as parameters in a few pg_dump error messages, to make their
translations reusable.
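
For illustration, the conventions 0001 and 0002 enforce come out like
this (a sketch; the message text is made up):

#include "postgres_fe.h"
#include "common/logging.h"

static void
report_open_failure(const char *path)
{
    /* primary message: starts lowercase, no trailing period, file name quoted */
    pg_log_error("could not open file \"%s\": %m", path);

    /* detail message: a complete sentence, capitalized and punctuated */
    pg_log_error_detail("The file is required to continue the backup.");
}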

--
Daniel Gustafsson   https://vmware.com/

[0] https://postgr.es/m/1363732.1636496...@sss.pgh.pa.us



0005-Parameterize-values-to-lessen-translation-burden.patch
Description: Binary data


0004-Add-errorhandling-on-file-close.patch
Description: Binary data


0003-Add-additional-information-to-be-more-helpful.patch
Description: Binary data


0002-pg_log_error-and-pg_log_error_detail-capitalization-.patch
Description: Binary data


0001-Ensure-file-and-database-names-are-quoted.patch
Description: Binary data

