Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Mon, 23 Aug 2021 at 22:45, Julien Rouhaud  wrote:

> On Mon, Aug 23, 2021 at 10:22 PM Tom Lane  wrote:
> >
> > Bruce Momjian  writes:
> > > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote:
> > >> By that argument, *every* globally-visible variable should be marked
> > >> PGDLLIMPORT.  But the mere fact that two backend .c files need to
> > >> access
> >
> > > No, Julien says 99% need only the GUCs, so that is not the argument I
> > > am making.
> >
> > That's a claim unbacked by any evidence that I've seen.  More to
> > the point, we already have a mechanism that extensions can/should
> > use to read and write GUC settings, and it's not direct access.
>
> I clearly didn't try every single extension available out there.  It's
> excessively annoying to compile extensions on Windows, and I also
> don't have a lot of dependencies installed, so there are some that I
> wasn't able to test since I'm lacking some other lib; given my
> absolute lack of knowledge of that platform I didn't spend time trying
> to install those.
>
>
Plenty of them are closed too.

While that's not the Pg community's problem, as such, it'd be nice not to
arbitrarily break them all for no actual benefit.


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Tue, 24 Aug 2021 at 05:08, Chapman Flack  wrote:

> On 08/23/21 14:30, Robert Haas wrote:
> > it seems likely that this proposed
> > API would have the exact same problem, because it would let people do
> > exactly the same thing. And, going through this proposed API would
> > still be significantly more expensive than just accessing the bare
> > variables, because you'd at least have to do some kind of lookup based
> > on the GUC name
>
> I think the API ideas in [0] would not let people do exactly the same
> thing.
>
> They would avoid exposing the bare variables to overwrite. Not that
> there has been any plague of extensions going and overwriting GUCs,
> but I think in some messages on this thread I detected a sense that
> in principle it's better if an API precludes it, and that makes sense
> to me.
>
> The second idea also avoids the expense of name-based lookup (except once
> at extension initialization), and in fact minimizes the cost of obtaining
> the current value when needed, by slightly increasing the infrequent cost
> of updating values.
>

I'd be generally in favour of something that reduced our reliance on the
current chaotic and inconsistent jumble of globals which are a semi-random
combination of compilation-unit-scoped, globally-scoped-except-on-Windows
and globally scoped.

Tackling GUCs would be a good start. Especially given the number of GUCs
where the actual GUC value isn't the state that anyone should be
interacting with directly since a hook maintains the "real" state derived
from the GUC storage.

And preventing direct writes to GUCs seems like a clearly correct thing to
do.

Some consideration of performance would be important for some of the hotter
GUCs, of course, but most extensions won't be hammering GUC access in hot
paths.


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Tue, 24 Aug 2021 at 13:21, Craig Ringer 
wrote:

>
> There is not even the thinnest pretense of Pg having a dedicated extension
> API or any sort of internal vs public API separation.
>

Oh, and if we do want such a separation, we'll need to introduce a MUCH
lower-pain-and-overhead way to get related patches in. Otherwise it'll take
decades to add any necessary function wrappers for currently exported data
symbols, add necessary hooks, wrap or hide internal symbols and state, etc.

But ... what is the actual goal and expected outcome of such a hypothetical
public/private API separation?

It won't help meaningfully with server maintenance: We already break
extensions freely in major releases, and sometimes even minor releases. We
don't make any stable API promise at all. So any argument that it would
make maintenance of the core server easier is weak at best.

It won't help protect server runtime stability: The server is written in C,
and makes heavy use of non-opaque / non-hidden types, many of which would
not be practical to hide without enormous refactoring, if at all. Writing
any sort of "safe" C API is very difficult even when the exposed
functionality is very narrow and well defined. Even then such an API can
only help prevent inadvertent mistakes, since C programs are free to grovel
around in memory. Look at the mess with EXPORT_SYMBOL_GPL in the kernel for
just how ugly that can get. So I don't think there's any realistic way to
claim that narrowing the exposed API surface would make it safer to load
and run extensions that the user has not separately checked and reviewed or
obtained from a trusted source with robust testing practices. Certainly it
offers no benefit at all against a bad actor.

It won't make it safer to use untrusted extensions.

What will it do? Not much, in the short term, except cripple existing
extensions or add a pile of questionably useful code annotations. The only
real benefits I see are some improvement in link-time optimization and
export symbol table size. Both of those are nice, but IMO not worth the
pain by themselves for a pure C program. In C++, with its enormous symbol
tables, it's absolutely worth it. But not really for Pg.

To be clear, I actually love the idea of starting to define a solid public
API, with API, ABI and semantic promises and associated tests. But to say
it's a nontrivial amount of work is an enormous understatement. And unless
done by an existing committer who is trusted to largely define a
provisional API without bike-shedding and arguing the merits of every
single part of it, it's nearly impossible to do with the way Pg is
currently developed.

It's completely beyond me why it's OK to export all function symbols on
Windows, but not all data symbols. Or why it's OK to export all data
symbols on *nix, but not on Windows.


Re: Failure of subscription tests with topminnow

2021-08-23 Thread Ajin Cherian
On Mon, Aug 16, 2021 at 6:33 PM Amit Kapila  wrote:

> The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the
> walsender process. Now, here the problem seems to be that the previous
> walsender process (16336) didn't exit and the new process started to
> use the slot which leads to the error shown in the log. This is
> evident from the below part of log where we can see that 16336 has
> exited after new process started to use the slot.
>
> 2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG:  received
> replication command: START_REPLICATION SLOT "tap_sub" LOGICAL
> 0/16BEEB0 (proto_version '1', publication_names
> '"tap_pub","tap_pub_ins_only"')
> 2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT:
> START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
> publication_names '"tap_pub","tap_pub_ins_only"')
> 2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR:  replication
> slot "tap_sub" is active for PID 16336
> 2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT:
> START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
> publication_names '"tap_pub","tap_pub_ins_only"')
> 2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG:  disconnection:
> session time: 0:00:00.036 user=nm database=postgres host=[local]
> 2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG:  disconnection:
> session time: 0:00:06.367 user=nm database=postgres host=[local]
>
> One idea to solve this is to first disable the subscription, wait for
> the walsender process to exit, and then change the connection string
> and re-enable the subscription.

I tried to simulate this by putting delays prior to the exit of the
walsender. Even though I see the same error in the logs that the
replication slot is active for the previous PID, the test case does not
fail in the way seen here.

The new walsender tries to acquire the slot but, seeing that there is
another PID active on the slot, it errors out and exits. At no point does
this new failed walsender update its pid for the slot. As a result, the
polled query below would not return a true value:

$node_publisher->poll_query_until('postgres',
"SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = 'tap_sub';"
) or die "Timed out while waiting for apply to restart";

In my runs I see that this query is repeated until a new walsender is
able to successfully acquire the slot. I am not able to explain why this
query returned true in the topminnow tap test; that would imply that a
walsender was able to acquire the slot and update its pid, but I don't
see how. We also know that if this polled query returns true (implying a
pid other than $oldpid), then the next query in the test tries to
identify that pid:

SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub';

From the topminnow logs we know that this query returned a "NULL", as
the value extracted from it is used to formulate the next query, which
errored out and eventually caused the test case to fail.

 [16482:3] 001_rep_changes.pl ERROR:  syntax error at or near "FROM"
at character 16
 [16482:4] 001_rep_changes.pl STATEMENT:  SELECT pid !=  FROM
pg_stat_replication WHERE application_name = 'tap_sub';

I am not an expert in perl, but I looked at the perl function used in
the tap test, poll_query_until(); this polls until the query returns
'true' (unless specified otherwise). In my tests I don't see the polled
function exit if the query returns a NULL. I don't know if differences
in the installed perl can cause this difference in behaviour. Can a NULL
result set be construed as true and cause the poll to exit?

 Any suggestions?

regards,
Ajin Cherian
Fujitsu Australia




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Tue, 24 Aug 2021 at 02:31, Robert Haas  wrote:

> On Mon, Aug 23, 2021 at 11:40 AM Alvaro Herrera 
> wrote:
> > In that case, why not improve the API with functions that return the
> > values in some native datatype?  For scalars with native C types (int,
> > floats, Boolean etc) this is easy enough; I bet it'll solve 99% of the
> > problems or more.
>
> Sure, but ... why bother?
>
> The entire argument rests on the presumption that there is some harm
> being done by people accessing the values directly, but I don't think
> that's true. And, if it were true, it seems likely that this proposed
> API would have the exact same problem, because it would let people do
> exactly the same thing. And, going through this proposed API would
> still be significantly more expensive than just accessing the bare
> variables, because you'd at least have to do some kind of lookup based
> on the GUC name to find the corresponding variable. It's just a
> solution in search of a problem.
>
> Nothing bad happens when people write extensions that access GUC
> variables directly. It works totally, completely fine. Except on
> Windows.
>

Not only that, but postgres already exports every non-static function
symbol on both *nix and Windows, and every data symbol on *nix. A lot of
those function symbols are very internal and give anything that can call
them the ability to wreak absolute havoc on the server's state.

There is not even the thinnest pretense of Pg having a dedicated extension
API or any sort of internal vs public API separation. This ongoing pain
with PGDLLIMPORT on Windows is hard to see as anything but an excuse to
make working with and supporting Windows harder because some of us don't
like it. I happen to rather dislike working with Windows myself, but I get
to do it anyway, and I'd be very happy to remove this particular source of
pain.


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Mon, 23 Aug 2021 at 22:15, Tom Lane  wrote:

> Bruce Momjian  writes:
> > On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote:
> >> Then shouldn't we try to prevent direct access on all platforms rather
> >> than only one?
>
> > Agreed.  If Julien says 99% of the non-export problems are GUCs, and we
> > can just export them all, why not do it?  We already export every global
> > variable on Unix-like systems, and we have seen no downsides.
>
> By that argument, *every* globally-visible variable should be marked
> PGDLLIMPORT.  But the mere fact that two backend .c files need to access
> some variable doesn't mean that we want any random bit of code doing so.
>
> And yes, I absolutely would prohibit extensions from accessing many
> of them, if there were a reasonable way to do it.  It would be a good
> start towards establishing a defined API for extensions.
>

There is: -fvisibility=hidden and __attribute__((visibility("default"))).
Or, if you prefer to explicitly mark private symbols, use
__attribute__((visibility("hidden"))).

In addition to API surface control this also gives you a smaller export
symbol table for faster dynamic linking, and it improves link-time
optimization.

I could've sworn I proposed its use in the past, but I can't find a
relevant list thread except quite a recent one; all I can find is [1]. But
that is where we should start: switch from a linker script for libpq to
using PGDLLIMPORT (actually it'd be LIBPQDLLIMPORT) in libpq. When
-DBUILDING_LIBPQ is defined, this would expand to __declspec(dllexport) on
Windows and __attribute__((visibility("default"))) on gcc/clang. Otherwise
it expands to __declspec(dllimport) on Windows and empty on other targets.
This would also be a good time to rename the confusingly named BUILDING_DLL
macro to BUILDING_POSTGRES.
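
For illustration only -- none of this exists in the tree, and the macro
name is just the hypothetical one discussed above -- the definition could
look roughly like:

/* hypothetical sketch, not actual tree contents */
#ifdef BUILDING_LIBPQ
  #if defined(_WIN32) || defined(__CYGWIN__)
    #define LIBPQDLLIMPORT __declspec(dllexport)
  #else
    #define LIBPQDLLIMPORT __attribute__((visibility("default")))
  #endif
#else
  #if defined(_WIN32) || defined(__CYGWIN__)
    #define LIBPQDLLIMPORT __declspec(dllimport)
  #else
    #define LIBPQDLLIMPORT
  #endif
#endif

/* usage on an exported libpq data symbol (symbol name invented): */
extern LIBPQDLLIMPORT int some_exported_libpq_symbol;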

The next step would be to have PGDLLIMPORT expand to
__attribute__((visibility("default"))) on gcc/clang when building the
server itself. This won't do anything by itself since all symbols are
already default visibility. But once the "public API" of both function and
data symbols is so annotated, we could switch to building Pg with
-fvisibility=hidden by default, and on Windows, we'd disable the linker
script that exports all functions using a generated .DEF file.


[1]
https://www.postgresql.org/message-id/CAMsr%2BYHa3TfA4rKtnZuzurwCSmxxXNQHFE3UE29BoDEQcwfuxA%40mail.gmail.com


Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Fujii Masao



On 2021/08/24 0:26, Alvaro Herrera wrote:

Aren't we talking about query cancellations that occur in response to a
standby delay limit?  Those aren't in response to user action.  What I
mean is that if the standby delay limit is exceeded, then we send a
query cancel; we expect the standby to cancel its query at that time and
then the primary can move on.  But if the standby doesn't react, then we
can have it terminate its connection.


+1


On 2021/08/24 3:45, Robert Haas wrote:

On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera  wrote:

Aren't we talking about query cancellations that occur in response to a
standby delay limit?  Those aren't in response to user action.


Oh, you're right. But I guess a similar problem could also occur in
response to pg_terminate_backend(), no?


There seems no problem in that case because pg_terminate_backend() causes
a backend to set ProcDiePending to true in die() signal hander and
ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending.
No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Craig Ringer
On Sun, 22 Aug 2021 at 21:29, Julien Rouhaud  wrote:

> On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote:
> >
> > Uh, no, it's exactly *not* clear.  There are a lot of GUCs that are only
> > of interest to particular subsystems.  I do not see why being a GUC makes
> > something automatically more interesting than any other global variable.
> > Usually, the fact that one is global is only so the GUC machinery itself
> > can get at it, otherwise it'd be static in the owning module.
> >
> > As for "extensions should be able to get at the values", the GUC
> machinery
> > already provides uniform mechanisms for doing that safely.  Direct access
> > to the variable's internal value would be unsafe in many cases.
>
> Then shouldn't we try to prevent direct access on all platforms rather than
> only one?
>

Yes. That's what I think we should be doing if we're not going to
PGDLLIMPORT them all.

The current API is painful because it round-trips via a text
representation. We'd at least want some kind of

GetConfigOptionBool(...)
GetConfigOptionEnum(...)

etc.
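
Purely as a sketch of the shape I mean (nothing like this exists today;
the names and signatures here are invented):

/* Hypothetical typed getters, keyed by GUC name -- sketch only */
extern bool GetConfigOptionBool(const char *name, bool missing_ok);
extern int  GetConfigOptionInt(const char *name, bool missing_ok);
extern int  GetConfigOptionEnum(const char *name, bool missing_ok);

/* so an extension could do, e.g.: */
bool    std_strings = GetConfigOptionBool("standard_conforming_strings", false);

That still pays for a name-based lookup, but at least it skips the
round-trip through the text representation.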

I don't understand the objection to marking them all PGDLLIMPORT anyway
though.

Any pretense that Pg has any sort of public/private API divide is pure
fantasy. Whether things are static or not, in public headers or "internal"
headers, etc., is nearly random. We have absolutely no API surface control
such as __attribute__((visibility("hidden"))) annotations, and proposals to
add them have been soundly rejected in the past.

If we have a defined API, where is it defined and annotated?

If we don't, then Windows being different and incompatible is a bug, and
that bug should be fixed.


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Julien Rouhaud
On Tue, Aug 24, 2021 at 12:36 PM Pavel Stehule  wrote:
>
> There are a few GUC variables where we need extra fast access - work_mem,
> encoding settings, and maybe the application name. For the others I haven't
> needed to access them in over 20 years. But I understand that more complex
> extensions like timescaledb will use more internal GUCs.

Note that even trivial extensions require some other GUC access.  For
instance, any extension that wants to aggregate data based on the
query_id has to store an array of query_ids in shmem to know what the
parallel workers are working on.  On top of wasting memory and CPU, it
also means that computing the maximum number of backends is required.
So you need to recompute MaxBackends, which means access to multiple
GUCs.
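
To make that concrete, the recomputation I mean looks roughly like this
(assuming I recall the server's own formula correctly; the point being
how many separate GUC globals an extension has to read just for this):

/* roughly what InitializeMaxBackends() computes */
int guessed_max_backends = MaxConnections + autovacuum_max_workers + 1 +
                           max_worker_processes + max_wal_senders;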

Unfortunately the patch to expose the query_id didn't fix that problem,
as it only passes the top-level query_id to the parallel workers, not
the current one.




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Pavel Stehule
On Mon, 23 Aug 2021 at 23:08, Chapman Flack  wrote:

> On 08/23/21 14:30, Robert Haas wrote:
> > it seems likely that this proposed
> > API would have the exact same problem, because it would let people do
> > exactly the same thing. And, going through this proposed API would
> > still be significantly more expensive than just accessing the bare
> > variables, because you'd at least have to do some kind of lookup based
> > on the GUC name
>
> I think the API ideas in [0] would not let people do exactly the same
> thing.
>
> They would avoid exposing the bare variables to overwrite. Not that
> there has been any plague of extensions going and overwriting GUCs,
> but I think in some messages on this thread I detected a sense that
> in principle it's better if an API precludes it, and that makes sense
> to me.
>
> The second idea also avoids the expense of name-based lookup (except once
> at extension initialization), and in fact minimizes the cost of obtaining
> the current value when needed, by slightly increasing the infrequent cost
> of updating values.
>
>
There are a few GUC variables where we need extra fast access - work_mem,
encoding settings, and maybe the application name. For the others I haven't
needed to access them in over 20 years. But I understand that more complex
extensions like timescaledb will use more internal GUCs.

Maybe a new lightweight API based on enum identifiers instead of string
identifiers, ensuring relatively fast (and safe and secure) access, could be
a good solution. And for special variables, there could be wrapper functions
for very fast access. But even with a special interface, the responsibility
of the extension's author will not be any less, and C extensions will not be
any more trusted, so it is hard to say much about the possible benefits. I am
inclined to Robert's opinion that the current state is not too bad, and maybe
the best thing that is possible now is decreasing the differences between
supported platforms.
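
As a rough sketch of the enum-identifier idea (every name below is invented
for illustration; nothing of the sort exists today):

/* Hypothetical enum-keyed GUC access -- sketch only */
typedef enum GucId
{
    GUC_ID_WORK_MEM,
    GUC_ID_APPLICATION_NAME,
    GUC_ID_CLIENT_ENCODING
    /* ... one identifier per GUC ... */
} GucId;

extern int         GetGucInt(GucId id);
extern const char *GetGucString(GucId id);

/* fast path: no name-based lookup at call time */
int work_mem_kb = GetGucInt(GUC_ID_WORK_MEM);

The enum avoids the name lookup entirely, and a wrapper like this could
validate access without exposing the underlying variable for writes.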

Regards

Pavel


>


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-23 Thread Greg Nancarrow
On Tue, Aug 24, 2021 at 5:00 AM Robert Haas  wrote:
>
>
> I think this looks pretty good. I am not sure I see any reason to
> introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
> just use RestoreTransactionSnapshot() and then call
> PushActiveSnapshot() from parallel.c? That seems preferable to me from
> the standpoint of avoiding multiplication of APIs.
>

I initially thought this too, but RestoreTransactionSnapshot() sets up
the resultant transaction snapshot in "CurrentSnapshot", which is
static to snapmgr.c (like the other pointers to valid snapshots) and I
didn't really want to mess with the visibility of that, to allow a
call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I
wasn't sure if it was safe to call GetTransactionSnapshot() here
without the risk of unwanted side-effects - but, looking at it again,
I think it is probably OK, so I did use it in my revised patch
(attached) and removed
that new function RestoreTxnSnapshotAndSetAsActive(). Do you agree
that it is OK to call GetTransactionSnapshot() here?

> I also think that the comments should explain why we are doing this,
> rather than just that we are doing this. So instead of this:
>
> +/*
> + * If the transaction snapshot was serialized, restore it and restore the
> + * active snapshot too. Otherwise, the active snapshot is also installed 
> as
> + * the transaction snapshot.
> + */
>
> ...perhaps something like:
>
> If the transaction isolation level is READ COMMITTED or SERIALIZABLE,
> the leader has serialized the transaction snapshot and we must restore
> it. At lower isolation levels, there is no transaction-lifetime
> snapshot, but we need TransactionXmin to get set to a value which is
> less than or equal to the xmin of every snapshot that will be used by
> this worker. The easiest way to accomplish that is to install the
> active snapshot as the transaction snapshot. Code running in this
> parallel worker might take new snapshots via GetTransactionSnapshot()
> or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
> snapshot older than the active snapshot.
>

I agree, that is a better comment and detailed description, but didn't
you mean "If the transaction isolation level is REPEATABLE READ or
SERIALIZABLE ..."?

since we have:

#define XACT_READ_UNCOMMITTED   0
#define XACT_READ_COMMITTED 1
#define XACT_REPEATABLE_READ2
#define XACT_SERIALIZABLE   3

#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)


Regards,
Greg Nancarrow
Fujitsu Australia


v10-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Support reset of Shared objects statistics in "pg_stat_reset" function

2021-08-23 Thread Mahendra Singh Thalor
On Sun, 22 Aug 2021 at 22:53, Sadhuprasad Patro  wrote:
>
> > On 2021/08/20 11:07, Mahendra Singh Thalor wrote:
> > > 1)
> > >   Resets statistics for a single table or index in the
> > > current database
> > > -to zero.
> > > +to zero. The input can be a shared table also
> > >
> > > I think, above comment should be modified. Maybe, we can modify it as
> > > "If input is a shared oid(table or index or toast), then resets statistics
> > > for a single shared entry to zero.
> >
> > I'm not sure if ordinary users can understand what "shared oid" means.
> > Instead, what about "Resets statistics for a single relation in the current
> > database or shared across all databases in the cluster to zero."?
> >
>
> Thank you for the review here. As per the comments, attached the
> latest patch here...
>

Thanks Sadhu for the updated patch.

Patch looks good to me and I don't have any more comments.

I marked this patch as 'ready for committer'.
https://commitfest.postgresql.org/34/3282/

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: Allow batched insert during cross-partition updates

2021-08-23 Thread Amit Langote
On Tue, Jul 27, 2021 at 11:32 AM Amit Langote  wrote:
> On Thu, Jul 22, 2021 at 2:18 PM vignesh C  wrote:
> > One of the test postgres_fdw has failed, can you please post an
> > updated patch for the fix:
> > test postgres_fdw ... FAILED (test process exited with
> > exit code 2) 7264 ms
>
> Thanks Vignesh.
>
> I found a problem with the underlying batching code that caused this
> failure and have just reported it here:
>
> https://www.postgresql.org/message-id/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com
>
> Here's v8, including the patch for the above problem.

Tomas committed the bug-fix, so attaching a rebased version of the
patch for $subject.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v9-0001-Allow-batching-of-inserts-during-cross-partition-.patch
Description: Binary data


Re: prevent immature WAL streaming

2021-08-23 Thread Kyotaro Horiguchi
At Mon, 23 Aug 2021 18:52:17 -0400, Alvaro Herrera  
wrote in 
> Included 蔡梦娟 and Jakub Wartak because they've expressed interest on
> this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto
> variable").
> 
> As mentioned in the course of thread [1], we're missing a fix for
> streaming replication to avoid sending records that the primary hasn't
> fully flushed yet.  This patch is a first attempt at fixing that problem
> by retreating the LSN reported as FlushPtr whenever a segment is
> registered, based on the understanding that if no registration exists
> then the LogwrtResult.Flush pointer can be taken at face value; but if a
> registration exists, then we have to stream only till the start LSN of
> that registered entry.
> 
> This patch is probably incomplete.  First, I'm not sure that logical
> replication is affected by this problem.  I think it isn't, because
> logical replication will halt until the record can be read completely --
> maybe I'm wrong and there is a way for things to go wrong with logical
> replication as well.  But also, I need to look at the other uses of
> GetFlushRecPtr() and see if those need to change to the new function too
> or they can remain what they are now.
> 
> I'd also like to have tests.  That seems moderately hard, but if we had
> WAL-molasses that could be used in walreceiver, it could be done. (It
> sounds easier to write tests with a molasses-archive_command.)
> 
> 
> [1] https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com
> [2] 
> https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan@alibaba-inc.com

(I'm not sure what "WAL-molasses" above means; is it the same as "sugar"?)

For information, this issue is related to commit 0668719801, which makes
XLogPageRead restart reading a (continued or segment-spanning) record
when switching sources.  In that thread, I modified the code to cause a
server crash under the desired situation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: .ready and .done files considered harmful

2021-08-23 Thread Kyotaro Horiguchi
(sigh..)

At Tue, 24 Aug 2021 11:35:06 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > IIUC partial WAL files are handled because the next file in the
> > sequence with the given TimeLineID won't be there, so we will fall
> > back to a directory scan and pick it up.  Timeline history files are
> > handled by forcing a directory scan, which should work because they
> > always have the highest priority.  Backup history files, however, do
> > not seem to be handled.  I think one approach to fixing that is to
> > also treat backup history files similarly to timeline history files.
> > If one is created, we force a directory scan, and the directory scan
> > logic will consider backup history files as higher priority than
> > everything but timeline history files.
> 
> Backup history files are (currently) just informational and they are
> finally processed at the end of a bulk-archiving performed by the fast
> path.  However, I feel that it is cleaner to trigger a directory scan
> every time we add an other-than-a-regular-WAL-file, as base-backup or
- promotion are not supposed happen so infrequently.
+ promotion are not supposed happen so frequently.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: .ready and .done files considered harmful

2021-08-23 Thread Kyotaro Horiguchi
At Tue, 24 Aug 2021 00:03:37 +, "Bossart, Nathan"  
wrote in 
> On 8/23/21, 10:49 AM, "Robert Haas"  wrote:
> > On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan  
> > wrote:
> >> To handle a "cheating" archive command, I'd probably need to add a
> >> stat() for every time pgarch_readyXLog() returned something from
> >> arch_files.  I suspect something similar might be needed in Dipesh's
> >> patch to handle backup history files and partial WAL files.
> >
> > I think he's effectively got that already, although it's probably
> > inside of pgarch_readyXLog(). The idea there is that instead of having
> > a cache of files to be returned (as in your case) he just checks
> > whether the next file in sequence happens to be present and if so
> > returns that file name. To see whether it's present, he uses stat().
> 
> IIUC partial WAL files are handled because the next file in the
> sequence with the given TimeLineID won't be there, so we will fall
> back to a directory scan and pick it up.  Timeline history files are
> handled by forcing a directory scan, which should work because they
> always have the highest priority.  Backup history files, however, do
> not seem to be handled.  I think one approach to fixing that is to
> also treat backup history files similarly to timeline history files.
> If one is created, we force a directory scan, and the directory scan
> logic will consider backup history files as higher priority than
> everything but timeline history files.

Backup history files are (currently) just informational and they are
finally processed at the end of a bulk-archiving performed by the fast
path.  However, I feel that it is cleaner to trigger a directory scan
every time we add an other-than-a-regular-WAL-file, as base-backup or
promotion are not supposed happen so infrequently.

> I've been looking at the v9 patch with fresh eyes, and I still think
> we should be able to force the directory scan as needed in
> XLogArchiveNotify().  Unless the file to archive is a regular WAL file
> that is > our stored location in archiver memory, we should force a
> directory scan.  I think it needs to be > instead of >= because we
> don't know if the archiver has just completed a directory scan and
> found a later segment to use to update the archiver state (but hasn't
> yet updated the state in shared memory).

I'm afraid that this could be seen as a violation of modularity. I feel
that the WAL-emitter side should not be aware of that detail of
archiving. Instead, I would prefer to keep doing directory scans as long
as we find a smaller segment ID than the next-expected segment ID ever
archived by the fast path (if possible).  This would be less performant
in the case where out-of-order segments are frequent, but I think the
overall objective of the original patch will be kept.

> Also, I think we need to make sure to set PgArch->dirScan back to true
> at the end of pgarch_readyXlog() unless we've found a new regular WAL
> file that we can use to reset the archiver's stored location.  This
> ensures that we'll keep doing directory scans as long as there are
> timeline/backup history files to process.  

Right.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread Masahiko Sawada
On Tue, Aug 24, 2021 at 10:01 AM houzj.f...@fujitsu.com
 wrote:
>
>
>
> > -Original Message-
> > From: Masahiko Sawada 
> > Sent: Tuesday, August 24, 2021 8:52 AM
> >
> > Thanks. The patch for HEAD looks good to me. Regarding the patch for PG14, I
> > think we need to update the comment as well:
> >
> > @@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool
> > isTopLevel)
> >   PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with
> > refresh");
> >
> >   /* Only refresh the added/dropped list of publications. */
> > - sub->publications = stmt->publication;
> > + sub->publications = publist;
> >
>
> Thanks for the comment; attached is a new version of the patch which fixes it.

Thank you for updating the patch! Looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread houzj.f...@fujitsu.com


> -Original Message-
> From: Masahiko Sawada 
> Sent: Tuesday, August 24, 2021 8:52 AM
> 
> Thanks. The patch for HEAD looks good to me. Regarding the patch for PG14, I
> think we need to update the comment as well:
> 
> @@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool
> isTopLevel)
>   PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with
> refresh");
> 
>   /* Only refresh the added/dropped list of publications. */
> - sub->publications = stmt->publication;
> + sub->publications = publist;
> 

Thanks for the comment; attached is a new version of the patch which fixes it.

Best regards,
Hou zj


v8-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch
Description:  v8-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch


v8-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch
Description:  v8-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch


Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread Masahiko Sawada
On Mon, Aug 23, 2021 at 11:05 PM houzj.f...@fujitsu.com
 wrote:
>
> On Mon, Aug 23, 2021 8:01 PM Amit Kapila  wrote:
> > On Mon, Aug 23, 2021 at 2:45 PM 
> > houzj.f...@fujitsu.com wrote:
> > >
> > > On Mon, Aug 23, 2021 12:59 PM Amit Kapila  wrote:
> > > >
> > > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com 
> > > >  wrote:
> > > > >
> > > > > Personally, I also think it will be better to make the behavior 
> > > > > consistent.
> > > > > Attach the new version patch make both ADD and DROP behave the
> > > > > same as SET PUBLICATION which refresh all the publications.
> > > > >
> > > >
> > > > I think we can have tests in the separate test file
> > > > (alter_sub_pub.pl) like you earlier had in one of the versions. Use
> > > > some meaningful names for tables instead of temp1, temp2 as you had in
> > > > the previous version.
> > > > Otherwise, the code changes look good to me.
> > >
> > > Thanks for the comment.
> > > Attach new version patch which did the following changes.
> > >
> > > * move the tests to a separate test file (024_alter_sub_pub.pl)
> > > * adjust the document of copy_data option
> > > * add tab-complete for copy_data option when ALTER DROP publication.
> > >
> >
> > Thanks, the patch looks good to me. I have made some cosmetic changes in the
> > attached version. It makes the test cases easier to understand.
> > I am planning to push this tomorrow unless there are more comments or
> > suggestions.
>
> Thanks ! The patch looks good to me.
>
> I noticed that the patch cannot be applied to PG14-stable,
> so I attach a separate patch for back-patching (I didn’t change the patch for
> HEAD branch).

Thanks. The patch for HEAD looks good to me. Regarding the patch for
PG14, I think we need to update the comment as well:

@@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt,
bool isTopLevel)
  PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");

  /* Only refresh the added/dropped list of publications. */
- sub->publications = stmt->publication;
+ sub->publications = publist;

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Bruce Momjian
On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote:
> On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> > Thanks Robert for your comments.
> > I have split the patch into two portions. One that handles DB OID and
> > the other that
> > handles tablespace OID and relfilenode OID.
> 
> It's pretty clear from the discussion, I think, that the database OID
> one is going to need rework to be considered.

I assume this patch is not going to be applied until there is an actual
use case for preserving these values.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: .ready and .done files considered harmful

2021-08-23 Thread Bossart, Nathan
On 8/23/21, 10:49 AM, "Robert Haas"  wrote:
> On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan  wrote:
>> To handle a "cheating" archive command, I'd probably need to add a
>> stat() for every time pgarch_readyXLog() returned something from
>> arch_files.  I suspect something similar might be needed in Dipesh's
>> patch to handle backup history files and partial WAL files.
>
> I think he's effectively got that already, although it's probably
> inside of pgarch_readyXLog(). The idea there is that instead of having
> a cache of files to be returned (as in your case) he just checks
> whether the next file in sequence happens to be present and if so
> returns that file name. To see whether it's present, he uses stat().

IIUC partial WAL files are handled because the next file in the
sequence with the given TimeLineID won't be there, so we will fall
back to a directory scan and pick it up.  Timeline history files are
handled by forcing a directory scan, which should work because they
always have the highest priority.  Backup history files, however, do
not seem to be handled.  I think one approach to fixing that is to
also treat backup history files similarly to timeline history files.
If one is created, we force a directory scan, and the directory scan
logic will consider backup history files as higher priority than
everything but timeline history files.

I've been looking at the v9 patch with fresh eyes, and I still think
we should be able to force the directory scan as needed in
XLogArchiveNotify().  Unless the file to archive is a regular WAL file
that is > our stored location in archiver memory, we should force a
directory scan.  I think it needs to be > instead of >= because we
don't know if the archiver has just completed a directory scan and
found a later segment to use to update the archiver state (but hasn't
yet updated the state in shared memory).
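
Concretely, I'm thinking of a check in XLogArchiveNotify() of roughly this
shape (a sketch only; PgArch->dirScan is from the v9 patch, while
"lastReadySegNo" is an invented name for the archiver's stored location):

if (!IsXLogFileName(xlog))
    PgArch->dirScan = true;        /* history/backup/partial files */
else
{
    TimeLineID  tli;
    XLogSegNo   segno;

    XLogFromFileName(xlog, &tli, &segno, wal_segment_size);
    if (segno <= PgArch->lastReadySegNo)
        PgArch->dirScan = true;    /* not strictly newer: force a rescan */
}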

Also, I think we need to make sure to set PgArch->dirScan back to true
at the end of pgarch_readyXlog() unless we've found a new regular WAL
file that we can use to reset the archiver's stored location.  This
ensures that we'll keep doing directory scans as long as there are
timeline/backup history files to process.  

Nathan



prevent immature WAL streaming

2021-08-23 Thread Alvaro Herrera
Included 蔡梦娟 and Jakub Wartak because they've expressed interest on
this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto
variable").

As mentioned in the course of thread [1], we're missing a fix for
streaming replication to avoid sending records that the primary hasn't
fully flushed yet.  This patch is a first attempt at fixing that problem
by retreating the LSN reported as FlushPtr whenever a segment is
registered, based on the understanding that if no registration exists
then the LogwrtResult.Flush pointer can be taken at face value; but if a
registration exists, then we have to stream only till the start LSN of
that registered entry.

This patch is probably incomplete.  First, I'm not sure that logical
replication is affected by this problem.  I think it isn't, because
logical replication will halt until the record can be read completely --
maybe I'm wrong and there is a way for things to go wrong with logical
replication as well.  But also, I need to look at the other uses of
GetFlushRecPtr() and see if those need to change to the new function too
or they can remain what they are now.

I'd also like to have tests.  That seems moderately hard, but if we had
WAL-molasses that could be used in walreceiver, it could be done. (It
sounds easier to write tests with a molasses-archive_command.)


[1] https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com
[2] 
https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan@alibaba-inc.com

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
>From 2c5d347e9b6819ba59fa0e72e4227a2cb4d1230f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 23 Aug 2021 18:25:48 -0400
Subject: [PATCH] Don't stream non-final WAL segments

Avoid setting the physical-stream replication read pointer in the middle
of a WAL record.  This can occur if a record is split in two (or more)
across segment boundaries.  The reason to avoid it is that if we stream
the segment containing the first half, and then we crash before writing
the next segment, the primary will rewrite the tail of the segment with
a new WAL record (having discarded the incomplete record), but the
replica will be stuck trying to replay a broken file (since the next
segment will never contain the now-gone data).

To do this, change streaming replication to retreat the flush pointer
according to registered segment boundaries.
---
 src/backend/access/transam/xlog.c   | 39 ++---
 src/backend/replication/walsender.c |  2 +-
 src/include/access/xlog.h   |  1 +
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..2b85d1b2ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -731,8 +731,10 @@ typedef struct XLogCtlData
 	 */
 	XLogSegNo	lastNotifiedSeg;
 	XLogSegNo	earliestSegBoundary;
+	XLogRecPtr	earliestSegBoundaryStartPtr;
 	XLogRecPtr	earliestSegBoundaryEndPtr;
 	XLogSegNo	latestSegBoundary;
+	XLogRecPtr	latestSegBoundaryStartPtr;
 	XLogRecPtr	latestSegBoundaryEndPtr;
 
 	slock_t		segtrack_lck;	/* locks shared variables shown above */
@@ -932,7 +934,7 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 		   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
-static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
+static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1183,11 +1185,11 @@ XLogInsertRecord(XLogRecData *rdata,
 		 *
 		 * Note that we did not use XLByteToPrevSeg() for determining the
 		 * ending segment.  This is so that a record that fits perfectly into
-		 * the end of the segment causes the latter to get marked ready for
+		 * the end of the segment causes said segment to get marked ready for
 		 * archival immediately.
 		 */
 		if (StartSeg != EndSeg && XLogArchivingActive())
-			RegisterSegmentBoundary(EndSeg, EndPos);
+			RegisterSegmentBoundary(EndSeg, StartPos, EndPos);
 
 		/*
 		 * Advance LogwrtRqst.Write so that it includes new block(s).
@@ -4398,7 +4400,7 @@ ValidateXLOGDirectoryStructure(void)
  * to delay until the end segment is known flushed.
  */
 static void
-RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
+RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos)
 {
 	XLogSegNo	segno PG_USED_FOR_ASSERTS_ONLY;
 
@@ -4415,6 +4417,7 @@ RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
 	if (XLogCtl->earliestSegBoundary == MaxXLogSegNo)
 	{
 		XLogCtl->earliestSegBoundary = seg;
+		XLogCtl->earliestSegBoundaryStartPtr = startpos;
 		XLogCtl->earliestSegBoundaryEndPtr = endpos;
 	}
 	else if (seg > 

Re: The Free Space Map: Problems and Opportunities

2021-08-23 Thread Peter Geoghegan
On Fri, Aug 20, 2021 at 1:30 PM Robert Haas  wrote:
> On Fri, Aug 20, 2021 at 3:40 PM Peter Geoghegan  wrote:
> > My concern here is really the data structure and its overall
> > complexity.

> I don't think I understand the data structure that you have in mind
> well enough to comment intelligently.

Right now my prototype has a centralized table in shared memory, with
a hash table. One entry per relation, generally multiple freelists per
relation. And with per-freelist metadata such as owner and original
leader backend XID values. Plus of course the lists of free blocks
themselves. The prototype already clearly fixes the worst problems
with BenchmarkSQL, but that's only one of my goals. That's just the
starting point.
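
To give a rough idea of the shape (every name below is invented for
illustration; this isn't the prototype's actual code):

/* invented illustration of the shared-memory layout described above */
typedef struct FreeSpaceLease
{
    TransactionId   owner_xid;      /* owning transaction */
    TransactionId   leader_xid;     /* original leader backend's XID */
    int             nblocks;
    BlockNumber     blocks[64];     /* the free blocks in this lease */
} FreeSpaceLease;

typedef struct RelFreeSpaceEntry
{
    Oid             relid;          /* hash key: one entry per relation */
    int             nleases;
    FreeSpaceLease  leases[8];      /* usually several freelists per rel */
} RelFreeSpaceEntry;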

I appreciate your input on this. And not just on the implementation
details -- the actual requirements themselves are still in flux. This
isn't so much a project to replace the FSM as it is a project that
adds a new rich abstraction layer that goes between access methods and
smgr.c -- free space management is only one of the responsibilities.

Access methods like heapam can talk to the new abstraction layer,
sometimes exploiting semantics from the logical database -- owning
transaction IDs for a given lease, things like that. The new
abstraction may thereby help us when working on other problems,
including problems in areas that might seem totally unrelated. That's
the big idea. Here are some (admittedly quite speculative) examples of
places that the infrastructure could help with:

* Parallel INSERT/COPY.

I imagine parallel DML is hard because of coordination problems. With
the infrastructure I have in mind, a parallel DML implementation could
request a large free space lease, and distribute blocks from that
lease among parallel workers during execution, on demand. This
naturally avoids unnecessary buffer lock contention among concurrent
writes from the same transaction (but different workers).

But that's not all. I'm also thinking of things like LWLock/buffer
lock deadlock avoidance -- issues of fundamental correctness. That
stuff becomes easier, too. Individual parallel workers from a parallel
INSERT could safely assume that the blocks that the parallel leader
tells them about really is only going to be accessed by them, for as
long as the owning transaction is around (barring read-only access by
VACUUM, things like that). There is simply no question of any other
backend/parallel worker thinking that it has the right to use the same
block before the transaction commits.

The transaction-scoped ownership semantics must be truly robust for
this to work at all, but that's a good idea anyway.

* Truncate tables without a disruptive relation-level lock.

AFAICT it isn't actually logically necessary to get a heavyweight
relation extension lock to truncate a relation during VACUUM. The only
reason we need that right now is because it's really hard to nail down
which backends might have a usable BlockNumber that they still expect
to not point past the end of the physical relfilenode (or much worse,
point to new and logically unrelated data). Perhaps we can solve the
underlying problem using something like Lanin & Shasha's "drain
technique", which is used inside nbtree for page deletion today.

The general idea is to defer the physical unlink/recycling until we
know for sure that any backends that might have had unsafe references
have gone away. This is a very broad technique.

* Truncate-away empty heap table segments that happen to come before
other segments that still have useful data.

* Eager cleanup of aborted transactions by VACUUM (or some variation
of VACUUM that just handles aborts).

Again, we're using information about the logical database to solve
problem with the physical database -- that information is what we seem
to currently be missing in all these areas. That's really the central
idea behind the new infrastructure.

-- 
Peter Geoghegan




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> > Thanks Robert for your comments.
> > I have split the patch into two portions. One that handles DB OID and
> > the other that
> > handles tablespace OID and relfilenode OID.
> 
> It's pretty clear from the discussion, I think, that the database OID
> one is going to need rework to be considered.

Regarding that ... I have to wonder just what promises we feel we've
made when it comes to what a user is expected to be able to do with the
new cluster *before* pg_upgrade is run on it.  For my part, I sure feel
like it's "nothing", in which case it seems like we can do things that
we can't do with a running system, like literally just DROP and recreate
with the correct OID of any databases we need to, or even push that back
to the user to do that at initdb time with some kind of error thrown by
pg_upgrade during the --check phase.  "Initial databases have
non-standard OIDs, recreate destination cluster with initdb
--with-oid=12341" or something along those lines.

Also open to the idea of simply forcing 'template1' to always being
OID=1 even if it's dropped/recreated and then just dropping/recreating
the template0 and postgres databases if they've got different OIDs than
what the old cluster did- after all, they should be getting entirely
re-populated as part of the pg_upgrade process itself.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Chapman Flack
On 08/23/21 14:30, Robert Haas wrote:
> it seems likely that this proposed
> API would have the exact same problem, because it would let people do
> exactly the same thing. And, going through this proposed API would
> still be significantly more expensive than just accessing the bare
> variables, because you'd at least have to do some kind of lookup based
> on the GUC name

I think the API ideas in [0] would not let people do exactly the same thing.

They would avoid exposing the bare variables to overwrite. Not that
there has been any plague of extensions going and overwriting GUCs,
but I think in some messages on this thread I detected a sense that
in principle it's better if an API precludes it, and that makes sense
to me.

The second idea also avoids the expense of name-based lookup (except once
at extension initialization), and in fact minimizes the cost of obtaining
the current value when needed, by slightly increasing the infrequent cost
of updating values.

Regards,
-Chap


[0] https://www.postgresql.org/message-id/6123C425.3080409%40anastigmatix.net




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Robert Haas
On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> Thanks Robert for your comments.
> I have split the patch into two portions. One that handles DB OID and
> the other that
> handles tablespace OID and relfilenode OID.

It's pretty clear from the discussion, I think, that the database OID
one is going to need rework to be considered.

Regarding the other one:

- The comment in binary_upgrade_set_pg_class_oids() is still not
accurate. You removed the sentence which says "Indexes cannot have
toast tables, so we need not make this probe in the index code path"
but the immediately preceding sentence is still inaccurate in at least
two ways. First, it only talks about tables, but the code now applies
to indexes. Second, it only talks about OIDs, but now also deals with
relfilenodes. It's really important to fully update every comment that
might be affected by your changes!

- The SQL query in that function isn't completely correct. There is a
left join from pg_class to pg_index whose ON clause includes
"c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's
likely that is because it is possible, in corner cases, for a TOAST
table to have multiple TOAST indexes. I forget exactly how that
happens, but I think it might be like if a REINDEX CONCURRENTLY on the
TOAST table fails midway through, or something of that sort. Now if
that happens, the LEFT JOIN you added is going to cause the output to
contain multiple rows, because you didn't replicate the i.indisvalid
condition into that ON clause. And then it will fail. Apparently we
don't have a pg_upgrade test case for this scenario; we probably
should. Actually what I think would be even better than putting
i.indisvalid into that ON clause would be to join off of i.indrelid
rather than c.reltoastrelid.

- The code that decodes the various columns of this query does so in a
slightly different order than the query itself. It would be better to
make it match. Perhaps put relkind first in both cases. I might also
think about trying to make the column naming a bit more consistent,
e.g. relkind, relfilenode, toast_oid, toast_relfilenode,
toast_index_oid, toast_index_relfilenode.

- In heap_create(), the wording of the error messages is not quite
consistent. You have "relfilenode value not set when in binary upgrade
mode", "toast relfilenode value not set when in binary upgrade mode",
and "pg_class index relfilenode value not set when in binary upgrade
mode". Why does the last one mention pg_class when the other two
don't?

- The code in heap_create() now has no comments whatsoever, which is a
shame, because it's actually kind of a tricky bit of logic. Someone
might wonder why we override the relfilenode inside that function
instead of doing it at the same places where we absorb
binary_upgrade_next_{heap,index,toast}_pg_class_oid and the passing
down the relfilenode. I think the answer is that passing down the
relfilenode from the caller would result in storage not actually being
created, whereas in this case we want it to be created but just with
the value we specify, and the reason we want that is because we need
later DDL that happens after these statements but before the old
cluster's relations are moved over to execute successfully, which it
won't if the storage is altogether absent.

However, that raises the question of whether this patch has even got
the basic design right. Maybe we ought to actually be absorbing the
relfilenode setting at the same places where we're doing so for the
OID, and then passing an additional parameter to heap_create() like
bool suppress_storage or something like that. Maybe, taking it even
further, we ought to be changing the signatures of
binary_upgrade_next_heap_pg_class_oid and friends to be two-argument
functions, and pass down the OID and the relfilenode in the same call,
rather than calling two separate functions. I'm not so much concerned
about the cost of calling two functions as the potential for
confusion. I'm not honestly sure that either of these changes are the
right thing to do, but I am pretty strongly inclined to do at least
the first part - trying to absorb reloid and relfilenode in the same
places. If we're not going to do that we certainly need to explain why
we're doing it the way we are in the comments.

It's not really this patch's fault, but it would sure be nice if we
had some better testing for this area. Suppose this patch somehow
changed nothing from the present behavior. How would we know? Or
suppose it managed to somehow set all the relfilenodes in the new
cluster to random values rather than the intended one? There's no
automated testing that would catch any of that, and it's not obvious
how it could be added to test.sh. I suppose what we really need to do
at some point is rewrite that as a TAP test, but that seems like a
separate project from this patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Aug 23, 2021, at 12:51 PM, Stephen Frost  wrote:
> > Simply using superuser_arg() isn't sufficient is exactly the point that
> > I'm trying to make.  As a 'landlord', I might very well want to have
> > some kind of 'landlord' role that isn't directly a superuser but which
> > could *become* a superuser by having been GRANT'd a superuser role- but
> > I certainly don't want that role's objects to be able to be messed with
> > by the tenant.
> 
> > If one of those other non-superuser roles is, itself, a role that can
> > become a superuser 
> 
> If you have a sandbox-superuser who can do anything within the sandbox but 
> nothing outside the sandbox, then you need a pretty good wall at the 
> periphery of the sandbox.  Breaking sandbox-superuser-ishness into multiple 
> distinct privileges rather than one monolithic privilege doesn't change the 
> need for a good wall at the periphery.  The pg_manage_database_objects role 
> doesn't encompass all sandbox-superuser privileges, but it is on that side of 
> the wall.
> 
> We could agree to move the wall a little, and say that non-superuser roles 
> who have the ability to become superusers are on the other side of the wall.  
> That's fine.  I'd have to rework the patch a bit, but conceptually that seems 
> doable.  We could also say that non-superusers who are members of privileged 
> roles (pg_execute_server_programs, pg_signal_backend, etc) are likewise on 
> the other side of that wall.
> 
> Does that work?

I'd much rather we go down the path that Robert had suggested where we
find a way to make a connection between the tenant role and everything
that they create, and leave everything that is outside of that box on
the other side of the 'wall'.  There's also the risk that the landlord
creates a role one day and then GRANTs superuser rights to that role on
another day, after the tenant has already managed to gain control of
that role.  That kind of thing is something we should work hard to make
difficult to happen- the landlord should have to explicitly give the
tenant control over something that the landlord creates; it shouldn't
happen automagically.
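
As a sketch of what such an explicit hand-off could look like (the object
and role names are made up for illustration):

GRANT ALL ON TABLE landlord_schema.billing TO tenant;  -- per-object hand-off
GRANT landlord_helper TO tenant WITH ADMIN OPTION;      -- per-role hand-off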

Having hard-coded lists of which predefined roles are 'ok' and which
aren't sounds generally bad and I don't think we'd actually want to
include all predefined roles in that list either (even if it'd be fine
today, which I don't think it is given things like pg_monitor and
pg_signal_backend, though perhaps there could be some debate over
those...).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Mark Dilger



> On Aug 23, 2021, at 12:51 PM, Stephen Frost  wrote:
> 
> Simply using superuser_arg() isn't sufficient is exactly the point that
> I'm trying to make.  As a 'landlord', I might very well want to have
> some kind of 'landlord' role that isn't directly a superuser but which
> could *become* a superuser by having been GRANT'd a superuser role- but
> I certainly don't want that role's objects to be able to be messed with
> by the tenant.

> If one of those other non-superuser roles is, itself, a role that can
> become a superuser 

If you have a sandbox-superuser who can do anything within the sandbox but 
nothing outside the sandbox, then you need a pretty good wall at the periphery 
of the sandbox.  Breaking sandbox-superuser-ishness into multiple distinct 
privileges rather than one monolithic privilege doesn't change the need for a 
good wall at the periphery.  The pg_manage_database_objects role doesn't 
encompass all sandbox-superuser privileges, but it is on that side of the wall.

We could agree to move the wall a little, and say that non-superuser roles who 
have the ability to become superusers are on the other side of the wall.  
That's fine.  I'd have to rework the patch a bit, but conceptually that seems 
doable.  We could also say that non-superusers who are members of privileged 
roles (pg_execute_server_programs, pg_signal_backend, etc) are likewise on the 
other side of that wall.

Does that work?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote:

> On 8/23/21, 12:55 PM, "alvhe...@alvh.no-ip.org"  
> wrote:
> > Thanks.  I've pushed this now to all live branches.
> 
> Thank you!  I appreciate the thorough reviews.  Should we make a new
> thread for the streaming replication fix?

Yeah, this one is long enough :-)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: archive status ".ready" files may be created too early

2021-08-23 Thread Bossart, Nathan
On 8/23/21, 12:55 PM, "alvhe...@alvh.no-ip.org"  wrote:
> Thanks.  I've pushed this now to all live branches.

Thank you!  I appreciate the thorough reviews.  Should we make a new
thread for the streaming replication fix?

Nathan



Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread David Fetter
On Mon, Aug 23, 2021 at 10:56:52AM -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 10:15 AM Tom Lane  wrote:
> > And yes, I absolutely would prohibit extensions from accessing many
> > of them, if there were a reasonable way to do it.  It would be a good
> > start towards establishing a defined API for extensions.
> 
> Mostly, it would make extension development more difficult for no
> discernable benefit to the project.
> 
> You've made this argument many times over the years ... but we're no
> closer to having an extension API than we ever were, and we continue
> to get complaints about breaking stuff on Windows on a pretty regular
> basis.
> 
> Honestly, it seems unimaginable that an API is ever really going to be
> possible. It would be a ton of work to maintain, and we'd just end up
> breaking it every time we discover that there's a new feature we want
> to implement which doesn't fit into the defined API now. That's what
> we do *now* with functions that third-party extensions actually call,
> and with variables that they access, and it's not something that, in
> my experience, is any great problem in maintaining an extension.
> You're running code that is running inside somebody else's executable
> and sometimes you have to adjust it for upstream changes. That's life,
> and I don't think that complaints about that topic are nearly as
> frequent as complaints about extensions breaking on Windows because of
> missing PGDLLIMPORT markings.
> 
> And more than that, I'm pretty sure that you've previously taken the
> view that we shouldn't document all the hook functions that only exist
> in the backend for the purpose of extension use.

As the person on the receiving end of that one, I was nonplussed, so I
took a step back to think it over.  I recognized at that time that I
didn't have a great answer for a legitimate concern, namely that any
change, however trivial, that goes into the core and doesn't go
directly to core database functionality, represents a long-term
maintenance burden.

The thing I'm coming to is that the key architectural feature
PostgreSQL has that other RDBMSs don't is its extensibility. Because
that's been a stable feature over time, I'm pretty sure we actually
need to see documenting that as a thing that does actually go to core
database functionality. Yes, there are resources involved with doing a
thing like this, but I don't think that they require constant or even
frequent attention from committers or even from seasoned DB hackers.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote:

> Ah, okay.  BTW the other changes you mentioned made sense to me.

Thanks.  I've pushed this now to all live branches.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Aug 23, 2021, at 11:13 AM, Stephen Frost  wrote:
> > This I have to object to pretty strongly- we have got to get away from
> > the idea that just because X isn't a superuser or isn't owned by a
> > superuser that it's fine to allow some non-superuser to mess with it.
> 
> I thought we were trying to create a set of roles which could cumulatively do 
> everything *inside the sandbox* that superuser can do, but which cannot 
> escape the sandbox.  I would think this pg_manage_database_objects role would 
> be reasonable in the context of that effort.

I wasn't objecting to the general concept of trying to have a role that
can do lots of things inside the sandbox but aren't allowed to escape
it.  I was specifically objecting to the idea that just checking if an
object is directly owned by a superuser is not sufficient to prevent a
role from being able to escape the sandbox.

> > In particular, just because a role isn't explicitly marked as a superuser
> > doesn't mean that the role can't *become* a superuser, or that it hasn't
> > got privileged access to the system in other ways, such as by being a
> > member of other predefined roles that perhaps the role who is a member
> > of pg_manage_database_objects doesn't have.
> 
> The implementation does not allow pg_manage_database_objects to mess with 
> objects that are owned by a role which satisfies superuser_arg().  If you are 
> renting out a database to a tenant and change the ownership of stuff to a 
> non-superuser, then you get what you get.  But why would you do that?

Simply using superuser_arg() isn't sufficient is exactly the point that
I'm trying to make.  As a 'landlord', I might very well want to have
some kind of 'landlord' role that isn't directly a superuser but which
could *become* a superuser by having been GRANT'd a superuser role- but
I certainly don't want that role's objects to be able to be messed with
by the tenant.

> > Such a check against
> > modifying of "superuser owned" objects implies that it's providing some
> > kind of protection against the role being able to become a superuser
> > when it doesn't actually provide that protection in any kind of reliable
> > fashion and instead ends up fooling the user.
> 
> Please provide steps to reproduce this issue.  Assume that a database is 
> initialized and that everything is owned by the system.  A "tenant" role is 
> created and granted pg_manage_database_objects, and other non-superuser roles 
> are created.  Now, what exactly can "tenant" do that you find objectionable?

If one of those other non-superuser roles is, itself, a role that can
become a superuser and it decides to create some functions for its own
purposes, then the tenant role would be able to modify those functions,
allowing the tenant to gain access to the non-superuser role, and from
there being able to gain access to superuser.

Something along these lines, basically:

CREATE USER tenant;
GRANT pg_manage_database_objects TO tenant;
CREATE USER landlord;
GRANT postgres TO landlord;                     -- landlord can become a superuser
SET ROLE landlord;
CREATE FUNCTION do_stuff() RETURNS void
    LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;  -- stands in for landlord's real code
-- put a call to do_stuff() into a cronjob run as landlord
SET ROLE tenant;
CREATE OR REPLACE FUNCTION do_stuff() RETURNS void
    LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;  -- replaced with code to take over landlord

poof- tenant has ability to be landlord and then further to become
postgres.

All of the above applies beyond just superuser too- consider a
non-superuser role which has been grant'd pg_execute_server_program.
That won't trip up superuser_arg() but it sure would allow a role to
break out of the sandbox.
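
For example, a role holding that membership can run arbitrary commands as
the server's OS user without ever touching a superuser-owned object:

COPY (SELECT 1) TO PROGRAM 'touch /tmp/escaped-the-sandbox';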

> > This is the issue with CREATEROLE and we definitely shouldn't be
> > doubling-down on that mistake, and also brings up the point that I, at
> > least, had certainly hoped that part of this effort would include a way
> > for roles to be created by a user with an appropriate predefined role,
> > and w/o CREATEROLE (which would then be deprecated or, ideally, just
> > outright removed).
> 
> Well, pg_manage_database_objects has no special ability to create or drop 
> roles.  I thought separating those powers made more sense than grouping them 
> together.  We can have a new role for doing what you say, but that seems 
> redundant with CREATEROLE.  I didn't want this patch set to be bogged down 
> waiting for a consensus on how to change the CREATEROLE privilege.

CREATEROLE doesn't work to give to folks generally because of the issues
above- its check is, similarly, too simple and always has been.  This
isn't news either, it's been discussed in various places from time to
time and is part of why people who run cloud providers end up either not
giving out that role attribute and providing another way, or they hack
up the PG core code to handle the way that attribute works differently.

> >  I get that this doesn't have to be in the first
> > patch or even patch set going down this road but the lack of discussion
> > or of any coordination between this effort and the other one that is
> > trying to 

Re: Postgres perl module namespace

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 3:03 PM Andrew Dunstan  wrote:
> OK, I count 3 in favor of changing to PgTest::Cluster, 1 against,
> remainder don't care.

I'd have gone with something starting with Postgres:: ... but I don't care much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Mark Dilger



> On Aug 23, 2021, at 11:13 AM, Stephen Frost  wrote:
> 
> This I have to object to pretty strongly- we have got to get away from
> the idea that just because X isn't a superuser or isn't owned by a
> superuser that it's fine to allow some non-superuser to mess with it.

I thought we were trying to create a set of roles which could cumulatively do 
everything *inside the sandbox* that superuser can do, but which cannot escape 
the sandbox.  I would think this pg_manage_database_objects role would be 
reasonable in the context of that effort.

> In particular, just because a role isn't explicitly marked as a superuser
> doesn't mean that the role can't *become* a superuser, or that it hasn't
> got privileged access to the system in other ways, such as by being a
> member of other predefined roles that perhaps the role who is a member
> of pg_manage_database_objects doesn't have.

The implementation does not allow pg_manage_database_objects to mess with 
objects that are owned by a role which satisfies superuser_arg().  If you are 
renting out a database to a tenant and change the ownership of stuff to a 
non-superuser, then you get what you get.  But why would you do that?

> Such a check against
> modifying of "superuser owned" objects implies that it's providing some
> kind of protection against the role being able to become a superuser
> when it doesn't actually provide that protection in any kind of reliable
> fashion and instead ends up fooling the user.

Please provide steps to reproduce this issue.  Assume that a database is 
initialized and that everything is owned by the system.  A "tenant" role is 
created and granted pg_manage_database_objects, and other non-superuser roles 
are created.  Now, what exactly can "tenant" do that you find objectionable?

> This is the issue with CREATEROLE and we definitely shouldn't be
> doubling-down on that mistake, and also brings up the point that I, at
> least, had certainly hoped that part of this effort would include a way
> for roles to be created by a user with an appropriate predefined role,
> and w/o CREATEROLE (which would then be deprecated or, ideally, just
> outright removed).

Well, pg_manage_database_objects has no special ability to create or drop 
roles.  I thought separating those powers made more sense than grouping them 
together.  We can have a new role for doing what you say, but that seems 
redundant with CREATEROLE.  I didn't want this patch set to be bogged down 
waiting for a consensus on how to change the CREATEROLE privilege.

>  I get that this doesn't have to be in the first
> patch or even patch set going down this road but the lack of discussion
> or of any coordination between this effort and the other one that is
> trying to address the CREATEROLE issue seems likely to land us in a bad
> place with two distinct approaches being used.

I'm confused.  This patch set doesn't come within a country mile of CREATEROLE. 
 Why should this patch set have to coordinate with that one?  I'm not arguing 
with you -- merely asking what I'm misunderstanding?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Aug 23, 2021 at 2:13 PM Stephen Frost  wrote:
> > This I have to object to pretty strongly- we have got to get away from
> > the idea that just because X isn't a superuser or isn't owned by a
> > superuser that it's fine to allow some non-superuser to mess with it.
> > In particular, just because a role isn't explicitly marked as a superuser
> > doesn't mean that the role can't *become* a superuser, or that it hasn't
> > got privileged access to the system in other ways, such as by being a
> > member of other predefined roles that perhaps the role who is a member
> > of pg_manage_database_objects doesn't have.  Such a check against
> > modifying of "superuser owned" objects implies that it's providing some
> > kind of protection against the role being able to become a superuser
> > when it doesn't actually provide that protection in any kind of reliable
> > fashion and instead ends up fooling the user.
> 
> I think you make a good point here, but it seems to me that we need
> *something*. We need a way to create a "lead tenant" role that can
> create other tenant roles and then, err, boss them around. Not only
> drop the roles again, but also drop or alter or change the owner of
> their objects, or really bypass any security those roles would like to
> assert as against the lead tenant. If we can't see a way to create
> some sort of role of that sort, then I don't think we can really say
> we've solved anything much.

Sure, but we can't just use the "superuser" flag for that, we need
something better.  The "better" in my mind here would be akin to what
we're thinking about doing for event triggers, but for roles which
actually already have a distinction between becoming a role vs. being
able to GRANT that role to another role, and that's the 'admin' option.

In other words, the user we imagine being GRANT'd this hypothetical
pg_manage_database_objects role wouldn't actually need that role to
explicitly give them access to be able to modify the objects of other
roles- it would be able to do that by virtue of just being a member of
those roles.  The roles who are allowed to modify existing role
membership should have the 'admin' right on those roles, and what we
just need is a new predefined role that's basically "allow roles to be
created or dropped" but where the only roles which can be GRANT'd by a
user with that ability are the ones that they have admin rights on, and
the only roles that they're allowed to drop they also have to have admin
rights on.
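
A minimal SQL sketch of that shape (pg_create_role is a made-up name for
the hypothetical predefined role; WITH ADMIN OPTION is existing syntax):

CREATE ROLE landlord NOSUPERUSER;
GRANT pg_create_role TO landlord;            -- hypothetical "may CREATE/DROP ROLE" role
CREATE ROLE tenant;
GRANT tenant TO landlord WITH ADMIN OPTION;  -- landlord administers tenant
-- under the rule above, landlord could then create, drop, or GRANT only
-- roles it holds WITH ADMIN OPTION (tenant), not, say, postgres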

> > This is the issue with CREATEROLE and we definitely shouldn't be
> > doubling-down on that mistake, and also brings up the point that I, at
> > least, had certainly hoped that part of this effort would include a way
> > for roles to be created by a user with an appropriate predefined role,
> > and w/o CREATEROLE (which would then be deprecated or, ideally, just
> > outright removed).  I get that this doesn't have to be in the first
> > patch or even patch set going down this road but the lack of discussion
> > or of any coordination between this effort and the other one that is
> > trying to address the CREATEROLE issue seems likely to land us in a bad
> > place with two distinct approaches being used.
> 
> Is there an active effort to do something about CREATEROLE? Do you
> have a link to the thread? I feel like this is one of those things
> that has occasioned discussion over the years but I am not aware of an
> active project or a specific proposal to do something about this.

Hrmpf, I had been thinking of this:

https://www.postgresql.org/message-id/flat/c2ee39152957af339ae6f3e851aef09930dd2faf.ca...@credativ.de

registered in the CF here: https://commitfest.postgresql.org/34/2918/

though I see now that it isn't trying to explicitly deal with the
CREATEROLE bit (which I had understood from some other discussion was a
topic of interest to the author), but is definitely caught up in the
discussion about who is allowed to set what GUCs, and therefore still
seems rather related to me.

> Maybe this can be solved from the other end? Like, as opposed to
> working by exception and saying, "well, everything but superusers,"
> maybe we need to explicitly declare who is included. Like, perhaps we
> could somehow represent the fact that role A has super-powers with
> respect to roles B, C, D, and any future roles that A may create, but
> not other roles that exist on the system, or something of that sort?

Isn't this exactly what having the 'admin' option on a role is?  You're
GRANT'd that role and further are allowed to then GRANT that role to
other roles.  Being a member of that role means you're considered to
have 'ownership' level rights for all the objects that that role owns
too.

Maybe also need to have some condition around "you can only set
attributes on roles which you already have", or maybe we need to invent
'admin' options for each of the role attributes if we think it needs to

Re: Postgres perl module namespace

2021-08-23 Thread Andrew Dunstan


On 8/11/21 9:30 AM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> I'll recast my vote to make this line be
>>      my $node = PgTest::Cluster->new('nodename');
>> which seems succint enough.  I could get behind PgTest::PgCluster too,
>> but having a second Pg there seems unnecessary.
> Either of those WFM.
>
>   



OK, I count 3 in favor of changing to PgTest::Cluster, 1 against,
remainder don't care.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-23 Thread Robert Haas
On Wed, Aug 18, 2021 at 9:28 AM Greg Nancarrow  wrote:
> Yes, I think I agree on that.
> I've updated the patch to restore the actual transaction snapshot in
> the IsolationUsesXactSnapshot() case, otherwise the active snapshot is
> installed as the transaction snapshot.
> I've tested the patch for the different transaction isolation levels,
> and the reported coredump (from assertion failure) is not occurring.
> (In the "serializable" case there are "could not serialize access due
> to read/write dependencies among transactions" errors, as Pavel has
> previously reported, but these occur without the patch and it appears
> to be an unrelated issue)

I think this looks pretty good. I am not sure I see any reason to
introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
just use RestoreTransactionSnapshot() and then call
PushActiveSnapshot() from parallel.c? That seems preferable to me from
the standpoint of avoiding multiplication of APIs.

I also think that the comments should explain why we are doing this,
rather than just that we are doing this. So instead of this:

+/*
+ * If the transaction snapshot was serialized, restore it and restore the
+ * active snapshot too. Otherwise, the active snapshot is also installed as
+ * the transaction snapshot.
+ */

...perhaps something like:

If the transaction isolation level is REPEATABLE READ or SERIALIZABLE,
the leader has serialized the transaction snapshot and we must restore
it. At lower isolation levels, there is no transaction-lifetime
snapshot, but we need TransactionXmin to get set to a value which is
less than or equal to the xmin of every snapshot that will be used by
this worker. The easiest way to accomplish that is to install the
active snapshot as the transaction snapshot. Code running in this
parallel worker might take new snapshots via GetTransactionSnapshot()
or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
snapshot older than the active snapshot.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-08-23 Thread Pavel Stehule
Hi

On Fri, Aug 20, 2021 at 10:24, Dinesh Chemuduru  wrote:

> On Sun, 25 Jul 2021 at 16:34, Pavel Stehule 
> wrote:
>

please, can you register this patch to commitfest app

https://commitfest.postgresql.org/34/

Regards

Pavel

>
>>
>> On Sun, Jul 25, 2021 at 12:52, Dinesh Chemuduru <dinesh.ku...@migops.com> wrote:
>>
>>> On Sat, 17 Jul 2021 at 01:29, Pavel Stehule 
>>> wrote:
>>>
 Hi

 On Fri, Jul 16, 2021 at 21:47, Dinesh Chemuduru <dinesh.ku...@migops.com> wrote:

> Hi Everyone,
>
> We would like to propose the below 2 new plpgsql diagnostic items,
> related to parsing. Because, the current diag items are not providing
> the useful diagnostics about the dynamic SQL statements.
>
> 1. PG_PARSE_SQL_STATEMENT (returns parse failed sql statement)
> 2. PG_PARSE_SQL_STATEMENT_POSITION (returns parse failed sql text
> cursor position)
>
> Consider the below example, which is an invalid SQL statement.
>
> postgres=# SELECT 1 JOIN SELECT 2;
> ERROR:  syntax error at or near "JOIN"
> LINE 1: SELECT 1 JOIN SELECT 2;
>  ^
> Here, there is a syntax error at JOIN clause,
> and also we are getting the syntax error position(^ symbol, the
> position of JOIN clause).
> This will be helpful, while dealing with long queries.
>
> Now, if we run the same statement as a dynamic SQL(by using EXECUTE
> ),
> then it seems we are not getting the text cursor position,
> and the SQL statement which is failing at parse level.
>
> Please find the below example.
>
> postgres=# SELECT exec_me('SELECT 1 JOIN SELECT 2');
> NOTICE:  RETURNED_SQLSTATE 42601
> NOTICE:  COLUMN_NAME
> NOTICE:  CONSTRAINT_NAME
> NOTICE:  PG_DATATYPE_NAME
> NOTICE:  MESSAGE_TEXT syntax error at or near "JOIN"
> NOTICE:  TABLE_NAME
> NOTICE:  SCHEMA_NAME
> NOTICE:  PG_EXCEPTION_DETAIL
> NOTICE:  PG_EXCEPTION_HINT
> NOTICE:  PG_EXCEPTION_CONTEXT PL/pgSQL function exec_me(text) line 18
> at EXECUTE
> NOTICE:  PG_CONTEXT PL/pgSQL function exec_me(text) line 21 at GET
> STACKED DIAGNOSTICS
>  exec_me
> -
>
> (1 row)
>
> From the above results, by using all the existing diag items, we are
> unable to get the position of "JOIN" in the submitted SQL statement.
> By using these proposed diag items, we will be getting the required
> information,
> which will be helpful while running long SQL statements as dynamic SQL
> statements.
>
> Please find the below example.
>
> postgres=# SELECT exec_me('SELECT 1 JOIN SELECT 2');
> NOTICE:  PG_PARSE_SQL_STATEMENT SELECT 1 JOIN SELECT 2
> NOTICE:  PG_PARSE_SQL_STATEMENT_POSITION 10
>  exec_me
> -
>
> (1 row)
>
> From the above results, by using these diag items,
> we are able to get what is failing and it's position as well.
> This information will be much helpful to debug the issue,
> while a long running SQL statement is running as a dynamic SQL
> statement.
>
> We are attaching the patch for this proposal, and will be looking for
> your inputs.
>

 +1 It is a good idea.  I am not sure if the used names are good. I propose

 PG_SQL_TEXT and PG_ERROR_LOCATION

 Regards

 Pavel


>>> Thanks Pavel,
>>>
>>> Sorry for the late reply.
>>>
>>> The proposed diag items are `PG_SQL_TEXT`, `PG_ERROR_LOCATION` are much
>>> better and generic.
>>>
>>> But, as we are only dealing with the parsing failure, I thought of
>>> adding that to the diag name.
>>>
>>
>> I understand. But parsing is only one case - and these variables can be
>> used for any case. Sure, ***we don't want*** to have PG_PARSE_SQL_TEXT,
>> PG_ANALYZE_SQL_TEXT, PG_EXECUTION_SQL_TEXT ...
>>
>> The idea is good, and you found the case, where it has benefits for
>> users. Naming is hard.
>>
>>
> Thanks for your time and suggestions Pavel.
> I updated the patch as per the suggestions, and attached it here for
> further inputs.
>
> Regards,
> Dinesh Kumar
>
>
>
>> Regards
>>
>> Pavel
>>
>>
>>> Regards,
>>> Dinesh Kumar
>>>
>>>


> Regards,
> Dinesh Kumar
>
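
For reference, a minimal sketch of an exec_me() helper along the lines used
in the example above (the original function was not included here, so this
is only an illustration that reports a few of the diagnostics items):

CREATE OR REPLACE FUNCTION exec_me(sql text) RETURNS void AS $$
DECLARE
    v_state   text;
    v_msg     text;
    v_context text;
BEGIN
    EXECUTE sql;
EXCEPTION WHEN others THEN
    GET STACKED DIAGNOSTICS
        v_state   = RETURNED_SQLSTATE,
        v_msg     = MESSAGE_TEXT,
        v_context = PG_EXCEPTION_CONTEXT;
    RAISE NOTICE 'RETURNED_SQLSTATE %', v_state;
    RAISE NOTICE 'MESSAGE_TEXT %', v_msg;
    RAISE NOTICE 'PG_EXCEPTION_CONTEXT %', v_context;
END;
$$ LANGUAGE plpgsql;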



Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Peter Geoghegan
On Mon, Aug 23, 2021 at 7:57 AM Robert Haas  wrote:
> In short, +1 from me for the original proposal of marking all GUCs as
> PGDLLIMPORT.

+1

> And, heck, +1 for marking all the other global variables
> that way, too. We're not solving any problem here. We're just annoying
> people, mostly people who are loyal community members and steady
> contributors to the project.

I'm +0.5 on this aspect -- the result might be a lot of verbosity for
no possible benefit.

I'm not sure how many global variables there are. Hopefully not that
many. Maybe making adding new global variables annoying would be a
useful disincentive -- sometimes we use global variables when it isn't
particularly natural (it's natural with GUCs, but not other things).
That might tip the scales, at least for me.

Unnecessary use of global variables is why Postgres 13 went through
several point releases before I accidentally found out that parallel
VACUUM doesn't respect cost limits -- a very simple bug concerning how
we propagate (or fail to propagate) state to parallel workers. I bet
it would have taken far longer for the bug to be discovered if it
wasn't for my Postgres 14 VACUUM refactoring work.

-- 
Peter Geoghegan




Re: very long record lines in expanded psql output

2021-08-23 Thread Platon Pronko

Hi!

Done, here's the link: https://commitfest.postgresql.org/34/3295/

Best regards,
Platon Pronko

On 2021-08-23 21:14, Andrew Dunstan wrote:


On 8/23/21 2:00 PM, Platon Pronko wrote:

Hi!

Apparently I did forget something, and that's the patch itself :)
Thanks to Justin Pryzby for pointing this out.

Attaching the patch now.




Please add it to the commitfest, the next one starts in about a week.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com






Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera  wrote:
> Aren't we talking about query cancellations that occur in response to a
> standby delay limit?  Those aren't in response to user action.

Oh, you're right. But I guess a similar problem could also occur in
response to pg_terminate_backend(), no?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 2:13 PM Stephen Frost  wrote:
> This I have to object to pretty strongly- we have got to get away from
> the idea that just because X isn't a superuser or isn't owned by a
> superuser that it's fine to allow some non-superuser to mess with it.
> In particular, just because a role isn't explicitly marked as a superuser
> doesn't mean that the role can't *become* a superuser, or that it hasn't
> got privileged access to the system in other ways, such as by being a
> member of other predefined roles that perhaps the role who is a member
> of pg_manage_database_objects doesn't have.  Such a check against
> modifying of "superuser owned" objects implies that it's providing some
> kind of protection against the role being able to become a superuser
> when it doesn't actually provide that protection in any kind of reliable
> fashion and instead ends up fooling the user.

I think you make a good point here, but it seems to me that we need
*something*. We need a way to create a "lead tenant" role that can
create other tenant roles and then, err, boss them around. Not only
drop the roles again, but also drop or alter or change the owner of
their objects, or really bypass any security those roles would like to
assert as against the lead tenant. If we can't see a way to create
some sort of role of that sort, then I don't think we can really say
we've solved anything much.

> This is the issue with CREATEROLE and we definitely shouldn't be
> doubling-down on that mistake, and also brings up the point that I, at
> least, had certainly hoped that part of this effort would include a way
> for roles to be created by a user with an appropriate predefined role,
> and w/o CREATEROLE (which would then be deprecated or, ideally, just
> outright removed).  I get that this doesn't have to be in the first
> patch or even patch set going down this road but the lack of discussion
> or of any coordination between this effort and the other one that is
> trying to address the CREATEROLE issue seems likely to land us in a bad
> place with two distinct approaches being used.

Is there an active effort to do something about CREATEROLE? Do you
have a link to the thread? I feel like this is one of those things
that has occasioned discussion over the years but I am not aware of an
active project or a specific proposal to do something about this.

Maybe this can be solved from the other end? Like, as opposed to
working by exception and saying, "well, everything but superusers,"
maybe we need to explicitly declare who is included. Like, perhaps we
could somehow represent the fact that role A has super-powers with
respect to roles B, C, D, and any future roles that A may create, but
not other roles that exist on the system, or something of that sort?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 11:40 AM Alvaro Herrera  wrote:
> In that case, why not improve the API with functions that return the
> values in some native datatype?  For scalars with native C types (int,
> floats, Boolean etc) this is easy enough; I bet it'll solve 99% of the
> problems or more.

Sure, but ... why bother?

The entire argument rests on the presumption that there is some harm
being done by people accessing the values directly, but I don't think
that's true. And, if it were true, it seems likely that this proposed
API would have the exact same problem, because it would let people do
exactly the same thing. And, going through this proposed API would
still be significantly more expensive than just accessing the bare
variables, because you'd at least have to do some kind of lookup based
on the GUC name to find the corresponding variable. It's just a
solution in search of a problem.

Nothing bad happens when people write extensions that access GUC
variables directly. It works totally, completely fine. Except on
Windows.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: very long record lines in expanded psql output

2021-08-23 Thread Andrew Dunstan


On 8/23/21 2:00 PM, Platon Pronko wrote:
> Hi!
>
> Apparently I did forget something, and that's the patch itself :)
> Thanks to Justin Pryzby for pointing this out.
>
> Attaching the patch now.
>
>

Please add it to the commitfest, the next one starts in about a week.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-08-23 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Jul 22, 2021, at 1:01 PM, Robert Haas  wrote:
> > It's awkward. I think that we can't afford to create a separate
> > predefined role for every single thing that someone could
> > theoretically want to sever, because then we'll have a zillion of them
> > and it will be unmaintainable. So the idea was to try to break up
> > everything someone might want to do either via DDL or by setting GUCs
> > into one of three categories: internal to the database
> > (pg_database_security), facing outward toward the network
> > (pg_network_security), and facing inward toward the host
> > (pg_host_security). If we didn't have any predefined security-related
> > roles already, this would still have complications, but as things
> > stand it has more, because as you point out, pg_read_server_files
> > overlaps with pg_host_security. But what do we do about that?
> 
> I gave up on the idea of splitting all superuser functions into three roles.

I can't say that I blame you for that. :)  For my 2c, at least, the ones
proposed never really felt like they were very directly tied to specific
capabilities, which I think was one of the issues with that approach.

> Patch v5-0001 refactors the guc code to allow non-superuser roles to be 
> associated with guc variables.  Any such role can then set the variable, 
> including via "alter system set".  The patch stops short of creating any new 
> roles or assigning any roles to any guc variable.

Haven't looked at the patch yet but this does generally seem like an
interesting approach.

> Patches v5-0002 through v5-0005 create four new roles for managing host 
> resource settings, vacuum settings, autovacuum settings, and logging 
> settings.  That last one excludes "where to log" settings, because we don't 
> want the role to be able to write to arbitrary locations on the server.  
> Remaining guc variables not in these four categories continue to belong to 
> the superuser.

We do have a role today who is allowed to write to arbitrary locations
on the server, so I wonder if for those log settings we'd include a
requirement for the user to have both of those roles instead..?

> Patches v5-0006 and v5-0007 allow non-superusers to own event triggers, and 
> limit the event triggers to only running for events triggered by roles that 
> the event trigger owner belongs to.  This is backward compatible, because 
> event triggers have historically belonged only to superusers, and superusers 
> have implicit membership in all groups.

While I generally agree that this doesn't end up opening up security
issues, it's going to certainly be a change in how event triggers work
as they'll no longer *always* fire, and that seems quite at odds with
how triggers are generally thought of.  So much so that I worry about
mis-use due to this.  Then again, if we're going to go down this route
at all, I can't think of any particular way to avoid the security issues
of running the trigger for everyone when it's owned by a non-superuser.
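
For instance, under the patch a non-superuser owner could create something
like the following (audit_ddl_func() is assumed to already exist as an event
trigger function), and it would then fire only for DDL run by roles the
owner is a member of, rather than for everyone:

CREATE EVENT TRIGGER audit_ddl ON ddl_command_end
    EXECUTE FUNCTION audit_ddl_func();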

> Patches v5-0008 through v5-0010 allow non-superusers to own subscriptions 
> while restricting the tablesync and apply workers to only work on tables that 
> the subscription owner has permissions on.  This is almost backward 
> compatible, because subscriptions have historically belonged only to 
> superusers, as above, except for unlikely scenarios where superusers have 
> given ownership to non-superusers.  In those cases, the new code will refuse 
> to apply in situations where the old code would blindly apply changes.  Does 
> anybody see a problem with this?

This doesn't particularly bother me, at least.

> Patch v5-0011 is a bug fix posted elsewhere that hasn't been committed yet 
> but which must be committed in preparation for v5-0012.

No idea what it is as I hadn't looked yet, but if it's a bug fix then
shouldn't it be separated and back-patched..?

> Patch v5-0012 creates a new role, pg_manage_database_objects, which can do 
> anything with an object that the owner could do with it, as long as the owner 
> is not a superuser.  This role is intended as a "tenant" role, and is in some 
> sense a less powerful replacement for the pg_database_security role 
> previously proposed.

This I have to object to pretty strongly- we have got to get away from
the idea that just because X isn't a superuser or isn't owned by a
superuser that it's fine to allow some non-superuser to mess with it.
In particular, just because a role isn't explicitly marked as a superuser
doesn't mean that the role can't *become* a superuser, or that it hasn't
got privileged access to the system in other ways, such as by being a
member of other predefined roles that perhaps the role who is a member
of pg_manage_database_objects doesn't have.  Such a check against
modifying of "superuser owned" objects implies that it's providing some
kind of protection against the role being able to become a superuser
when it doesn't actually 

Re: pretty slow merge-join due rescan?

2021-08-23 Thread Pavel Stehule
On Mon, Aug 23, 2021 at 18:44, Pavel Stehule  wrote:

> Hi
>
> The customer reports a very slow query. I have a reproducer script. The
> dataset is not too big
>
> postgres=# \dt+
>  List of relations
> ┌────────┬───────┬───────┬───────┬─────────────┬────────────┬─────────────┐
> │ Schema │ Name  │ Type  │ Owner │ Persistence │    Size    │ Description │
> ╞════════╪═══════╪═══════╪═══════╪═════════════╪════════════╪═════════════╡
> │ public │ f_dep │ table │ pavel │ permanent   │ 8192 bytes │             │
> │ public │ f_emp │ table │ pavel │ permanent   │ 1001 MB    │             │
> │ public │ f_fin │ table │ pavel │ permanent   │ 432 kB     │             │
> │ public │ qt    │ table │ pavel │ permanent   │ 1976 kB    │             │
> │ public │ qtd   │ table │ pavel │ permanent   │ 87 MB      │             │
> └────────┴───────┴───────┴───────┴─────────────┴────────────┴─────────────┘
> (5 rows)
>
> and the query is not too complex
>
> SELECT
>  sub.a_121,
>  count(*)
> FROM (
> SELECT
>   f_fin.dt_business_day_id AS a_1056,
>   f_dep.description_id AS a_121,
>   f_emp.employee_id_id AS a_1327
> FROM  f_emp
> INNER JOIN f_dep ON
> ( f_emp.department_id_id = f_dep.id )
> INNER JOIN f_fin ON
> ( f_emp.business_day_date_id = f_fin.id )
> GROUP BY 1, 2, 3
>   ) AS sub
> INNER JOIN qt ON
> ( sub.a_1056 = qt.tt_1056_1056_b )
> LEFT OUTER JOIN qtd AS qt_2 ON
> ( ( qt.tt_1056_1056_b = qt_2.a_1056 )
> AND ( sub.a_121 = qt_2.a_121 )
> AND ( sub.a_1327 = qt_2.a_1327 ) )
> LEFT OUTER JOIN qtd AS qt_3 ON
> ( ( qt.tt_1056_1056_a = qt_3.a_1056 )
> AND ( sub.a_121 = qt_3.a_121 )
> AND ( sub.a_1327 = qt_3.a_1327 ) )
> GROUP BY 1;
>
> By default I get a good plan, and the performance is ok
> https://explain.depesz.com/s/Mr2H (about 16 sec). When I increase work_mem,
> I also get a good plan with good performance https://explain.depesz.com/s/u4Ff
>
> Unfortunately, that plan depends on an index only scan. In the production
> environment, the index only scan is not always available, and then I see
> another plan (I can get this plan by disabling index only scan).
>
> Although the cost is almost the same, the query is about 15x slower
> https://explain.depesz.com/s/L6zP
>
> │ HashAggregate  (cost=1556129.74..1556131.74 rows=200 width=12) (actual time=269948.878..269948.897 rows=64 loops=1)
> │   Group Key: f_dep.description_id
> │   Batches: 1  Memory Usage: 40kB
> │   Buffers: shared hit=5612 read=145051
> │   ->  Merge Left Join  (cost=1267976.96..1534602.18 rows=4305512 width=4) (actual time=13699.847..268785.500 rows=4291151 loops=1)
> │ Merge Cond: ((f_emp.employee_id_id = qt_3.a_1327) AND (f_dep.description_id = qt_3.a_121))
> │ Join Filter: (qt.tt_1056_1056_a = qt_3.a_1056)
> │ Rows Removed by Join Filter: 1203659495
> │ Buffers: shared hit=5612 read=145051
>
> .
>
> │ ->  Sort  (cost=209977.63..214349.77 rows=1748859 width=12) (actual time=979.522..81842.913 rows=1205261892 loops=1)
> │   Sort Key: qt_3.a_1327, qt_3.a_121
> │   Sort Method: quicksort  Memory: 144793kB
> │   Buffers: shared hit=2432 read=8718
> │   ->  Seq Scan on qtd qt_3  (cost=0.00..28638.59 rows=1748859 width=12) (actual time=0.031..284.437 rows=1748859 loops=1)
> │ Buffers: shared hit=2432 read=8718
>
> The sort of qtd table is very fast
>
> postgres=# explain analyze select * from qtd order by a_1327, a_121;
>
> ┌────────────────────────────────────────────────────────────┐
> │                         QUERY PLAN                         │
> ╞════════════════════════════════════════════════════════════╡
> │ Sort  (cost=209977.63..214349.77 rows=1748859 width=27) (actual time=863.923...213 rows=1748859 loops=1)
> │   Sort Key: a_1327, a_121
> │   Sort Method: quicksort  Memory: 199444kB
> │   ->  Seq Scan on qtd  (cost=0.00..28638.59 rows=1748859 width=27) (actual time=0.035..169.385 rows=1748859 loops=1)
> │ Planning Time: 0.473 ms
> │ Execution Time: 1226.305 ms
>
> 

Re: very long record lines in expanded psql output

2021-08-23 Thread Platon Pronko

Hi!

Apparently I did forget something, and that's the patch itself :)
Thanks to Justin Pryzby for pointing this out.

Attaching the patch now.

Best regards,
Platon Pronko

On 2021-08-23 20:51, Platon Pronko wrote:

Hi!

Please find attached the patch implementing the proposed changes.
I hope I didn't forget anything (docs, tab completion, comments, etc),
if I did forget something please tell me and I'll fix it.

Best regards,
Platon Pronko

On 2021-08-08 01:18, Andrew Dunstan wrote:


On 8/7/21 10:56 AM, Platon Pronko wrote:

Hi!


I also find this annoying and would be happy to be rid of it.


Have you tried "\pset format wrapped"? Pavel suggested it, and it
solved most of the problem for me, for example.


Yes, but it changes the data line output. Ideally, you should be able
to  modify these independently.


I agree, and I think this can be implemented, but I'm a bit afraid of
introducing an additional psql option (there's already quite a lot of
them).
I suspect primary PostgreSQL maintainers won't be happy with such an
approach.




I think I qualify as one of those ... :-)


Sorry, I'm new here, don't know who's who :)



No problem. Welcome! We're always very glad to see new contributors.




I'll start working on a new patch then. A couple questions about
specifics:

1. Can we add "expanded" in the option name, like
"xheader_expanded_width"?
I think adjusting the header row width doesn't make sense on any other
modes,
and placing that in the option name makes intent a bit clearer.




"xheader" was meant to be shorthand for "expanded output header"




2. What was "column" option in your original suggestion supposed to do?
("\pset xheader_width column|page|nnn")



It's meant to say don't print anything past the column spec, e.g.:


-[ RECORD 1 ]+
n    | 42
long_column_name | xx
-[ RECORD 2 ]+
n    | 210
long_column_name | 
xx




3. Should we bother with using this option when in "\pset border 2" mode?
I can do it for consistency, but it will still look bad.




Probably not, but since I never use it I'll let others who do weigh in
on the subject.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fcab5c0d51..8cffc18165 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2799,6 +2799,32 @@ lo_import 152801
   
   
 
+  
+  xheader_width
+  
+  
+  Sets expanded header width to one of full,
+  column,
+  page,
+  or integer value.
+  
+
+  full is the default option - expanded header
+  is not truncated.
+  
+
+  column: don't print header line past the first
+  column.
+  
+
+  page: fit header line to terminal width.
+  
+
+  integer value: specify exact width of header line.
+  
+  
+  
+
   
   fieldsep
   
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 49d4c0e3ce..440b732d3c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4310,6 +4310,28 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.expanded = !popt->topt.expanded;
 	}
 
+	/* header line width in expanded mode */
+	else if (strcmp(param, "xheader_width") == 0)
+	{
+		if (value && pg_strcasecmp(value, "full") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_FULL;
+		else if (value && pg_strcasecmp(value, "column") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_COLUMN;
+		else if (value && pg_strcasecmp(value, "page") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_PAGE;
+		else if (value) {
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_EXACT_WIDTH;
+			popt->topt.expanded_header_exact_width = atoi(value);
+			if (popt->topt.expanded_header_exact_width == 0) {
+				pg_log_error("\\pset: allowed xheader_width values are full (default), column, page, or a number specifying exact width.");
+				return false;
+			}
+		} else {
+			/* reset to default if value is empty */
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_FULL;
+		}
+	}
+
 	/* field separator for CSV format */
 	else if (strcmp(param, "csv_fieldsep") == 0)
 	{
@@ -4502,6 +4524,19 @@ printPsetInfo(const char *param, printQueryOpt *popt)
 			printf(_("Expanded display is off.\n"));
 	}
 
+	/* show xheader width type */
+	else if (strcmp(param, "xheader_width") == 0)
+	{
+		if (popt->topt.expanded_header_width_type == PRINT_XHEADER_FULL)
+			printf(_("Expanded header width is 'full'.\n"));
+		else if 

Re: very long record lines in expanded psql output

2021-08-23 Thread Platon Pronko

Hi!

Please find attached the patch implementing the proposed changes.
I hope I didn't forget anything (docs, tab completion, comments, etc),
if I did forget something please tell me and I'll fix it.

Best regards,
Platon Pronko

On 2021-08-08 01:18, Andrew Dunstan wrote:


On 8/7/21 10:56 AM, Platon Pronko wrote:

Hi!


I also find this annoying and would be happy to be rid of it.


Have you tried "\pset format wrapped"? Pavel suggested it, and it
solved most of the problem for me, for example.


Yes, but it changes the data line output. Ideally, you should be able
to  modify these independently.


I agree, and I think this can be implemented, but I'm a bit afraid of
introducing an additional psql option (there's already quite a lot of
them).
I suspect primary PostgreSQL maintainers won't be happy with such an
approach.




I think I qualify as one of those ... :-)


Sorry, I'm new here, don't know who's who :)



No problem. Welcome! We're always very glad to see new contributors.




I'll start working on a new patch then. A couple questions about
specifics:

1. Can we add "expanded" in the option name, like
"xheader_expanded_width"?
I think adjusting the header row width doesn't make sense on any other
modes,
and placing that in the option name makes intent a bit clearer.




"xheader" was meant to be shorthand for "expanded output header"




2. What was "column" option in your original suggestion supposed to do?
("\pset xheader_width column|page|nnn")



It's meant to say don't print anything past the column spec, e.g.:


-[ RECORD 1 ]+
n| 42
long_column_name | xx
-[ RECORD 2 ]+
n| 210
long_column_name | 
xx




3. Should we bother with using this option when in "\pset border 2" mode?
I can do it for consistency, but it will still look bad.




Probably not, but since I never use it I'll let others who do weigh in
on the subject.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com






Re: .ready and .done files considered harmful

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan  wrote:
> To handle a "cheating" archive command, I'd probably need to add a
> stat() for every time pgarch_readyXLog() returned something from
> arch_files.  I suspect something similar might be needed in Dipesh's
> patch to handle backup history files and partial WAL files.

I think he's effectively got that already, although it's probably
inside of pgarch_readyXLog(). The idea there is that instead of having
a cache of files to be returned (as in your case) he just checks
whether the next file in sequence happens to be present and if so
returns that file name. To see whether it's present, he uses stat().

> In any case, I think Dipesh's patch is the way to go.  It obviously
> will perform better in the extreme cases discussed in this thread.  I
> think it's important to make sure the patch doesn't potentially leave
> files behind to be picked up by a directory scan that might not
> happen, but there are likely ways to handle that.  In the worst case,
> perhaps we need to force a directory scan every N files to make sure
> nothing gets left behind.  But maybe we can do better.

It seems to me that we can handle that by just having the startup
process notify the archiver every time some file is ready for
archiving that's not the next one in the sequence. We have to make
sure we cover all the relevant code paths, but that seems like it
should be doable, and we have to decide on the synchronization
details, but that also seems pretty manageable, even if we haven't
totally got it sorted yet. The thing is, as soon as you go back to
forcing a directory scan every N files, you've made it formally O(N^2)
again, which might not matter in practice if the constant factor is
low enough, but I don't think it will be. Either you force the scans
every, say, 1000 files, in which case it's not really going to cause
any missed files to get archived soon enough to make anyone happy -- or
you force scans every, say, 100 files, in which case it's going to make
the whole mechanism a lot less effective in terms of getting out from
under problem cases. I doubt there is really a
happy medium in there.

I suppose the two approaches could be combined, too - remember the
first N files you think you'll encounter and then after that try
successive filenames until one is missing. That would be more
resilient against O(N^2) behavior in the face of frequent gaps. But it
might also be more engineering than is strictly required.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: archive status ".ready" files may be created too early

2021-08-23 Thread Bossart, Nathan
On 8/23/21, 10:31 AM, "alvhe...@alvh.no-ip.org"  wrote:
> On 2021-Aug-23, alvhe...@alvh.no-ip.org wrote:
>
>> The only way .ready files are created is that XLogNotifyWrite() is
>> called.  For regular WAL files during regular operation, that only
>> happens in XLogNotifyWriteSeg().  That, in turn, only happens in
>> NotifySegmentsReadyForArchive().  But if the system runs and never
>> writes WAL records that cross WAL boundaries, that function will see
>> that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno,
>> and return without doing anything.  So no segments will be notified.
>
> Nevermind -- I realized that all segments get registered, not just those
> for which we generate continuation records.

Ah, okay.  BTW the other changes you mentioned made sense to me.

Nathan



Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, alvhe...@alvh.no-ip.org wrote:

> The only way .ready files are created is that XLogNotifyWrite() is
> called.  For regular WAL files during regular operation, that only
> happens in XLogNotifyWriteSeg().  That, in turn, only happens in
> NotifySegmentsReadyForArchive().  But if the system runs and never
> writes WAL records that cross WAL boundaries, that function will see
> that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno,
> and return without doing anything.  So no segments will be notified.

Nevermind -- I realized that all segments get registered, not just those
for which we generate continuation records.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"




Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote:

> Sorry, I'm still not following this one.  If we skipped creating
> .ready segments due to a crash, we rely on RemoveOldXlogFiles() to
> create them as needed in the end-of-recovery checkpoint.  If a record
> fits perfectly in the end of a segment, we'll still register it as a
> boundary for the next segment (hence why we use XLByteToSeg() instead
> of XLByteToPrevSeg()).  If database activity stops completely, there
> shouldn't be anything to mark ready.

The only way .ready files are created is that XLogNotifyWrite() is
called.  For regular WAL files during regular operation, that only
happens in XLogNotifyWriteSeg().  That, in turn, only happens in
NotifySegmentsReadyForArchive().  But if the system runs and never
writes WAL records that cross WAL boundaries, that function will see
that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno,
and return without doing anything.  So no segments will be notified.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: archive status ".ready" files may be created too early

2021-08-23 Thread Bossart, Nathan
On 8/23/21, 9:33 AM, "alvhe...@alvh.no-ip.org"  wrote:
> On 2021-Aug-23, Bossart, Nathan wrote:
>
>> Hm.  My expectation would be that if there are no registrations, we
>> cannot create .ready files for the flushed segments.  The scenario
>> where I can see that happening is when a record gets flushed to disk
>> prior to registration.  In that case, we'll still eventually register
>> the record and wake up the WAL writer process, which will take care of
>> creating the .ready files that were missed earlier.  Is there another
>> case you are thinking of where we could miss registration for a cross-
>> segment record altogether?
>
> I'm thinking of the case where no record cross segment boundaries ever.

Sorry, I'm still not following this one.  If we skipped creating
.ready segments due to a crash, we rely on RemoveOldXlogFiles() to
create them as needed in the end-of-recovery checkpoint.  If a record
fits perfectly in the end of a segment, we'll still register it as a
boundary for the next segment (hence why we use XLByteToSeg() instead
of XLByteToPrevSeg()).  If database activity stops completely, there
shouldn't be anything to mark ready.

Nathan



pretty slow merge-join due to rescan?

2021-08-23 Thread Pavel Stehule
Hi

The customer reports a very slow query. I have a reproducer script. The
dataset is not too big

postgres=# \dt+
                             List of relations
┌────────┬───────┬───────┬───────┬─────────────┬────────────┬─────────────┐
│ Schema │ Name  │ Type  │ Owner │ Persistence │    Size    │ Description │
╞════════╪═══════╪═══════╪═══════╪═════════════╪════════════╪═════════════╡
│ public │ f_dep │ table │ pavel │ permanent   │ 8192 bytes │             │
│ public │ f_emp │ table │ pavel │ permanent   │ 1001 MB    │             │
│ public │ f_fin │ table │ pavel │ permanent   │ 432 kB     │             │
│ public │ qt    │ table │ pavel │ permanent   │ 1976 kB    │             │
│ public │ qtd   │ table │ pavel │ permanent   │ 87 MB      │             │
└────────┴───────┴───────┴───────┴─────────────┴────────────┴─────────────┘
(5 rows)

and the query is not too complex

SELECT
 sub.a_121,
 count(*)
FROM (
SELECT
  f_fin.dt_business_day_id AS a_1056,
  f_dep.description_id AS a_121,
  f_emp.employee_id_id AS a_1327
FROM  f_emp
INNER JOIN f_dep ON
( f_emp.department_id_id = f_dep.id )
INNER JOIN f_fin ON
( f_emp.business_day_date_id = f_fin.id )
GROUP BY 1, 2, 3
  ) AS sub
INNER JOIN qt ON
( sub.a_1056 = qt.tt_1056_1056_b )
LEFT OUTER JOIN qtd AS qt_2 ON
( ( qt.tt_1056_1056_b = qt_2.a_1056 )
AND ( sub.a_121 = qt_2.a_121 )
AND ( sub.a_1327 = qt_2.a_1327 ) )
LEFT OUTER JOIN qtd AS qt_3 ON
( ( qt.tt_1056_1056_a = qt_3.a_1056 )
AND ( sub.a_121 = qt_3.a_121 )
AND ( sub.a_1327 = qt_3.a_1327 ) )
GROUP BY 1;

By default I get a good plan, and the performance is ok:
https://explain.depesz.com/s/Mr2H (about 16 sec). When I increase work_mem,
I again get a good plan with good performance:
https://explain.depesz.com/s/u4Ff

But this depends on an index only scan. In the production environment, the
index only scan is not always available, and then I see another plan (I can
get this plan by disabling index only scans).

Although the cost is almost the same, the query is about 15x slower:
https://explain.depesz.com/s/L6zP

 HashAggregate  (cost=1556129.74..1556131.74 rows=200 width=12) (actual time=269948.878..269948.897 rows=64 loops=1)
   Group Key: f_dep.description_id
   Batches: 1  Memory Usage: 40kB
   Buffers: shared hit=5612 read=145051
   ->  Merge Left Join  (cost=1267976.96..1534602.18 rows=4305512 width=4) (actual time=13699.847..268785.500 rows=4291151 loops=1)
         Merge Cond: ((f_emp.employee_id_id = qt_3.a_1327) AND (f_dep.description_id = qt_3.a_121))
         Join Filter: (qt.tt_1056_1056_a = qt_3.a_1056)
         Rows Removed by Join Filter: 1203659495
         Buffers: shared hit=5612 read=145051

...

         ->  Sort  (cost=209977.63..214349.77 rows=1748859 width=12) (actual time=979.522..81842.913 rows=1205261892 loops=1)
               Sort Key: qt_3.a_1327, qt_3.a_121
               Sort Method: quicksort  Memory: 144793kB
               Buffers: shared hit=2432 read=8718
               ->  Seq Scan on qtd qt_3  (cost=0.00..28638.59 rows=1748859 width=12) (actual time=0.031..284.437 rows=1748859 loops=1)
                     Buffers: shared hit=2432 read=8718

The sort of qtd table is very fast

postgres=# explain analyze select * from qtd order by a_1327, a_121;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=209977.63..214349.77 rows=1748859 width=27) (actual time=863.923...213 rows=1748859 loops=1)
   Sort Key: a_1327, a_121
   Sort Method: quicksort  Memory: 199444kB
   ->  Seq Scan on qtd  (cost=0.00..28638.59 rows=1748859 width=27) (actual time=0.035..169.385 rows=1748859 loops=1)
 Planning Time: 0.473 ms
 Execution Time: 1226.305 ms
(6 rows)

but here it returns 700x more rows and is 70x slower. Probably that is
because something does a rescan. But why? With an index only scan, I don't
see any sign of a rescan.

Is it an executor or optimizer bug? I tested this behaviour
on Postgres 13 and on the fresh 
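
If the inflated inner-side row counts above come from merge-join mark/restore
on duplicate outer keys, the effect can be seen in miniature with a
self-contained toy example (nothing to do with the tables above): when the
outer side of a merge join has duplicate keys, the matching inner rows are
re-fetched via mark/restore, and EXPLAIN ANALYZE counts every re-fetch in the
inner node's actual row count.

CREATE TEMP TABLE outer_t AS SELECT 1 AS k FROM generate_series(1, 1000);
CREATE TEMP TABLE inner_t AS SELECT 1 AS k FROM generate_series(1, 1000);
ANALYZE outer_t, inner_t;
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT count(*) FROM outer_t o JOIN inner_t i USING (k);
-- the inner Sort node reports roughly 1000 * 1000 = 1,000,000 actual rows,
-- even though inner_t only contains 1000 rows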

Re: [Patch] change the default value of shared_buffers in postgresql.conf.sample

2021-08-23 Thread Bruce Momjian
On Wed, Aug 18, 2021 at 08:27:19PM +0200, Magnus Hagander wrote:
> On Wed, Aug 18, 2021 at 8:16 PM Bruce Momjian  wrote:
> >
> > On Wed, Aug 18, 2021 at 02:03:56PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > I think the only question is whether this is a PG 15-only patch or a
> > > > backpatch to PG 10;  I am in favor of the latter.
> > >
> > > I think you need a lot stronger argument that this is a bug
> > > before you consider back-patching user-visible behavioral
> > > changes.
> >
> > I think the only logic to backpatching it is your statement that this is
> > cosmetic, and the new cosmetic appearance is more accurate.  However, if
> > you don't feel we need to backpatch, that is fine with me --- we have
> > gotten very few complaints about this.
> 
> +1 for making the change ,and +1 for making it in master only, no backpatch.

Patch applied to master.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote:

> Hm.  My expectation would be that if there are no registrations, we
> cannot create .ready files for the flushed segments.  The scenario
> where I can see that happening is when a record gets flushed to disk
> prior to registration.  In that case, we'll still eventually register
> the record and wake up the WAL writer process, which will take care of
> creating the .ready files that were missed earlier.  Is there another
> case you are thinking of where we could miss registration for a cross-
> segment record altogether?

I'm thinking of the case where no record ever crosses a segment boundary.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: archive status ".ready" files may be created too early

2021-08-23 Thread Bossart, Nathan
On 8/23/21, 8:50 AM, "alvhe...@alvh.no-ip.org"  wrote:
> ... while reading the resulting code after backpatching to all branches,
> I realized that if there are no registrations whatsoever, then archiving
> won't do anything, which surely is the wrong thing to do.  The correct
> behavior should be "if there are no registrations, then *all* flushed
> segments can be notified".

Hm.  My expectation would be that if there are no registrations, we
cannot create .ready files for the flushed segments.  The scenario
where I can see that happening is when a record gets flushed to disk
prior to registration.  In that case, we'll still eventually register
the record and wake up the WAL writer process, which will take care of
creating the .ready files that were missed earlier.  Is there another
case you are thinking of where we could miss registration for a cross-
segment record altogether?

Nathan



Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Chapman Flack
On 08/23/21 10:57, Chapman Flack wrote:
> Maybe those cases aren't very numerous ... and maybe they're distinctive
> enough to recognize when creating one ("hmm, I am creating a check or
> assign hook that does significant work here, will it be worth exposing
> a getter API for the product of the work?").

How about a generic GetTypedConfigValue(char const *name, void *into) ?

By default the type of *into would correspond to the type of the GUC
(int for an enum) and would be returned directly from *conf->variable
by a getter hook installed by default when the GUC is defined. But the
definition could also supply a getter hook that would store a value of
a different type, or computed a different way, for a GUC where that's
appropriate.

The function return could be boolean, true if such a variable exists
and isn't a placeholder.

Pro: provides read access to the typed value of any GUC, without exposing
it to overwriting. Con: has to bsearch the GUCs every time the value
is wanted.

What I'd really like:

ObserveTypedConfigValue(char const *name, void *into, int *flags, int flag)

This would register an observer of the named GUC: whenever its typed value
changes, it is written at *into and flag is ORed into *flags.

This is more restrictive than allowing arbitrary code to be hooked into
GUC changes (as changes can happen at delicate times such as error
recovery), but allows an extension to always have the current typed
value available and to cheaply know when it has changed. Observers would
be updated in the normal course of processing a GUC change, eliminating
the bsearch-per-lookup cost of the first approach.
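
For concreteness, the two entry points might end up looking something like
this (purely illustrative; neither function exists today, and the names just
follow the proposal above):

/* Copy the current typed value of the named GUC into *into.
 * Returns false if no such variable exists or it is only a placeholder. */
bool GetTypedConfigValue(char const *name, void *into);

/* Register an observer: whenever the named GUC's typed value changes,
 * the new value is written at *into and 'flag' is ORed into *flags. */
void ObserveTypedConfigValue(char const *name, void *into,
                             int *flags, int flag);

/* Possible extension usage, e.g. in _PG_init(): */
static int observed_work_mem;   /* mirrors the integer GUC "work_mem" */
static int guc_change_flags;
#define WORK_MEM_CHANGED 0x01

void
_PG_init(void)
{
    ObserveTypedConfigValue("work_mem", &observed_work_mem,
                            &guc_change_flags, WORK_MEM_CHANGED);
}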

Thinkable?

Regards,
-Chap




Re: .ready and .done files considered harmful

2021-08-23 Thread Bossart, Nathan


On 8/23/21, 6:42 AM, "Robert Haas"  wrote:
> On Sun, Aug 22, 2021 at 10:31 PM Bossart, Nathan  wrote:
>> I ran this again on a bigger machine with 200K WAL files pending
>> archive.  The v9 patch took ~5.5 minutes, the patch I sent took ~8
>> minutes, and the existing logic took just under 3 hours.
>
> Hmm. On the one hand, 8 minutes > 5.5 minutes, and presumably the gap
> would only get wider if the number of files were larger or if reading
> the directory were slower. I am pretty sure that reading the directory
> must be much slower in some real deployments where this problem has
> come up. On the other hand, 8.8 minutes << 3 hours, and your patch
> would win if somehow we had a ton of gaps in the sequence of files.
> I'm not sure how likely that is to be the cause - probably not very
> likely at all if you aren't using an archive command that cheats, but
> maybe really common if you are. Hmm, but I think if the
> archive_command cheats by marking a bunch of files done when it is
> tasked with archiving just one, your patch will break, because, unless
> I'm missing something, it doesn't re-evaluate whether things have
> changed on every pass through the loop as Dipesh's patch does. So I
> guess I'm not quite sure I understand why you think this might be the
> way to go?

To handle a "cheating" archive command, I'd probably need to add a
stat() for every time pgarch_readyXLog() returned something from
arch_files.  I suspect something similar might be needed in Dipesh's
patch to handle backup history files and partial WAL files.
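
Concretely, that extra check might be no more than something like this (a
rough sketch only; the helper and its arguments are hypothetical and not
part of either patch):

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if the .ready status file for the given WAL file name still
 * exists.  A cached candidate from arch_files could be re-validated with one
 * such stat() before being handed to the archiver, so that entries already
 * renamed to .done by a "cheating" archive_command are skipped.
 */
static bool
ready_file_still_exists(const char *archive_status_dir, const char *xlog)
{
    char        path[4096];
    struct stat st;

    snprintf(path, sizeof(path), "%s/%s.ready", archive_status_dir, xlog);
    return stat(path, &st) == 0;
}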

In any case, I think Dipesh's patch is the way to go.  It obviously
will perform better in the extreme cases discussed in this thread.  I
think it's important to make sure the patch doesn't potentially leave
files behind to be picked up by a directory scan that might not
happen, but there are likely ways to handle that.  In the worst case,
perhaps we need to force a directory scan every N files to make sure
nothing gets left behind.  But maybe we can do better.

> Maintaining the binary heap in lowest-priority-first order is very
> clever, and the patch does look quite elegant. I'm just not sure I
> understand the point.

This was mostly an exploratory exercise to get some numbers for the
different approaches discussed in this thread.

Nathan



Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-21, Bossart, Nathan wrote:

> > Well, the thing I realized is that these three helper functions have
> > exactly one caller each.  I think the compiler is going to inline them,
> > so there isn't going to be a function call in the assembly.  I haven't
> > verified this, though.
> 
> Good point.  It looks like they're getting inlined for me.

I still didn't like it, because it looks like we're creating an API for
which there can be only one caller.  So I expanded the functions in the
caller.  It doesn't look too bad.  However ...

... while reading the resulting code after backpatching to all branches,
I realized that if there are no registrations whatsoever, then archiving
won't do anything, which surely is the wrong thing to do.  The correct
behavior should be "if there are no registrations, then *all* flushed
segments can be notified".

I'll fix that ...

Another thing I didn't like is that you used a name ending in RecPtr for
the LSN, which gives no indication that it really is the *end* LSN, not
the start pointer.  And it won't play nicely with the *start* LSN that
we'll need to add when we solve the equivalent problem for streaming
replication.  I'll rename those to
earliestSegBoundaryEndPtr and latestSegBoundaryEndPtr.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
From 6e4c3fd6f5687ec5762de121344ce35c1c890812 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 23 Aug 2021 09:06:09 -0400
Subject: [PATCH v15] Avoid creating archive status ".ready" files too early

WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart 
Reviewed-by: Kyotaro Horiguchi 
Reviewed-by: Ryo Matsumura 
Reviewed-by: Andrey Borodin 
Discussion: https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com
---
 src/backend/access/transam/timeline.c|   2 +-
 src/backend/access/transam/xlog.c| 220 +--
 src/backend/access/transam/xlogarchive.c |  17 +-
 src/backend/postmaster/walwriter.c   |   7 +
 src/backend/replication/walreceiver.c|   6 +-
 src/include/access/xlog.h|   1 +
 src/include/access/xlogarchive.h |   4 +-
 src/include/access/xlogdefs.h|   1 +
 8 files changed, 234 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 8d0903c175..acd5c2431d 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -452,7 +452,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	if (XLogArchivingActive())
 	{
 		TLHistoryFileName(histfname, newTLI);
-		XLogArchiveNotify(histfname);
+		XLogArchiveNotify(histfname, true);
 	}
 }
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..8e7c3a364a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,6 +724,18 @@ typedef struct XLogCtlData
 	XLogRecPtr	lastFpwDisableRecPtr;
 
 	slock_t		info_lck;		/* locks shared variables shown above */
+
+	/*
+	 * Variables used to track cross-segment records.  Protected by
+	 * segtrack_lck.
+	 */
+	XLogSegNo	lastNotifiedSeg;
+	XLogSegNo	earliestSegBoundary;
+	XLogRecPtr	earliestSegBoundaryRecPtr;
+	XLogSegNo	latestSegBoundary;
+	XLogRecPtr	latestSegBoundaryRecPtr;
+
+	slock_t		segtrack_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
 static XLogCtlData *XLogCtl = NULL;
@@ -920,6 +932,7 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 		   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
+static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1154,23 +1167,56 @@ XLogInsertRecord(XLogRecData *rdata,
 	END_CRIT_SECTION();
 
 	/*
-	 * Update shared LogwrtRqst.Write, if we crossed page boundary.
+	 * If we crossed page boundary, update LogwrtRqst.Write; if we crossed
+	 * 

Regexp: observable bug from careless usage of zaptreesubs

2021-08-23 Thread Tom Lane
While looking at the regexp code, I started to get uncomfortable
about the implications of commit 0c3405cf1: it means that not
only the cdissect() phase, but also the preceding DFA-check phase
(longest()/shortest()) rely on saved subexpression match positions
to be valid for the match we're currently considering.  It seemed
to me that the cdissect() recursion wasn't being careful to reset
the match pointers for an abandoned submatch before moving on to
the next attempt, meaning that dfa_backref() could conceivably get
applied using a stale match pointer.

Upon poking into it, I failed to find any bug of that exact ilk,
but what I did find was a pre-existing bug of not resetting an
abandoned match pointer at all.  That allows these fun things:

regression=# select 'abcdef' ~ '^(.)\1|\1.';
 ?column? 
--
 t
(1 row)

regression=# select 'abadef' ~ '^((.)\2|..)\2';
 ?column? 
--
 t
(1 row)

In both of these examples, the (.) capture is in an alternation
branch that subsequently fails; therefore, the later backref
should never be able to match.  But it does, because we forgot
to clear the capture's match data on the way out.
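
With the abandoned capture's match data properly cleared, both examples
should no longer match, i.e. (expected post-fix behaviour, not verified
output from the patch):

select 'abcdef' ~ '^(.)\1|\1.';     -- expected: f
select 'abadef' ~ '^((.)\2|..)\2';  -- expected: f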

It turns out that this can be fixed using fewer, not more, zaptreesubs
calls, if we are careful to define the recursion rules precisely.
See attached.

This bug is ancient.  I verified it as far back as PG 7.4, and
it can also be reproduced in Tcl, so it's likely aboriginal to
Spencer's library.  It's not that surprising that no one has
reported it, because regexps that have this sort of possibly-invalid
backref are most likely incorrect.  In most use-cases the existing
code will fail to match, as expected, so users would probably notice
that and fix their regexps without observing that there are cases
where the match incorrectly succeeds.

regards, tom lane

diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c
index aa5ba85514..2411e6561d 100644
--- a/src/backend/regex/regexec.c
+++ b/src/backend/regex/regexec.c
@@ -216,14 +216,14 @@ pg_regexec(regex_t *re,
 	if (backref && nmatch <= v->g->nsub)
 	{
 		/* need larger work area */
-		if (v->g->nsub + 1 <= LOCALMAT)
+		v->nmatch = v->g->nsub + 1;
+		if (v->nmatch <= LOCALMAT)
 			v->pmatch = mat;
 		else
-			v->pmatch = (regmatch_t *) MALLOC((v->g->nsub + 1) *
-			  sizeof(regmatch_t));
+			v->pmatch = (regmatch_t *) MALLOC(v->nmatch * sizeof(regmatch_t));
 		if (v->pmatch == NULL)
 			return REG_ESPACE;
-		v->nmatch = v->g->nsub + 1;
+		zapallsubs(v->pmatch, v->nmatch);
 	}
 	else
 	{
@@ -488,7 +488,6 @@ find(struct vars *v,
 		return REG_OKAY;
 
 	/* find submatches */
-	zapallsubs(v->pmatch, v->nmatch);
 	return cdissect(v, v->g->tree, begin, end);
 }
 
@@ -599,7 +598,6 @@ cfindloop(struct vars *v,
 	break;		/* no match with this begin point, try next */
 MDEBUG(("tentative end %ld\n", LOFF(end)));
 /* Dissect the potential match to see if it really matches */
-zapallsubs(v->pmatch, v->nmatch);
 er = cdissect(v, v->g->tree, begin, end);
 if (er == REG_OKAY)
 {
@@ -647,6 +645,8 @@ cfindloop(struct vars *v,
 
 /*
  * zapallsubs - initialize all subexpression matches to "no match"
+ *
+ * Note that p[0], the overall-match location, is not touched.
  */
 static void
 zapallsubs(regmatch_t *p,
@@ -716,8 +716,30 @@ subset(struct vars *v,
  * DFA and found that the proposed substring satisfies the DFA.  (We make
  * the caller do that because in concatenation and iteration nodes, it's
  * much faster to check all the substrings against the child DFAs before we
- * recurse.)  Also, caller must have cleared subexpression match data via
- * zaptreesubs (or zapallsubs at the top level).
+ * recurse.)
+ *
+ * A side-effect of a successful match is to save match locations for
+ * capturing subexpressions in v->pmatch[].  This is a little bit tricky,
+ * so we make the following rules:
+ * 1. Before initial entry to cdissect, all match data must have been
+ *cleared (this is seen to by zapallsubs).
+ * 2. Before any recursive entry to cdissect, the match data for that
+ *subexpression tree must be guaranteed clear (see zaptreesubs).
+ * 3. When returning REG_OKAY, each level of cdissect will have saved
+ *any relevant match locations.
+ * 4. When returning REG_NOMATCH, each level of cdissect will guarantee
+ *that its subexpression match locations are again clear.
+ * 5. No guarantees are made for error cases (i.e., other result codes).
+ * 6. When a level of cdissect abandons a successful sub-match, it will
+ *clear that subtree's match locations with zaptreesubs before trying
+ *any new DFA match or cdissect call for that subtree or any subtree
+ *to its right (that is, any subtree that could have a backref into the
+ *abandoned match).
+ * This may seem overly complicated, but it's difficult to simplify it
+ * because of the provision that match locations must be reset before
+ * any fresh DFA match (a rule that is needed to 

Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Alvaro Herrera
On 2021-Aug-23, Robert Haas wrote:

> It's also a bit unfair to say, well we have APIs for accessing GUC
> values. It's true that we do. But if the GUC variable is, say, a
> Boolean, you do not want your extension to call some function that
> does a bunch of shenanigans and returns a string so that you can then
> turn around and parse the string to recover the Boolean value. Even
> moreso if the value is an integer or a comma-separated list. You want
> to access the value as the system represents it internally, not
> duplicate the parsing logic in a way that is inefficient and
> bug-prone.

In that case, why not improve the API with functions that return the
values in some native datatype?  For scalars with native C types (int,
floats, Boolean etc) this is easy enough; I bet it'll solve 99% of the
problems or more.
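
Something as simple as the following would probably cover the common cases
(hypothetical declarations; nothing like this exists in guc.h today):

extern bool   GetConfigOptionBool(const char *name, bool missing_ok);
extern int    GetConfigOptionInt(const char *name, bool missing_ok);
extern double GetConfigOptionReal(const char *name, bool missing_ok);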

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Alvaro Herrera
On 2021-Aug-23, Robert Haas wrote:

> On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera  
> wrote:
> > Do we actually need new GUCs, though?  I think we should never let an
> > unresponsive client dictate what the server does, because that opens the
> > door for uncooperative or malicious clients to wreak serious havoc.  I
> > think the implementation should wait until time now+X to cancel the
> > query, but if by time now+2X (or whatever we deem reasonable -- maybe
> > now+1.1X) we're still waiting, then it's okay to just close the
> > connection.  This suggests a completely different implementation, though.
> 
> I don't quite understand what you mean by waiting until time now+X to
> cancel the query. We don't know a priori when query cancels are going
> to happen, but when they do happen we want to respond to them as
> quickly as we can. It seems reasonable to say that if we can't handle
> them within X amount of time we can resort to emergency measures, but
> that's basically what the patch does, except it uses a GUC instead of
> hardcoding X.

Aren't we talking about query cancellations that occur in response to a
standby delay limit?  Those aren't in response to user action.  What I
mean is that if the standby delay limit is exceeded, then we send a
query cancel; we expect the standby to cancel its query at that time and
then the primary can move on.  But if the standby doesn't react, then we
can have it terminate its connection.  I'm looking at the problem from
the primary's point of view rather than the standby's point of view.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera  wrote:
> Do we actually need new GUCs, though?  I think we should never let an
> unresponsive client dictate what the server does, because that opens the
> door for uncooperative or malicious clients to wreak serious havoc.  I
> think the implementation should wait until time now+X to cancel the
> query, but if by time now+2X (or whatever we deem reasonable -- maybe
> now+1.1X) we're still waiting, then it's okay to just close the
> connection.  This suggests a completely different implementation, though.

I don't quite understand what you mean by waiting until time now+X to
cancel the query. We don't know a priori when query cancels are going
to happen, but when they do happen we want to respond to them as
quickly as we can. It seems reasonable to say that if we can't handle
them within X amount of time we can resort to emergency measures, but
that's basically what the patch does, except it uses a GUC instead of
hardcoding X.

> I wonder if it's possible to write a test for this.  We would have to
> send a query and then hang the client somehow.  I recently added a TAP
> test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a
> background psql that's running SELECT pg_sleep() perhaps?
> (Or maybe it's sufficient to start background psql and not pump() it?)

Starting a background process and not pumping it sounds promising,
because it seems like it would be more likely to be portable. I think
we'd want to be careful not to assume very much about how large the
output buffer is, because I'm guessing that varies by platform and
that it might in some cases be fairly large. Perhaps we could use
pg_stat_activity to wait until we block in a ClientWrite state,
although I wonder if we might find out that we sometimes block on a
different wait state than what we expect to see.
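
The polling condition could be something like this (an untested sketch; the
wait-event names are the ones pg_stat_activity reports today):

SELECT count(*) > 0
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND wait_event_type = 'Client'
  AND wait_event = 'ClientWrite';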

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Chapman Flack
On 08/23/21 10:36, Bruce Momjian wrote:

> So the problem is that extensions only _need_ to use that API on
> Windows, so many initially don't, or that the API is too limited?

I think there can be cases where it's too limited, such as when significant
computation or validation is needed between the form of the setting known
to the GUC machinery and the form that would otherwise be available in
the global.

I'm thinking, for instance, of the old example before session_timezone
was PGDLLIMPORTed, and you'd have to GetConfigOption("timezone") and
repeat the work done by pg_tzset to validate and map the timezone name
through the timezone database, to reconstruct the value that was
otherwise already available in session_timezone.

Maybe those cases aren't very numerous ... and maybe they're distinctive
enough to recognize when creating one ("hmm, I am creating a check or
assign hook that does significant work here, will it be worth exposing
a getter API for the product of the work?").

Regards,
-Chap




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 10:15 AM Tom Lane  wrote:
> And yes, I absolutely would prohibit extensions from accessing many
> of them, if there were a reasonable way to do it.  It would be a good
> start towards establishing a defined API for extensions.

Mostly, it would make extension development more difficult for no
discernable benefit to the project.

You've made this argument many times over the years ... but we're no
closer to having an extension API than we ever were, and we continue
to get complaints about breaking stuff on Windows on a pretty regular
basis.

Honestly, it seems unimaginable that an API is ever really going to be
possible. It would be a ton of work to maintain, and we'd just end up
breaking it every time we discover that there's a new feature we want
to implement which doesn't fit into the defined API now. That's what
we do *now* with functions that third-party extensions actually call,
and with variables that they access, and it's not something that, in
my experience, is any great problem in maintaining an extension.
You're running code that is running inside somebody else's executable
and sometimes you have to adjust it for upstream changes. That's life,
and I don't think that complaints about that topic are nearly as
frequent as complaints about extensions breaking on Windows because of
missing PGDLLIMPORT markings.

And more than that, I'm pretty sure that you've previously taken the
view that we shouldn't document all the hook functions that only exist
in the backend for the purpose of extension use. I think you would
argue against a patch to go and document all the variables that are
marked PGDLLIMPORT now. So it seems to me that you're for an API when
it means that we don't have to change anything, and against an API
when it means that we do, which doesn't
really seem like a consistent position. I think we should be
responding to the real, expressed needs of extension developers, and
the lack of PGDLLIMPORT markings on various global variables is surely
top of the list.

It's also a bit unfair to say, well we have APIs for accessing GUC
values. It's true that we do. But if the GUC variable is, say, a
Boolean, you do not want your extension to call some function that
does a bunch of shenanigans and returns a string so that you can then
turn around and parse the string to recover the Boolean value. Even
moreso if the value is an integer or a comma-separated list. You want
to access the value as the system represents it internally, not
duplicate the parsing logic in a way that is inefficient and
bug-prone.

In short, +1 from me for the original proposal of marking all GUCs as
PGDLLIMPORT. And, heck, +1 for marking all the other global variables
that way, too. We're not solving any problem here. We're just annoying
people, mostly people who are loyal community members and steady
contributors to the project.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Julien Rouhaud
On Mon, Aug 23, 2021 at 10:15 PM Tom Lane  wrote:
>
> And yes, I absolutely would prohibit extensions from accessing many
> of them, if there were a reasonable way to do it.  It would be a good
> start towards establishing a defined API for extensions.

The v2 patch I sent does that, at least when compiling with GCC.  I
didn't find something similar for clang, but I only checked quickly.

I'm assuming that the unreasonable part is having to add some extra
attribute to the variable?  Would it be acceptable if wrapped into
some other macro, as I proposed?




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Julien Rouhaud
On Mon, Aug 23, 2021 at 10:36 PM Bruce Momjian  wrote:
>
> So the problem is that extensions only _need_ to use that API on
> Windows, so many initially don't, or that the API is too limited?

The inconvenience with that API is that it only returns C strings,
so you have to convert the value back to the original datatype.  That's
probably why most of the extensions simply read from the original
exposed variable rather than using the API, because they're usually
written on Linux or similar, not because they want to mess up the
stored value.
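
For example, just reading a boolean GUC through the string API looks roughly
like this (a sketch; GetConfigOption() is the existing function, assuming its
usual three-argument form, and the rest is illustrative).  It already glosses
over details such as unit suffixes on integer GUCs, which is exactly the
bug-prone re-parsing being discussed:

#include "postgres.h"
#include "utils/guc.h"

/* What an extension ends up doing via the string API: fetch and re-parse. */
static bool
my_get_bool_guc(const char *name)
{
    const char *val = GetConfigOption(name, false, false);

    return val != NULL && strcmp(val, "on") == 0;
}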




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Julien Rouhaud
On Mon, Aug 23, 2021 at 10:22 PM Tom Lane  wrote:
>
> Bruce Momjian  writes:
> > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote:
> >> By that argument, *every* globally-visible variable should be marked
> >> PGDLLIMPORT.  But the mere fact that two backend .c files need to access
>
> > No, Julien says 99% need only the GUCs, so that is not the argument I am
> > making.
>
> That's a claim unbacked by any evidence that I've seen.  More to
> the point, we already have a mechanism that extensions can/should
> use to read and write GUC settings, and it's not direct access.

I clearly didn't try all single extension available out there.  It's
excessively annoying to compile extensions on Windows, and also I
don't have a lot of dependencies installed so there are some that I
wasn't able to test since I'm lacking some other lib and given my
absolute lack of knowledge of that platform I didn't spent time trying
to install those.

I think I tested a dozen "small" extensions (I'm assuming that the big
ones like PostGIS would require too much effort to build, and they
probably already take care of Windows compatibility), and I only faced
this problem.  That's maybe not a representative set, but I also doubt
that I was unlucky enough to stumble on the only few exceptions.




Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Alvaro Herrera
On 2021-Aug-23, Robert Haas wrote:

> On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于)  wrote:
> > I want to know why the interrupt is only handled when ProcDiePending
> > is true, I think query which is supposed to be canceled also should
> > respond to the signal.

Yeah, I agree.

> Well, if we're halfway through sending a message to the client and we
> abort the write, we have no way of re-establishing protocol sync,
> right? It's OK to die without sending any more data to the client,
> since then the connection is closed anyway and the loss of sync
> doesn't matter, but continuing the session doesn't seem workable.
> 
> Your proposed patch actually seems to dodge this problem and I think
> perhaps we could consider something along those lines.

Do we actually need new GUCs, though?  I think we should never let an
unresponsive client dictate what the server does, because that opens the
door for uncooperative or malicious clients to wreak serious havoc.  I
think the implementation should wait until time now+X to cancel the
query, but if by time now+2X (or whatever we deem reasonable -- maybe
now+1.1X) we're still waiting, then it's okay to just close the
connection.  This suggests a completely different implementation, though.

I wonder if it's possible to write a test for this.  We would have to
send a query and then hang the client somehow.  I recently added a TAP
test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a
background psql that's running SELECT pg_sleep() perhaps?
(Or maybe it's sufficient to start background psql and not pump() it?)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Bruce Momjian
On Mon, Aug 23, 2021 at 10:22:51AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote:
> >> By that argument, *every* globally-visible variable should be marked
> >> PGDLLIMPORT.  But the mere fact that two backend .c files need to access
> 
> > No, Julien says 99% need only the GUCs, so that is not the argument I am
> > making.
> 
> That's a claim unbacked by any evidence that I've seen.  More to
> the point, we already have a mechanism that extensions can/should
> use to read and write GUC settings, and it's not direct access.

So the problem is that extensions only _need_ to use that API on
Windows, so many initially don't, or that the API is too limited?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote:
>> By that argument, *every* globally-visible variable should be marked
>> PGDLLIMPORT.  But the mere fact that two backend .c files need to access

> No, Julien says 99% need only the GUCs, so that is not the argument I am
> making.

That's a claim unbacked by any evidence that I've seen.  More to
the point, we already have a mechanism that extensions can/should
use to read and write GUC settings, and it's not direct access.

regards, tom lane




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Bruce Momjian
On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote:
> >> Then shouldn't we try to prevent direct access on all platforms rather than
> >> only one?
> 
> > Agreed.  If Julian says 99% of the non-export problems are GUCs, and we
> > can just export them all, why not do it?  We already export every global
> > variable on Unix-like systems, and we have seen no downsides.
> 
> By that argument, *every* globally-visible variable should be marked
> PGDLLIMPORT.  But the mere fact that two backend .c files need to access

No, Julien says 99% need only the GUCs, so that is not the argument I am
making.

> some variable doesn't mean that we want any random bit of code doing so.
> 
> And yes, I absolutely would prohibit extensions from accessing many
> of them, if there were a reasonable way to do it.  It would be a good
> start towards establishing a defined API for extensions.

Well, if Unix needed it, we would have addressed this more regularly,
but since it is only Windows that has this _feature_, it is the rare
Windows cases that need adjustment, and usually as an afterthought since
Windows isn't usually a primary tested platform.

What I am saying is that if we blocked global variable access on Unix,
initial testing would have shown what needs to be exported, and it would
not be as big a problem.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Added schema level support for publication.

2021-08-23 Thread vignesh C
On Tue, Aug 17, 2021 at 6:55 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Tue, Aug 17, 2021 at 6:40 AM Peter Smith  wrote:
> >> On Mon, Aug 16, 2021 at 11:31 PM Tom Lane  wrote:
> >>> Abstractly it'd be
> >>>
> >>> createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ]
> >>>
> >>> cpitem := ALL TABLES |
> >>> TABLE name |
> >>> ALL TABLES IN SCHEMA name |
> >>> ALL SEQUENCES |
> >>> SEQUENCE name |
> >>> ALL SEQUENCES IN SCHEMA name |
> >>> name
> >>>
> >>> The grammar output would need some post-analysis to attribute the
> >>> right type to bare "name" items, but that doesn't seem difficult.
>
> >> That last bare "name" cpitem looks like it would permit the following syntax:
> >> CREATE PUBLICATION pub FOR a,b,c;
> >> Was that intentional?
>
> > I think so.
>
> I had supposed that we could throw an error at the post-processing stage,
> or alternatively default to assuming that such names are tables.
>
> Now you could instead make the grammar work like
>
> cpitem := ALL TABLES |
>   TABLE name [, ...] |
>   ALL TABLES IN SCHEMA name [, ...] |
>   ALL SEQUENCES |
>   SEQUENCE name [, ...] |
>   ALL SEQUENCES IN SCHEMA name [, ...]
>
> which would result in a two-level-list data structure.  I'm not sure
> that this is better, as any sort of mistake would result in a very
> uninformative generic "syntax error" from Bison.  Errors out of a
> post-processing stage could be more specific than that.

I preferred the implementation in the lines Tom Lane had proposed at [1].
Is it ok if the implementation is something like below:
CreatePublicationStmt:
        CREATE PUBLICATION name FOR pub_obj_list opt_definition
                {
                        CreatePublicationStmt *n = makeNode(CreatePublicationStmt);
                        n->pubname = $3;
                        n->options = $6;
                        n->pubobjects = (List *)$5;
                        $$ = (Node *)n;
                }
        ;

pub_obj_list:
        PublicationObjSpec                      { $$ = list_make1($1); }
        | pub_obj_list ',' PublicationObjSpec   { $$ = lappend($1, $3); }
        ;

/* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
PublicationObjSpec:
        TABLE pubobj_expr                       { }
        | ALL TABLES IN_P SCHEMA pubobj_expr    { }
        | pubobj_expr                           { }
        ;

pubobj_expr:
        any_name                                { }
        | any_name '*'                          { }
        | ONLY any_name                         { }
        | ONLY '(' any_name ')'                 { }
        | CURRENT_SCHEMA                        { }
        ;

I needed pubobj_expr to support the existing syntaxes handled by
relation_expr and also to handle CURRENT_SCHEMA for the "FOR ALL TABLES IN
SCHEMA" feature. I changed the name to any_name to support qualified
objects like "sch1.t1".

I felt that if a user specifies "FOR ALL TABLES", they should not be allowed
to combine it with "FOR TABLE" or "FOR ALL TABLES IN SCHEMA", since "FOR ALL
TABLES" already includes all the tables.

Should we support similar syntax for ALTER PUBLICATION, like
"ALTER PUBLICATION pub1 ADD TABLE t1, t2, ALL TABLES IN SCHEMA sch1, sch2",
or shall we keep these separate, like "ALTER PUBLICATION pub1 ADD TABLE t1,
t2" and "ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1, sch2"? I
preferred to keep them separate, as we currently keep ADD/DROP separate and
they cannot be combined. I have not mentioned SEQUENCE handling separately;
sequences would be extended along similar lines. For a statement like
"create publication pub1 for t1, t2;" I thought of throwing an error at
post-processing, the first option that Tom Lane had suggested. Some
examples of the intended behaviour are sketched below.
Thoughts?
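
Some examples of what the proposed syntax would accept or reject (nothing
below is implemented yet):

-- accepted:
CREATE PUBLICATION pub1 FOR TABLE t1, t2, ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch2;

-- rejected during post-processing, because FOR ALL TABLES already includes
-- every table:
CREATE PUBLICATION pub2 FOR ALL TABLES, TABLE t1;

-- bare names would raise an error during post-processing under this proposal:
CREATE PUBLICATION pub3 FOR t1, t2;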

[1] -
https://www.postgresql.org/message-id/877603.1629120678%40sss.pgh.pa.us

Regards,
Vignesh


Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Tom Lane
Bruce Momjian  writes:
> On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote:
>> Then shouldn't we try to prevent direct access on all platforms rather than
>> only one?

> Agreed.  If Julien says 99% of the non-export problems are GUCs, and we
> can just export them all, why not do it?  We already export every global
> variable on Unix-like systems, and we have seen no downsides.

By that argument, *every* globally-visible variable should be marked
PGDLLIMPORT.  But the mere fact that two backend .c files need to access
some variable doesn't mean that we want any random bit of code doing so.

And yes, I absolutely would prohibit extensions from accessing many
of them, if there were a reasonable way to do it.  It would be a good
start towards establishing a defined API for extensions.

regards, tom lane




Re: Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread Robert Haas
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于)  wrote:
> I want to know why the interrupt is only handled when ProcDiePending is true, 
> I think query which is supposed to be canceled also should respond to the 
> signal.

Well, if we're halfway through sending a message to the client and we
abort the write, we have no way of re-establishing protocol sync,
right? It's OK to die without sending any more data to the client,
since then the connection is closed anyway and the loss of sync
doesn't matter, but continuing the session doesn't seem workable.

Your proposed patch actually seems to dodge this problem and I think
perhaps we could consider something along those lines. It would be
interesting to hear what Andres thinks about this idea, since I think
he was the last one to rewrite that code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Bruce Momjian
On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote:
> On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote:
> > 
> > Uh, no, it's exactly *not* clear.  There are a lot of GUCs that are only
> > of interest to particular subsystems.  I do not see why being a GUC makes
> > something automatically more interesting than any other global variable.
> > Usually, the fact that one is global is only so the GUC machinery itself
> > can get at it, otherwise it'd be static in the owning module.
> > 
> > As for "extensions should be able to get at the values", the GUC machinery
> > already provides uniform mechanisms for doing that safely.  Direct access
> > to the variable's internal value would be unsafe in many cases.
> 
> Then shouldn't we try to prevent direct access on all platforms rather than
> only one?

Agreed.  If Julien says 99% of the non-export problems are GUCs, and we
can just export them all, why not do it?  We already export every global
variable on Unix-like systems, and we have seen no downsides.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread houzj.f...@fujitsu.com
On Mon, Aug 23, 2021 8:01 PM Amit Kapila  wrote:
> On Mon, Aug 23, 2021 at 2:45 PM 
> houzj.f...@fujitsu.com wrote:
> >
> > On Mon, Aug 23, 2021 12:59 PM Amit Kapila  wrote:
> > >
> > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com 
> > >  wrote:
> > > >
> > > > Personally, I also think it will be better to make the behavior 
> > > > consistent.
> > > > Attach the new version patch make both ADD and DROP behave the
> > > > same as SET PUBLICATION which refresh all the publications.
> > > >
> > >
> > > I think we can have tests in the separate test file
> > > (alter_sub_pub.pl) like you earlier had in one of the versions. Use
> > > some meaningful names for tables instead of temp1, temp2 as you had in the
> previous version.
> > > Otherwise, the code changes look good to me.
> >
> > Thanks for the comment.
> > Attach new version patch which did the following changes.
> >
> > * move the tests to a separate test file (024_alter_sub_pub.pl)
> > * adjust the document of copy_data option
> > * add tab-complete for copy_data option when ALTER DROP publication.
> >
> 
> Thanks, the patch looks good to me. I have made some cosmetic changes in the
> attached version. It makes the test cases easier to understand.
> I am planning to push this tomorrow unless there are more comments or
> suggestions.

Thanks ! The patch looks good to me.

I noticed that the patch cannot be applied to PG14-stable,
so I attach a separate patch for back-patching (I didn't change the patch
for the HEAD branch).

Best regards,
Hou zj




v7-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch
Description:  v7-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch


v7-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch
Description:  v7-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch


Re: .ready and .done files considered harmful

2021-08-23 Thread Robert Haas
On Sun, Aug 22, 2021 at 10:31 PM Bossart, Nathan  wrote:
> I ran this again on a bigger machine with 200K WAL files pending
> archive.  The v9 patch took ~5.5 minutes, the patch I sent took ~8
> minutes, and the existing logic took just under 3 hours.

Hmm. On the one hand, 8 minutes > 5.5 minutes, and presumably the gap
would only get wider if the number of files were larger or if reading
the directory were slower. I am pretty sure that reading the directory
must be much slower in some real deployments where this problem has
come up. On the other hand, 8.8 minutes << 3 hours, and your patch
would win if somehow we had a ton of gaps in the sequence of files.
I'm not sure how likely that is to be the cause - probably not very
likely at all if you aren't using an archive command that cheats, but
maybe really common if you are. Hmm, but I think if the
archive_command cheats by marking a bunch of files done when it is
tasked with archiving just one, your patch will break, because, unless
I'm missing something, it doesn't re-evaluate whether things have
changed on every pass through the loop as Dipesh's patch does. So I
guess I'm not quite sure I understand why you think this might be the
way to go?

Maintaining the binary heap in lowest-priority-first order is very
clever, and the patch does look quite elegant. I'm just not sure I
understand the point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-08-23 Thread Robert Haas
On Sun, Aug 22, 2021 at 6:59 PM Noah Misch  wrote:
> Here's what I plan to push.  Besides adding a test, I modified things so
> CREATE TABLESPACE redo continues to report an error if a non-directory exists
> under the name we seek to create.  One could argue against covering that
> corner case, but TablespaceCreateDbspace() does cover it.

By and large, LGTM, though perhaps it would be appropriate to also
credit me as the reporter of the issue.

I feel it might be slightly better to highlight somewhere, either in
the commit message or in the comments, that removing the old directory
is unsafe, because if wal_level=minimal, we may have no other copy of
the data. For me that's the key point here. I feel that the commit
message and comments inside the patch explain rather thoroughly the
possible consequences of the bug and why this particular fix was
chosen, but they're not real explicit about why there was a bug at
all.

Thanks very much for working on this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread Amit Kapila
On Mon, Aug 23, 2021 at 2:45 PM houzj.f...@fujitsu.com
 wrote:
>
> On Mon, Aug 23, 2021 12:59 PM Amit Kapila  wrote:
> >
> > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > Personally, I also think it will be better to make the behavior 
> > > consistent.
> > > Attach the new version patch make both ADD and DROP behave the same as
> > > SET PUBLICATION which refresh all the publications.
> > >
> >
> > I think we can have tests in the separate test file (alter_sub_pub.pl) like 
> > you
> > earlier had in one of the versions. Use some meaningful names for tables
> > instead of temp1, temp2 as you had in the previous version.
> > Otherwise, the code changes look good to me.
>
> Thanks for the comment.
> Attach new version patch which did the following changes.
>
> * move the tests to a separate test file (024_alter_sub_pub.pl)
> * adjust the document of copy_data option
> * add tab-complete for copy_data option when ALTER DROP publication.
>

Thanks, the patch looks good to me. I have made some cosmetic changes
in the attached version. It makes the test cases easier to understand.
I am planning to push this tomorrow unless there are more comments or
suggestions.

-- 
With Regards,
Amit Kapila.


v6-0001-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch
Description: Binary data


Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-08-23 Thread Prabhat Sahu
On Mon, Aug 23, 2021 at 4:29 AM Noah Misch  wrote:

> On Wed, Aug 18, 2021 at 10:32:10PM -0700, Noah Misch wrote:
> > On Wed, Aug 18, 2021 at 10:47:24AM -0400, Robert Haas wrote:
> > > On Tue, Aug 10, 2021 at 9:35 AM Robert Haas 
> wrote:
> > > > Oh, yeah, I think that works, actually. I was imagining a few
> problems
> > > > here, but I don't think they really exist. The redo routines for
> files
> > > > within the directory can't possibly care about having the old files
> > > > erased for them, since that wouldn't be something that would normally
> > > > happen, if there were no recent CREATE TABLESPACE involved. And
> > > > there's code further down to remove and recreate the symlink, just in
> > > > case. So I think your proposed patch might be all we need.
> > >
> > > Noah, do you plan to commit this?
> >
> > Yes.  I feel it needs a test case, which is the main reason I've queued
> the
> > task rather than just pushed what I posted last.
>
> Here's what I plan to push.  Besides adding a test,


I have reproduced the issue of data inconsistency with CREATE TABLESPACE at
wal_level=minimal. I have also tested the fix with the v0 and v1 patches,
and came up with a similar TAP test case (as in v1). The test case looks
good.

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-23 Thread Amit Kapila
On Wed, Aug 18, 2021 at 8:09 PM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 8/18/21 12:01 PM, Amit Kapila wrote:
> > On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand  
> > wrote:
> >> Hi,
>
> I've updated the comment and prepared the back patch versions:
>

I have verified and all your patches look good to me. I'll push and
backpatch this by Wednesday unless there are more comments.

-- 
With Regards,
Amit Kapila.




Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-23 Thread Dilip Kumar
On Mon, Aug 23, 2021 at 1:43 PM Amit Kapila  wrote:

Note: merge comments from multiple mails

> > I think we should handle that in worker.c itself, by adding a
> > before_dsm_detach function before_shmem_exit right?
> >
>
> Yeah, I thought of handling it in worker.c similar to what you've in 0002 
> patch.
>

I have done handling in worker.c

On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com
 wrote:
>
> On Sat, Aug 21, 2021 8:38 PM Dilip Kumar  wrote:
>
> 1)
> +   TempTablespacePath(tempdirpath, tablespace);
> +   snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
> +tempdirpath, PG_TEMP_FILE_PREFIX,
> +(unsigned long) fileset->creator_pid, 
> fileset->number);
>
> do we need to use different filename for shared and un-shared fileset ?

I was also thinking the same; does it make sense to name it just
"%s/%s%lu.%u.fileset"?

On Mon, Aug 23, 2021 at 1:08 PM Masahiko Sawada  wrote:

> Here are some comments on 0001 patch:
>
> +/*
> + * Initialize a space for temporary files. This API can be used by shared
> + * fileset as well as if the temporary files are used only by single backend
> + * but the files need to be opened and closed multiple times and also the
> + * underlying files need to survive across transactions.
> + *
> + * Files will be distributed over the tablespaces configured in
> + * temp_tablespaces.
> + *
> + * Under the covers the set is one or more directories which will eventually
> + * be deleted.
> + */
>
> I think it's better to mention cleaning up here like we do in the
> comment of SharedFileSetInit().

Right, done

>
> ---
> I think we need to clean up both stream_fileset and subxact_fileset on
> proc exit. In 0002 patch cleans up the fileset but I think we need to
> do that also in 0001 patch.

Done

> ---
> There still are some comments using "shared fileset" in both buffile.c
> and worker.c.

Done


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From ced3a5e0222c2f89a977add167615f09d99d107a Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 18 Aug 2021 15:52:21 +0530
Subject: [PATCH v2 1/2] Sharedfileset refactoring

Currently, sharedfileset.c is designed for a very specific purpose: one
backend creates the fileset and it can then be shared across multiple
backends, so the fileset has to be created in a DSM segment.  But some use
cases need the same behavior as sharedfileset even though the files are
used only by a single backend: the files created under it should be named
files, should survive across transactions, and should support being opened
and closed multiple times.  In this patch we refactor the code into two
files: a) fileset.c, which provides the general-purpose interfaces, and
b) sharedfileset.c, which internally uses fileset.c but is specific to
DSM-based filesets.
---
 src/backend/replication/logical/worker.c  |  89 +++
 src/backend/storage/file/Makefile |   1 +
 src/backend/storage/file/buffile.c|  82 +-
 src/backend/storage/file/fd.c |   2 +-
 src/backend/storage/file/fileset.c| 204 +
 src/backend/storage/file/sharedfileset.c  | 239 +-
 src/backend/utils/sort/logtape.c  |   8 +-
 src/backend/utils/sort/sharedtuplestore.c |   5 +-
 src/include/storage/buffile.h |  12 +-
 src/include/storage/fileset.h |  40 +
 src/include/storage/sharedfileset.h   |  14 +-
 11 files changed, 366 insertions(+), 330 deletions(-)
 create mode 100644 src/backend/storage/file/fileset.c
 create mode 100644 src/include/storage/fileset.h

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..d98ebab 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
  * BufFile infrastructure supports temporary files that exceed the OS file size
  * limit, (b) provides a way for automatic clean up on the error and (c) provides
  * a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
  * infrastructure as without that it deletes the files on the closure of the
  * file and if we decide to keep stream files open across the start/stop stream
  * then it will consume a lot of memory (more than 8K for each BufFile and
  * there could be multiple such BufFiles as the subscriber could receive
  * multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
  * a new way to pass filenames to BufFile APIs so that we are allowed to open
  * the file we desired across multiple stream-open calls for the same
  * transaction.
@@ -231,8 +231,8 @@ typedef struct 

Re: Proposal: More structured logging

2021-08-23 Thread Magnus Hagander
On Sat, Aug 21, 2021 at 2:37 AM Michael Paquier  wrote:
>
> On Fri, Aug 20, 2021 at 11:35:29AM +0200, Ronan Dunklau wrote:
> > Michael, your jsonlog module already fullfills this need. Is it something 
> > that
> > should be merged into our tree ?
>
> Yes, there is nothing technically preventing to have this stuff in
> core, of course, and that would even take care of the issues in
> detecting if the piping protocol should be used or not.
>
> Now, the last time this was proposed, there was a lot of drawback
> because the presence of the JSON keys increases significantly the size
> of the logs compared to CSV, and usually users care a lot about the
> size of the log files.  I would support the presence of JSON format
> for the logs in core, FWIW.

As long as it's optional, I don't think that drawback holds as an
argument. The same argument could be made against the csv logs in the
first place -- they add information to every row that a lot of people
don't need. But it's optional. Leaving it up to the administrator to
choose whether they prefer the verbose-and-easier-to-parse-maybe
format or the more compact format seems like the right path for that.
I bet quite a few would actually choose json -- easier to integrate
with other systems (because newer systems love json), and unless
you're actually logging a lot of queries (which many people don't),
the size of the logfile is often not a concern at all.

In short, I would also support the presence of JSON log format in
core. (but as a proper log_destination of course -- or if it's time to
actually split that into a separate thing, being one parameter for
log_destination and another for log_format)
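
Purely as an illustration of the log_destination route: the JSON format could
claim the next free bit next to the existing destinations in elog.h (the
actual patch might of course do it differently, e.g. via a separate log_format
GUC):

#define LOG_DESTINATION_STDERR	 1
#define LOG_DESTINATION_SYSLOG	 2
#define LOG_DESTINATION_EVENTLOG 4
#define LOG_DESTINATION_CSVLOG	 8
#define LOG_DESTINATION_JSONLOG	16		/* proposed */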


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Defer selection of asynchronous subplans until the executor initialization stage

2021-08-23 Thread Etsuro Fujita
On Wed, Jun 30, 2021 at 1:50 PM Andrey Lepikhov
 wrote:
> I have completely rewritten this patch.
>
> Main idea:
>
> The async_capable field of a plan node inform us that this node could
> work in async mode. Each node sets this field based on its own logic.
> The actual mode of a node is defined by the async_capable of PlanState
> structure. It is made at the executor initialization stage.
> In this patch, only an append node could define async behaviour for its
> subplans.

I finally reviewed the patch.  One thing I noticed about the patch is
that it would break ordered Appends.  Here is such an example using
the patch:

create table pt (a int) partition by range (a);
create table loct1 (a int);
create table loct2 (a int);
create foreign table p1 partition of pt for values from (10) to (20)
server loopback1 options (table_name 'loct1');
create foreign table p2 partition of pt for values from (20) to (30)
server loopback2 options (table_name 'loct2');

explain verbose select * from pt order by a;
   QUERY PLAN
-
 Append  (cost=200.00..440.45 rows=5850 width=4)
   ->  Async Foreign Scan on public.p1 pt_1  (cost=100.00..205.60
rows=2925 width=4)
 Output: pt_1.a
 Remote SQL: SELECT a FROM public.loct1 ORDER BY a ASC NULLS LAST
   ->  Async Foreign Scan on public.p2 pt_2  (cost=100.00..205.60
rows=2925 width=4)
 Output: pt_2.a
 Remote SQL: SELECT a FROM public.loct2 ORDER BY a ASC NULLS LAST
(7 rows)

This would not always provide tuples in the required order, as async
execution would provide them from the subplans rather randomly.  I
think it would not only be too late but also inefficient to do the
planning work at execution time (consider executing generic plans!),
so I think we should avoid doing so.  (The cost of doing that work for
simple foreign scans is small, but if we support async execution for
upper plan nodes such as NestLoop as discussed before, I think the
cost for such plan nodes would not be small anymore.)

To just execute what was planned at execution time, I think we should
return to the patch in [1].  The patch was created for Horiguchi-san’s
async-execution patch, so I modified it to work with HEAD, and added a
simplified version of your test cases.  Please find attached a patch.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/7fe10f95-ac6c-c81d-a9d3-227493eb9...@postgrespro.ru


allow-async-in-more-cases.patch
Description: Binary data


RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread houzj.f...@fujitsu.com
On Mon, Aug 23, 2021 12:59 PM Amit Kapila  wrote:
> 
> On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com 
>  wrote:
> >
> > Personally, I also think it will be better to make the behavior consistent.
> > Attach the new version patch make both ADD and DROP behave the same as
> > SET PUBLICATION which refresh all the publications.
> >
> 
> I think we can have tests in the separate test file (alter_sub_pub.pl) like 
> you
> earlier had in one of the versions. Use some meaningful names for tables
> instead of temp1, temp2 as you had in the previous version.
> Otherwise, the code changes look good to me.

Thanks for the comment.
Attached is a new version of the patch with the following changes.

* move the tests to a separate test file (024_alter_sub_pub.pl)
* adjust the documentation of the copy_data option
* add tab-completion for the copy_data option in ALTER SUBSCRIPTION ... DROP PUBLICATION.

Best regards,
Hou zj


v5-0001-Fix-Alter-Subscription-Add-Drop-Publication-refresh-.patch
Description:  v5-0001-Fix-Alter-Subscription-Add-Drop-Publication-refresh-.patch


RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-23 Thread houzj.f...@fujitsu.com
On Mon, Aug 23, 2021 1:18 PM Masahiko Sawada  wrote:
> On Mon, Aug 23, 2021 at 1:59 PM Amit Kapila 
> wrote:
> >
> > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > Personally, I also think it will be better to make the behavior 
> > > consistent.
> > > Attach the new version patch make both ADD and DROP behave the same
> > > as SET PUBLICATION which refresh all the publications.
> > >
> >
> > I think we can have tests in the separate test file (alter_sub_pub.pl)
> > like you earlier had in one of the versions. Use some meaningful names
> > for tables instead of temp1, temp2 as you had in the previous version.
> > Otherwise, the code changes look good to me.
> 
> -   supported_opts = SUBOPT_REFRESH;
> -   if (isadd)
> -   supported_opts |= SUBOPT_COPY_DATA;
> +   supported_opts = SUBOPT_REFRESH | SUBOPT_COPY_DATA;
> 
> I think that the currently the doc says copy_data option can be specified 
> except
> in DROP PUBLICATION case, which needs to be fixed corresponding the above
> change.

Thanks for the comment.
Fixed in the new version patch.

Best regards,
Hou zj


Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-08-23 Thread Muhammad Usama
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

The patch looks fine and successfully fixes the issue of missing privileges.

The new status of this patch is: Ready for Committer


Queries that should be canceled will get stuck on secure_write function

2021-08-23 Thread 蔡梦娟(玊于)
Hi, all

Recently, I hit a problem where the startup process of a standby gets stuck and 
stays in a waiting state. The backtrace of the startup process shows that it is 
waiting for a backend process that conflicts with recovery processing to exit; 
the GUC parameters max_standby_streaming_delay and max_standby_archive_delay 
are both set to 30 seconds, but they don't take effect. The backend process 
stays alive, and its backtrace shows that it is waiting for the socket to 
become writable in secure_write(). After further reading the code, I found that 
ProcessClientWriteInterrupt() in secure_write() only processes interrupts when 
ProcDiePending is true, and otherwise does nothing. However, a snapshot 
conflict with recovery only sets QueryCancelPending to true, so the response to 
the signal will be delayed indefinitely if the corresponding client is stuck, 
thus blocking the recovery process.

I want to know why the interrupt is only handled when ProcDiePending is true; I 
think a query that is supposed to be canceled should also respond to the signal.

I also want to share a patch with you. It adds a GUC parameter 
max_standby_client_write_delay: if a query is supposed to be canceled and the 
time it has been delayed by the client exceeds max_standby_client_write_delay, 
ProcDiePending is set to true so the query cannot be delayed indefinitely. What 
do you think of it? I hope to get your reply.
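
To make the idea concrete, here is a rough sketch of the check (this is not the
attached patch; client_write_started_at and the helper name are assumptions
about how the blocked-write start time could be tracked):

#include "postgres.h"
#include "miscadmin.h"			/* QueryCancelPending, ProcDiePending */
#include "utils/timestamp.h"	/* GetCurrentTimestamp, TimestampDifferenceExceeds */

/* proposed GUC, in milliseconds; -1 would disable the promotion */
int			max_standby_client_write_delay = 30000;

/* hypothetical: set when secure_write() first blocks on the client socket */
static TimestampTz client_write_started_at;

static void
maybe_promote_cancel_to_die(void)
{
	if (QueryCancelPending &&
		max_standby_client_write_delay >= 0 &&
		TimestampDifferenceExceeds(client_write_started_at,
								   GetCurrentTimestamp(),
								   max_standby_client_write_delay))
	{
		/* the client is stuck; let the next interrupt check terminate us */
		ProcDiePending = true;
	}
}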

Thanks & Best Regards

0001-Set-max-client-write-delay-timeout-for-query-which-i.patch
Description: Binary data


Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-23 Thread Amit Kapila
On Mon, Aug 23, 2021 at 12:52 PM Dilip Kumar  wrote:
>
> On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila  wrote:
> >
> > On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar  wrote:
> > >
> > > On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com
> > >  wrote:
> > >
> > > > 4)
> > > > -extern File SharedFileSetCreate(SharedFileSet *fileset, const char 
> > > > *name);
> > > > -extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
> > > > - int mode);
> > > > -extern bool SharedFileSetDelete(SharedFileSet *fileset, const char 
> > > > *name,
> > > > -   bool 
> > > > error_on_failure);
> > > >  extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
> > > > -extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
> > > >
> > > > I noticed the patch delete part of public api, is it better to keep the 
> > > > old api and
> > > > let them invoke new api internally ? Having said that, I didn’t find 
> > > > any open source
> > > > extension use these old api, so it might be fine to delete them.
> > >
> > > Right, those were internally used by buffile.c but now we have changed
> > > buffile.c to directly use the fileset interfaces, so we better remove
> > > them.
> > >
> >
> > I also don't see any reason to keep those exposed from
> > sharedfileset.h. I see that even in the original commit dc6c4c9dc2,
> > these APIs seem to be introduced to be used by buffile. Andres/Thomas,
> > do let us know if you think otherwise?
> >
> > One more comment:
> > I think v1-0001-Sharedfileset-refactoring doesn't have a way for
> > cleaning up worker.c temporary files on error/exit. It is better to
> > have that to make it an independent patch.
>
> I think we should handle that in worker.c itself, by adding a
> before_dsm_detach function before_shmem_exit right?
>

Yeah, I thought of handling it in worker.c similar to what you've in 0002 patch.
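
For what it's worth, a minimal sketch of that worker.c cleanup (the callback
name is made up; FileSet and FileSetDeleteAll are the interfaces proposed in
0001):

#include "postgres.h"
#include "storage/fileset.h"	/* FileSet, FileSetDeleteAll (added by 0001) */
#include "storage/ipc.h"		/* before_shmem_exit */

static FileSet *stream_fileset = NULL;
static FileSet *subxact_fileset = NULL;

/* hypothetical proc-exit callback for the apply worker */
static void
worker_cleanup_filesets(int code, Datum arg)
{
	if (stream_fileset != NULL)
		FileSetDeleteAll(stream_fileset);
	if (subxact_fileset != NULL)
		FileSetDeleteAll(subxact_fileset);
}

/* registered once, early in the worker:
 *		before_shmem_exit(worker_cleanup_filesets, (Datum) 0);
 */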

-- 
With Regards,
Amit Kapila.




Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o

2021-08-23 Thread Masahiko Sawada
On Sat, Aug 21, 2021 at 9:38 PM Dilip Kumar  wrote:
>
> On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar  wrote:
> >
> > On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila  
> > wrote:
> > >
> > > On Tue, Aug 17, 2021 at 4:34 PM Andres Freund  wrote:
> > > >
> > > > On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
> > > > > 5. How can we provide a strict mechanism to not allow to use dsm APIs
> > > > > for non-dsm FileSet? One idea could be that we can have a variable
> > > > > (probably bool) in SharedFileSet structure which will be initialized
> > > > > in SharedFileSetInit based on whether the caller has provided dsm
> > > > > segment. Then in other DSM-based APIs, we can check if it is used for
> > > > > the wrong type.
> > > >
> > > > Well, isn't the issue here that it's not a shared file set in case you
> > > > explicitly don't want to share it? ISTM that the proper way to address
> > > > this would be to split out a FileSet from SharedFileSet that's then used
> > > > for worker.c and sharedfileset.c.
> > > >
> > >
> > > Okay, but note that to accomplish the same, we need to tweak the
> > > BufFile (buffile.c) APIs as well so that they can work with FileSet.
> > > As per the initial analysis, there doesn't seem to be any problem with
> > > that though.
> >
> > I was looking into this, so if we want to do that I think the outline
> > will look like this
> >
> > - There will be a fileset.c and fileset.h files, and we will expose a
> > new structure FileSet, which will be the same as SharedFileSet, except
> > mutext and refcount.  The fileset.c will expose FileSetInit(),
> > FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
> > interfaces.
> >
> > - sharefileset.c will internally call the fileset.c's interfaces.  The
> > SharedFileSet structure will also contain FileSet and other members
> > i.e. mutex and refcount.
> >
> > - the buffile.c's interfaces which are ending with Shared e.g.
> > BufFileCreateShared, BufFileOpenShared, should be converted to
> > BufFileCreate and
> > BufFileOpen respectively.  And the input to these interfaces can be
> > converted to FileSet instead of SharedFileSet.
>
> Here is the first draft based on the idea we discussed, 0001, splits
> sharedfileset.c in sharedfileset.c and fileset.c and 0002 is same
> patch I submitted earlier(use single fileset throughout the worker),
> just it is rebased on top of 0001.  Please let me know your thoughts.

Here are some comments on 0001 patch:

+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */

I think it's better to mention cleaning up here like we do in the
comment of SharedFileSetInit().

---
I think we need to clean up both stream_fileset and subxact_fileset on
proc exit. The 0002 patch cleans up the fileset, but I think we need to
do that in the 0001 patch as well.

---
There still are some comments using "shared fileset" in both buffile.c
and worker.c.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Minor pg_amcheck fixes spotted while reading code

2021-08-23 Thread Daniel Gustafsson
> On 21 Aug 2021, at 02:43, Michael Paquier  wrote:
> 
> On Fri, Aug 20, 2021 at 11:42:09AM -0700, Mark Dilger wrote:
>> These look correct.
> 
> static void
> -help(const char *progname)
> +help(const char *program_name)
> These were discussed not long ago, and I recall that they were in the
> we-don't-care category.  Note for example all the tools of
> src/scripts/ and pg_dump/.

Fair enough, I had missed that thread.

--
Daniel Gustafsson   https://vmware.com/





Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)

2021-08-23 Thread Dilip Kumar
On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila  wrote:
>
> On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar  wrote:
> >
> > On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com
> >  wrote:
> >
> > > 4)
> > > -extern File SharedFileSetCreate(SharedFileSet *fileset, const char 
> > > *name);
> > > -extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
> > > - int mode);
> > > -extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
> > > -   bool 
> > > error_on_failure);
> > >  extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
> > > -extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
> > >
> > > I noticed the patch delete part of public api, is it better to keep the 
> > > old api and
> > > let them invoke new api internally ? Having said that, I didn’t find any 
> > > open source
> > > extension use these old api, so it might be fine to delete them.
> >
> > Right, those were internally used by buffile.c but now we have changed
> > buffile.c to directly use the fileset interfaces, so we better remove
> > them.
> >
>
> I also don't see any reason to keep those exposed from
> sharedfileset.h. I see that even in the original commit dc6c4c9dc2,
> these APIs seem to be introduced to be used by buffile. Andres/Thomas,
> do let us know if you think otherwise?
>
> One more comment:
> I think v1-0001-Sharedfileset-refactoring doesn't have a way for
> cleaning up worker.c temporary files on error/exit. It is better to
> have that to make it an independent patch.

I think we should handle that in worker.c itself, by adding a
before_dsm_detach function before_shmem_exit right?  Or you are
thinking that in FileSetInit we keep the filesetlist mechanism like we
were doing in SharedFileSetInit?  I think that will just add
unnecessary complexity to the first patch, which will eventually go
away in the second patch.  And if we do that, then SharedFileSetInit
cannot directly use FileSetInit; otherwise the DSM-based fileset will
also get registered for cleanup in filesetlist.  To avoid that we
might need to pass a parameter to FileSetInit() saying whether to
register for cleanup or not, and that again will not look clean
because we would be reintroducing the conditional cleanup; IMHO that
is the same problem we are trying to clean up in SharedFileSetInit by
introducing a new FileSetInit.

I think what we can do is introduce a new function FileSetInitInternal()
that does what FileSetInit() is doing today, and have both
SharedFileSetInit() and FileSetInit() call it; in addition,
SharedFileSetInit() will register the DSM-based cleanup while
FileSetInit() will do the filesetlist-based cleanup.  But IMHO we should
go in this direction only if we are sure that we want to commit the
first patch and not the second.
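
A rough outline of that split, just to make the shape concrete (sketched as if
in fileset.c and sharedfileset.c; the fs member of SharedFileSet and
RegisterFileSetCleanup() are assumptions for illustration, not the actual
patch):

/* common initialization shared by both entry points */
static void
FileSetInitInternal(FileSet *fileset)
{
	static uint32 counter = 0;

	fileset->creator_pid = MyProcPid;
	fileset->number = counter;
	counter = (counter + 1) % INT_MAX;
	/* ... capture temp_tablespaces for the set here ... */
}

void
FileSetInit(FileSet *fileset)
{
	FileSetInitInternal(fileset);
	/* non-DSM case: remember the set for proc-exit cleanup */
	RegisterFileSetCleanup(fileset);	/* hypothetical helper */
}

void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
	FileSetInitInternal(&fileset->fs);
	SpinLockInit(&fileset->mutex);
	fileset->refcnt = 1;
	/* DSM case: clean up when the last attached backend detaches */
	if (seg)
		on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
}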

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pg_veryfybackup can fail with a valid backup for TLI > 1

2021-08-23 Thread Kyotaro Horiguchi
At Mon, 23 Aug 2021 15:46:37 +0900, Michael Paquier  wrote 
in 
> On Fri, Aug 20, 2021 at 04:47:15PM +0900, Kyotaro Horiguchi wrote:
> > Yes, backup_label looks correct.
> > 
> > backup_label (extract):
> > START WAL LOCATION: 0/528 (file 00020005)
> > CHECKPOINT LOCATION: 0/560
> > START TIMELINE: 2
> 
> Okay.  I have worked on that today, did more manual tests, and applied
> this fix down to 13.  The cherry-pick from 14 to 13 only required a
> s/$master/$primary/ in the test, which was straight-forward.  Your
> patch for 13 did that though:
> -   if (starttli != entry->tli)
> +   if (!XLogRecPtrIsInvalid(entry->begin))
> 
> So it would have caused a failure with parent TLIs that have a correct
> begin location, but we expect the opposite.  The patch for 14/HEAD had
> that right.

Mmm. Sorry for the silly mistake, and thanks for committing this!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Proposal: More structured logging

2021-08-23 Thread Ronan Dunklau
On Friday, August 20, 2021 at 11:31:21 CEST, Ronan Dunklau wrote:
> On Thursday, August 19, 2021 at 15:04:30 CEST, Alvaro Herrera wrote:
> > On 2021-Aug-13, Ronan Dunklau wrote:
> > > ereport(NOTICE,
> > > 
> > >  (errmsg("My log message")),
> > >  (errtag("EMITTER", "MYEXTENSION")),
> > >  (errtag("MSG-ID", "%d", error_message_id))
> > > 
> > > );
> > 
> > Interesting idea.   I agree this would be useful.
> > 
> > > Please find attached a very small POC patch to better demonstrate what I
> > > propose.
> > 
> > Seems like a good start.  I think a further step towards a committable
> > patch would include a test module that uses the new tagging
> > functionality.
> 
> Please find attached the original patch + a new one adding a test module.
> The test module exposes a function for generating logs with tags, and a log
> hook which format the tags in the DETAIL field for easy regression testing.
> 
> Next step would be to emit those tags in the CSV logs. I'm not sure what
> representation they should have though: a nested csv in it's own column ? A
> key => value pairs list similar to hstore ? A json object ?

I opted for a JSON representation in the CSV logs; please find attached v3,
which is the same as v2 plus another patch for the CSV log output.

> 
> Also we should probably make this available to the client too, but once
> again the format of the tag field needs to be defined. Any opinion ?


-- 
Ronan Dunklau
>From e5af5d1a07e65926eee90e0d18443a673d1fcba8 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Fri, 13 Aug 2021 15:03:18 +0200
Subject: [PATCH v3 1/3] Add ErrorTag support

---
 src/backend/utils/error/elog.c | 48 ++
 src/include/utils/elog.h   | 10 +++
 2 files changed, 58 insertions(+)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index a3e1c59a82..5b9b1b8a72 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -465,6 +465,7 @@ errstart(int elevel, const char *domain)
 		edata->sqlerrcode = ERRCODE_SUCCESSFUL_COMPLETION;
 	/* errno is saved here so that error parameter eval can't change it */
 	edata->saved_errno = errno;
+	edata->tags = NIL;
 
 	/*
 	 * Any allocations for this error state level should go into ErrorContext
@@ -516,6 +517,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	int			elevel;
 	MemoryContext oldcontext;
 	ErrorContextCallback *econtext;
+	ListCell   *lc;
 
 	recursion_depth++;
 	CHECK_STACK_DEPTH();
@@ -621,7 +623,18 @@ errfinish(const char *filename, int lineno, const char *funcname)
 		pfree(edata->constraint_name);
 	if (edata->internalquery)
 		pfree(edata->internalquery);
+	/* Every tag should have been palloc'ed */
+	if (edata->tags != NIL)
+	{
+		foreach(lc, edata->tags)
+		{
+			ErrorTag   *tag = (ErrorTag *) lfirst(lc);
 
+			pfree(tag->tagvalue);
+			pfree(tag);
+		}
+		pfree(edata->tags);
+	}
 	errordata_stack_depth--;
 
 	/* Exit error-handling context */
@@ -1192,6 +1205,41 @@ errhint_plural(const char *fmt_singular, const char *fmt_plural,
 	return 0;	/* return value does not matter */
 }
 
+int
+errtag(const char *tag, const char *fmt_value,...)
+{
+	ErrorData  *edata = &errordata[errordata_stack_depth];
+	ErrorTag   *etag;
+	MemoryContext oldcontext;
+	StringInfoData buf;
+
+	recursion_depth++;
+	CHECK_STACK_DEPTH();
+	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+	etag = palloc(sizeof(ErrorTag));
+	etag->tagname = tag;
+	initStringInfo(&buf);
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+
+		errno = edata->saved_errno;
+		va_start(args, fmt_value);
+		needed = appendStringInfoVA(&buf, fmt_value, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(&buf, needed);
+	}
+	etag->tagvalue = pstrdup(buf.data);
+	edata->tags = lappend(edata->tags, etag);
+	pfree(buf.data);
+	MemoryContextSwitchTo(oldcontext);
+	recursion_depth--;
+	return 0;
+}
+
 
 /*
  * errcontext_msg --- add a context error message text to the current error
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index f53607e12e..1c490d1b11 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -15,6 +15,7 @@
 #define ELOG_H
 
 #include <setjmp.h>
+#include "nodes/pg_list.h"
 
 /* Error level codes */
 #define DEBUG5		10			/* Debugging messages, in categories of
@@ -193,6 +194,8 @@ extern int	errhint(const char *fmt,...) pg_attribute_printf(1, 2);
 extern int	errhint_plural(const char *fmt_singular, const char *fmt_plural,
 		   unsigned long n,...) pg_attribute_printf(1, 4) pg_attribute_printf(2, 4);
 
+extern int	errtag(const char *tag, const char *fmt_value,...) pg_attribute_printf(2, 3);
+
 /*
  * errcontext() is typically called in error context callback functions, not
  * within an ereport() invocation. The callback function can be in a different
@@ -395,11 +398,18 @@ typedef struct ErrorData
 	int			internalpos;	/* cursor index into internalquery */
 	char	   *internalquery;	/* text of internally-generated query */
 	

Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread Julien Rouhaud
On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote:
> On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote:
> > 
> > Uh, no, it's exactly *not* clear.  There are a lot of GUCs that are only
> > of interest to particular subsystems.  I do not see why being a GUC makes
> > something automatically more interesting than any other global variable.
> > Usually, the fact that one is global is only so the GUC machinery itself
> > can get at it, otherwise it'd be static in the owning module.
> > 
> > As for "extensions should be able to get at the values", the GUC machinery
> > already provides uniform mechanisms for doing that safely.  Direct access
> > to the variable's internal value would be unsafe in many cases.
> 
> Then shouldn't we try to prevent direct access on all platforms rather than
> only one?

So, since the GUC global variables that aren't currently explicitly exported
shouldn't be accessible by third-party code, I'm attaching a POC patch that
does the opposite of v1: it enforces that restriction using a new
pg_attribute_hidden() macro, defined for GCC only, to start discussing that
topic.

It would probably be better to have some other macro (e.g. PG_GLOBAL_PUBLIC and
PG_GLOBAL_PRIVATE or similar) to make declarations more consistent, but given
the amount of changes it would represent I prefer to have some feedback before
spending time on that.
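
To give an idea of what the POC boils down to: the pg_attribute_hidden() macro
presumably maps to GCC's symbol visibility attribute, roughly along these lines
(the exact guards in the patch may differ):

#if defined(__GNUC__) && !defined(WIN32)
#define pg_attribute_hidden() __attribute__((visibility("hidden")))
#else
#define pg_attribute_hidden()
#endif

/*
 * Example use in a header: the variable stays global inside the backend
 * binary, but its symbol is no longer exported to loadable modules.
 */
extern bool Log_disconnections pg_attribute_hidden();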
>From b43c48a80f985e879a88d9d270fc1e2b283b5732 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 22 Aug 2021 15:40:47 +0800
Subject: [PATCH v2] Make all GUC variables not previously marked as PGDLLIMPORT
 private.

---
 src/backend/utils/misc/guc.c  | 18 +++---
 src/include/access/gin.h  |  2 +-
 src/include/access/tableam.h  |  4 +-
 src/include/access/toast_compression.h|  2 +-
 src/include/access/xact.h | 12 ++--
 src/include/access/xlog.h | 78 +++
 src/include/c.h   |  7 ++
 src/include/catalog/namespace.h   |  2 +-
 src/include/catalog/storage.h |  2 +-
 src/include/commands/async.h  |  2 +-
 src/include/commands/user.h   |  2 +-
 src/include/commands/vacuum.h | 12 ++--
 src/include/fmgr.h|  2 +-
 src/include/jit/jit.h | 20 +++---
 src/include/libpq/auth.h  |  4 +-
 src/include/libpq/libpq.h | 24 +++
 src/include/libpq/pqcomm.h|  2 +-
 src/include/miscadmin.h   | 22 +++
 src/include/optimizer/geqo.h  | 10 +--
 src/include/optimizer/optimizer.h |  4 +-
 src/include/optimizer/planmain.h  |  6 +-
 src/include/parser/parse_expr.h   |  2 +-
 src/include/parser/parser.h   |  4 +-
 src/include/pgstat.h  |  6 +-
 src/include/pgtime.h  |  2 +-
 src/include/postmaster/autovacuum.h   | 30 -
 src/include/postmaster/bgwriter.h |  8 +--
 src/include/postmaster/postmaster.h   | 28 
 src/include/postmaster/syslogger.h| 10 +--
 src/include/postmaster/walwriter.h|  4 +-
 src/include/replication/logicallauncher.h |  4 +-
 src/include/replication/syncrep.h |  2 +-
 src/include/replication/walreceiver.h |  6 +-
 src/include/replication/walsender.h   |  6 +-
 src/include/storage/bufmgr.h  | 20 +++---
 src/include/storage/dsm_impl.h|  4 +-
 src/include/storage/fd.h  |  2 +-
 src/include/storage/large_object.h|  2 +-
 src/include/storage/lock.h| 12 ++--
 src/include/storage/lwlock.h  |  2 +-
 src/include/storage/pg_shmem.h|  6 +-
 src/include/storage/predicate.h   |  6 +-
 src/include/storage/proc.h|  2 +-
 src/include/storage/standby.h |  8 +--
 src/include/tcop/tcopprot.h   |  6 +-
 src/include/tsearch/ts_cache.h|  2 +-
 src/include/utils/array.h |  2 +-
 src/include/utils/builtins.h  |  2 +-
 src/include/utils/bytea.h |  2 +-
 src/include/utils/elog.h  | 10 +--
 src/include/utils/guc.h   | 62 +-
 src/include/utils/pg_locale.h |  8 +--
 src/include/utils/plancache.h |  2 +-
 src/include/utils/ps_status.h |  2 +-
 src/include/utils/queryjumble.h   |  2 +-
 src/include/utils/rls.h   |  2 +-
 src/include/utils/xml.h   |  4 +-
 57 files changed, 263 insertions(+), 256 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a2e0f8de7e..ca2d6042cd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -129,20 +129,20 @@
 #define REALTYPE_PRECISION 17
 
 /* XXX these should appear in other modules' header files */
-extern bool Log_disconnections;
-extern int	

Re: pg_veryfybackup can fail with a valid backup for TLI > 1

2021-08-23 Thread Michael Paquier
On Fri, Aug 20, 2021 at 04:47:15PM +0900, Kyotaro Horiguchi wrote:
> Yes, backup_label looks correct.
> 
> backup_label (extract):
> START WAL LOCATION: 0/528 (file 00020005)
> CHECKPOINT LOCATION: 0/560
> START TIMELINE: 2

Okay.  I have worked on that today, did more manual tests, and applied
this fix down to 13.  The cherry-pick from 14 to 13 only required a
s/$master/$primary/ in the test, which was straight-forward.  Your
patch for 13 did that though:
-   if (starttli != entry->tli)
+   if (!XLogRecPtrIsInvalid(entry->begin))

So it would have caused a failure with parent TLIs that have a correct
begin location, but we expect the opposite.  The patch for 14/HEAD had
that right.
--
Michael


signature.asc
Description: PGP signature


Re: Some leftovers of recent message cleanup?

2021-08-23 Thread Kyotaro Horiguchi
At Fri, 20 Aug 2021 19:36:02 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/08/20 11:53, Kyotaro Horiguchi wrote:
> > At Thu, 19 Aug 2021 20:29:42 +0900, Fujii Masao
> >  wrote in
> >> On 2021/08/19 17:03, Kyotaro Horiguchi wrote:
> >>> Hello.
> >>> While I was examining message translation for PG14, I found some
> >>> messages that would need to be fixed.
> >>> 0001 is a fix for perhaps-leftovers of the recent message cleanups
> >>> related to "positive integer"(fd90f6ba7a).
> >>
> >> There are still other many messages using "positive" and "negative"
> >> keywords.
> >> We should also fix them at all?
> > I'm not sure, or no if anything. My main point here is not to avoid
> > use of such kind of words, but reducing variations of the effectively
> > the same message from the view of translator burden. The two messages
> > in 0001 are in that category. I noticed those messages accidentally. I
> > don't think they are the only instance of such divergence, but I'm not
> > going to do a comprehensive examination of such divergences.. (Or do I
> > need to check that more comprehensively?)
> 
> 
> Understood.
> 
> -  errmsg("modulus for hash partition 
> must be a positive
> -  integer")));
> + errmsg("modulus for hash partition must be an integer greater than
> zero")));
> 
> "an integer greater" should be "an integer value greater" because
> we use that words in other similar log messages like "modulus for
> hash partition must be an integer value greater than zero"
> in src/backend/partitioning/partbounds.c?

Ugh... Of course. I thought I had done it that way, but the actual file is
incorrect...  Fixed that in the attached.

> I'm thinking to back-patch this to v11 where hash partitioning
> was supported. On the other hand, the following change should

If I'm not missing something, back to 13, where the "word change" took
place?  For 11 and 12, we would need to change *both* messages.


v11, 12
./src/backend/parser/parse_utilcmd.c
./src/backend/partitioning/partbounds.c

[diff against src/interfaces/ecpg/preproc/ecpg.header]
@@ -596,7 +596,7 @@ check_declared_list(const char *name)
 	{
 		if (connection)
 			if (connection && strcmp(ptr->connection, connection) != 0)
-				mmerror(PARSE_ERROR, ET_WARNING, "connection %s is overwritten with %s by declare statement %s.", connection, ptr->connection, name);
+				mmerror(PARSE_ERROR, ET_WARNING, "connection %s is overwritten with %s by DECLARE statement %s", connection, ptr->connection, name);
 			connection = mm_strdup(ptr -> connection);
 			return true;
 		}
-- 
2.27.0


