Re: Support for NSS as a libpq TLS backend

2021-02-03 Thread Michael Paquier
On Tue, Feb 02, 2021 at 08:33:35PM +, Jacob Champion wrote:
> Note that this changes the error message printed during the invalid-
> root tests, because NSS is now sending the root of the chain. So the
> server's issuer is considered untrusted rather than unrecognized.

I think that it is not a good idea to attach the since-v*.diff patches
to the threads.  This causes the CF bot to fail to apply those
patches.

Would it be possible to split 0001 into at least two parts: one
patch that includes the basic changes for the build and ./configure,
and a second with the FE/BE changes?
--
Michael


signature.asc
Description: PGP signature


Re: Correct comment in StartupXLOG().

2021-02-03 Thread Dilip Kumar
On Thu, Feb 4, 2021 at 9:39 AM Amul Sul  wrote:
>
> On Thu, Feb 4, 2021 at 6:18 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Wed, 3 Feb 2021 16:36:13 +0530, Amul Sul  wrote in
> > > On Wed, Feb 3, 2021 at 2:48 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 2:28 PM Amul Sul  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > SharedRecoveryState member of XLogCtl is no longer a boolean flag; it
> > > > > was changed to an enum in 4e87c4836ab9, but the comment referring to it
> > > > > still describes it as a boolean flag, which is confusing and incorrect.
> > > >
> > > > +1 for the comment change
> >
> > Actually the "flag" has been changed to an integer (enum), so it
> > needs a change. However, the current proposal:
> >
> >  * Now allow backends to write WAL and update the control file status in
> > -* consequence.  The boolean flag allowing backends to write WAL is
> > +* consequence.  The recovery state allowing backends to write WAL is
> >  * updated while holding ControlFileLock to prevent other backends to look
> >
> > Looks somewhat strange. The old boolean had a single task to allow
> > backends to write WAL but the current state has multiple states that
> > control recovery progress. So I think it needs a further change.
> >
> > ===
> >  Now allow backends to write WAL and update the control file status in
> >  consequence.  The recovery state is updated to allow backends to write
> >  WAL, while holding ControlFileLock to prevent other backends to look
> >  at an inconsistent state of the control file in shared memory.
> > ===
> >
>
> This looks more accurate; I added the same in the attached version. Thanks
> for the correction.

Looks good to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

2021-02-03 Thread Kohei KaiGai
Hello,

I noticed that CheckAttributeNamesTypes() prevents creating a table that has
more than MaxHeapAttributeNumber (1600) columns, for foreign tables as well.
IIUC, this magic number comes from the requirement that the length of the
null bitmap be covered by t_hoff in HeapTupleHeaderData.
For heap tables, it seems to me a reasonable restriction to prevent overrun
of the null bitmap. On the other hand, do we have a proper reason to apply
the same restriction to foreign tables as well?

Foreign tables have their own unique internal data structures instead of
PostgreSQL's heap tables, and some foreign data can have thousands of
attributes in their structured data.
I think that MaxHeapAttributeNumber is a senseless restriction for foreign
tables. What are your opinions?
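As a back-of-the-envelope illustration of where the heap-side limit comes from (constants below reflect the heap tuple layout on a typical 64-bit platform; the exact slop between this ceiling and the round 1600 figure is a separate, deliberate choice):

```python
# Why MaxHeapAttributeNumber-sized tuples still fit in t_hoff:
# t_hoff is a single byte, so the fixed header plus the null bitmap,
# rounded up to MAXALIGN, must stay within 255 bytes.
FIXED_HEADER = 23   # fixed part of HeapTupleHeaderData before t_bits
T_HOFF_MAX = 255    # t_hoff is a uint8
MAXALIGN = 8        # typical alignment on 64-bit platforms

def heap_header_size(natts: int) -> int:
    null_bitmap = (natts + 7) // 8          # one bit per attribute
    raw = FIXED_HEADER + null_bitmap
    return (raw + MAXALIGN - 1) // MAXALIGN * MAXALIGN

print(heap_header_size(1600))               # 224, comfortably <= 255
```

A foreign table, of course, never materializes a HeapTupleHeaderData on disk, which is exactly why it is fair to ask whether this cap should apply there at all.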

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Typo in tablesync comment

2021-02-03 Thread Michael Paquier
On Thu, Feb 04, 2021 at 10:50:11AM +1100, Peter Smith wrote:
> OK. I attached an updated patch using this new text.

Thanks, applied.
--
Michael




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-03 Thread Joel Jacobson
On Tue, Feb 2, 2021, at 13:51, Mark Rofail wrote:
>Array-ELEMENT-foreign-key-v17.patch

When working on my pit tool, I found another implicit type cast problem.

First an example to show a desired error message:

CREATE TABLE a (
a_id smallint,
PRIMARY KEY (a_id)
);

CREATE TABLE b (
b_id bigint,
a_ids text[],
PRIMARY KEY (b_id)
);

ALTER TABLE b ADD FOREIGN KEY (EACH ELEMENT OF a_ids) REFERENCES a;

The below error message is good:

ERROR:  foreign key constraint "b_a_ids_fkey" cannot be implemented
DETAIL:  Key column "a_ids" has element type text which does not have a default 
btree operator class that's compatible with class "int2_ops".

But if we instead make a_ids a bigint[], we don't get any error:

DROP TABLE b;

CREATE TABLE b (
b_id bigint,
a_ids bigint[],
PRIMARY KEY (b_id)
);

ALTER TABLE b ADD FOREIGN KEY (EACH ELEMENT OF a_ids) REFERENCES a;

No error, even though bigint[] isn't compatible with smallint.

We do get an error when trying to insert into the table:

INSERT INTO a (a_id) VALUES (1);
INSERT INTO b (b_id, a_ids) VALUES (2, ARRAY[1]);

ERROR:  operator does not exist: smallint[] pg_catalog.<@ bigint[]
LINE 1: ..."."a" x WHERE ARRAY ["a_id"]::pg_catalog.anyarray OPERATOR(p...
 ^
HINT:  No operator matches the given name and argument types. You might need to 
add explicit type casts.
QUERY:  SELECT 1 WHERE (SELECT pg_catalog.count(DISTINCT y) FROM 
pg_catalog.unnest($1) y) OPERATOR(pg_catalog.=) (SELECT pg_catalog.count(*) 
FROM (SELECT 1 FROM ONLY "public"."a" x WHERE ARRAY 
["a_id"]::pg_catalog.anyarray OPERATOR(pg_catalog.<@) $1::pg_catalog.anyarray 
FOR KEY SHARE OF x) z)

I wonder if we can come up with some general way to detect these
problems already at constraint creation time,
instead of having to wait for data to trigger the error,
similar to how compile-time errors are preferred over run-time errors.
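A toy sketch of the kind of DDL-time check being wished for (the function name and logic are hypothetical, not from the patch): the generated RI query compares both sides as anyarray, which only resolves when the element types are identical, so the same comparison could be made eagerly when the constraint is created.

```python
# Hypothetical creation-time check mirroring the run-time failure above:
# "smallint[] <@ bigint[]" finds no operator because the anyarray
# comparison requires identical element types, so reject the mismatch
# when the constraint is created rather than on the first INSERT.
def check_element_fk(pk_type: str, fk_element_type: str) -> None:
    if pk_type != fk_element_type:
        raise TypeError(
            f"key column element type {fk_element_type} does not match "
            f"referenced column type {pk_type}; add an explicit cast")

check_element_fk("smallint", "smallint")    # ok: constraint accepted
try:
    check_element_fk("smallint", "bigint")  # rejected at creation time...
except TypeError:
    pass                                    # ...instead of at first INSERT
```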

/Joel

Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Michael Paquier
On Wed, Feb 03, 2021 at 11:29:37AM -0500, Tom Lane wrote:
> Then let's kill it dead, server and libpq both.

Yeah.
--
Michael




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Michael Paquier
On Thu, Feb 04, 2021 at 11:30:09AM +0530, Bharath Rupireddy wrote:
> But sometimes we may also have to debug postmaster code, on different
> platforms maybe. I don't know how the postmaster PID would be useful to
> users in customer environments, and I can't think of other usages of
> pg_postgres_pid().

I had the same question as Tomas in mind when reading this thread, and
the use case you are mentioning sounds limited to me.  Please note
that you can already get this information by using pg_read_file() on
postmaster.pid so I see no need for an extra function.
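For reference, the workaround Michael mentions relies on the first line of postmaster.pid being the postmaster PID; from SQL, something like SELECT (string_to_array(pg_read_file('postmaster.pid'), E'\n'))[1]; should work for a superuser. A shell sketch against a mocked data directory (the directory and PID below are made up for illustration; in a real cluster use $PGDATA):

```shell
# Mock a data directory; in a real cluster this would be $PGDATA.
PGDATA_MOCK=$(mktemp -d)
# The first line of postmaster.pid is the postmaster PID (12345 is made up).
printf '12345\n%s\n' "$PGDATA_MOCK" > "$PGDATA_MOCK/postmaster.pid"
head -n 1 "$PGDATA_MOCK/postmaster.pid"   # -> 12345
```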
--
Michael




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-03 Thread Michael Paquier
On Wed, Feb 03, 2021 at 07:54:42PM +0900, Michael Paquier wrote:
> Not sure I like that.  Here is a proposal:
> "it is recommended to separately use ALTER TABLE ONLY on them so as
> any new partitions attached inherit the new tablespace value."

So, I have done more work on this stuff today, and applied that as of
c5b2860.  While reviewing my changes, I have noticed that I had
managed to break ALTER TABLE SET TABLESPACE, which would have failed
when cascading to a toast relation, the extra check placed previously
in CheckRelationTableSpaceMove() being incorrect.  The most surprising
part was that we had zero in-core tests to catch this mistake, so I
have added an extra test to cover this scenario while at it.

A second thing I have come back to is allow_system_table_mods for
toast relations, and decided to just forbid TABLESPACE if attempting
to use it directly on a system table even if allow_system_table_mods
is true.  This was leading to inconsistent behaviors and weirdness in
the concurrent case because all the indexes are processed in series
after building a list.  As we want to ignore the move of toast indexes
when moving the indexes of the parent table, this was leading to extra
conditions that are not really worth supporting after thinking about
it.  One other issue was the lack of consistency when using pg_global,
which was a no-op in the concurrent case but failed in the
non-concurrent case.  I have put in place more regression tests for
all that.

Regarding the VACUUM and CLUSTER cases, I am not completely sure that
going through these for a tablespace change is the best move we can
make, as ALTER TABLE is able to mix multiple operations and all of
them already require an AEL to work.  REINDEX was different thanks to
the case of CONCURRENTLY.  Anyway, as a lot of work has been done here
already, I would recommend creating new threads for those two
topics.  I am also closing this patch in the CF app.
--
Michael




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Bharath Rupireddy
On Thu, Feb 4, 2021 at 3:27 AM Tom Lane  wrote:
>
> Tomas Vondra  writes:
> > On 2/3/21 4:08 PM, Tom Lane wrote:
> >> I'm disinclined to think that this is a good idea from a security
> >> perspective.  Maybe if it's superuser-only it'd be ok (since a
> >> superuser would have other routes to discovering the value anyway).
>
> > Is the postmaster PID really sensitive? Users with OS access can just
> > list the processes, and for users without OS access / privileges it's
> > mostly useless, no?
>
> We disallow ordinary users from finding out the data directory location,
> even though that should be equally useless to unprivileged users.  The
> postmaster PID seems like the same sort of information.  It does not
> seem like a non-administrator could have any but nefarious use for that
> value.  (Admittedly, this argument is somewhat weakened by exposing
> child processes' PIDs ... but you can't take down the whole installation
> by zapping a child process.)
>
> I'm basically in the same place you are in your other response: the
> question to ask is not "why not allow this?", but "why SHOULD we allow
> this?"

If we still think that the new function pg_postgres_pid() is useful in
some ways to users or developers, then we can have it as a
superuser-only function and error out for non-superusers.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Bharath Rupireddy
On Thu, Feb 4, 2021 at 2:39 AM Tomas Vondra
 wrote:
>
> On 2/3/21 7:12 AM, Bharath Rupireddy wrote:
> > Hi,
> >
> > Can we have a new function, say pg_postgres_pid(), to return
> > postmaster PID similar to pg_backend_pid()? At times, it will be
>
> Curious question - why do you actually need PID of the postmaster? For
> debugging, I'd say it's not quite necessary - you can just attach a
> debugger to the backend and print the PostmasterPid directly. Or am I
> missing something?

But sometimes we may also have to debug postmaster code, on different
platforms maybe. I don't know how the postmaster PID would be useful to
users in customer environments, and I can't think of other usages of
pg_postgres_pid().

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: a curious case of force_parallel_mode = on with jit'ing

2021-02-03 Thread Amit Langote
On Thu, Feb 4, 2021 at 1:41 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Thu, Feb 4, 2021 at 1:08 AM Tom Lane  wrote:
> >> Works for me on HEAD (using RHEL8.3, gcc 8.3.1, LLVM 10.0.1).
>
> > Thanks for checking.  Must be my LLVM setup I guess:
>
> > $ llvm-config --version
> > 7.0.1
> > $ cat /etc/redhat-release
> > CentOS Linux release 7.7.1908 (Core)
> > $ gcc --version
> > gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2)
>
> Hmmm ... seems like an odd combination to have a newer gcc and an
> older LLVM than what RHEL8 is shipping.  Is this really the current
> recommendation on CentOS 7?

Not an official combination.  At some point last year I decided to
install a more modern gcc than what CentOS 7 officially provides and
ended up getting it through a Software Collections (scl) package
called devtoolset-9.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: pg_replication_origin_drop API potential race condition

2021-02-03 Thread Amit Kapila
On Thu, Feb 4, 2021 at 9:57 AM Peter Smith  wrote:
>
> On Wed, Feb 3, 2021 at 11:17 PM Amit Kapila  wrote:
>
> >
> > How about if we call replorigin_by_name() inside replorigin_drop after
> > acquiring the lock? Wouldn't that close this race condition? We are
> > doing something similar for pg_replication_origin_advance().
> >
>
> Yes, that seems ok.
>
> I wonder if it is better to isolate that locked portion
> (replorigin_by_name + replorigin_drop) so that in addition to being
> called from pg_replication_origin_drop, we can call it internally from
> PG code to safely drop the origins.
>

Yeah, I think that would be really good.
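A toy model of the race and the suggested fix (pure illustration; the real code deals with the replication-origin catalog and its locks, not a Python dict): doing the name-to-ID lookup and the drop under one lock means a concurrent caller can no longer drop or recreate the origin between the two steps.

```python
import threading

# Toy stand-in for the replication-origin catalog: name -> origin id.
origins = {"my_origin": 10}
origins_lock = threading.Lock()

def drop_origin_by_name(name):
    # Lookup and drop form one critical section, as suggested upthread;
    # without the lock, another session could drop the origin between
    # the replorigin_by_name() lookup and the replorigin_drop() call.
    with origins_lock:
        return origins.pop(name, None)

print(drop_origin_by_name("my_origin"))   # 10: first caller drops it
print(drop_origin_by_name("my_origin"))   # None: second caller sees it gone
```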

-- 
With Regards,
Amit Kapila.




Re: Is Recovery actually paused?

2021-02-03 Thread Dilip Kumar
On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar  wrote:
>
> On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar  
> > wrote in
> > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
> > > >
> > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > > Dilip Kumar  wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > +1 to just show the recovery pause state in the output 
> > > > > > > > > > > > of
> > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, 
> > > > > > > > > > > > when "is" exists
> > > > > > > > > > > > in a function, I expect a boolean output. Others may 
> > > > > > > > > > > > have better
> > > > > > > > > > > > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we should leave the existing function 
> > > > > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > > > > alone and add a new one with the name you suggest that 
> > > > > > > > > > > returns text.
> > > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > > >
> > > > > > > > This means pg_is_wal_replay_paused is left without any change 
> > > > > > > > and this
> > > > > > > > returns whether pause is requested or not? If so, it seems good 
> > > > > > > > to modify
> > > > > > > > the documentation of this function in order to note that it
> > > > > > > > might not return the actual pause state.
> > > > > > >
> > > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > > requested.  I am changing that in my new patch.
> > > > > >
> > > > > > I have modified the patch, changes
> > > > > >
> > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > > the pause request state
> > > > > > - Now, we are not waiting for the recovery to actually get paused, so
> > > > > > I think it doesn't make sense to put a lot of checks for the pause
> > > > > > request, so I have removed that check from recoveryApplyDelay.  But I
> > > > > > think it is better we still keep that check in
> > > > > > WaitForWalToBecomeAvailable because it can wait forever before the
> > > > > > next WAL becomes available.
> > > > >
> > > > > I think basically the check in WaitForWalToBecomeAvailable is 
> > > > > independent
> > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting 
> > > > > the
> > > > > actual pause state.  This function could just return 'pause requested'
> > > > > if a pause is requested during waiting for WAL.
> > > > >
> > > > > However, I agree the change to allow recovery to transit the state to
> > > > > 'paused' during WAL waiting because 'paused' has more useful 
> > > > > information
> > > > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > > > clearly that no more WAL are applied until recovery is resumed.  On 
> > > > > the
> > > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > > the next WAL will be applied or not from this information.
> > > > >
> > > > > For the same reason, I think it is also useful to call 
> > > > > recoveryPausesHere
> > > > > in recoveryApplyDelay.
> > > >
> > > > IMHO WaitForWalToBecomeAvailable can wait until the next WAL becomes
> > > > available, which cannot be controlled by the user, so it is good to put
> > > > a check for the recovery pause there.  However, recoveryApplyDelay waits
> > > > for the apply delay, which is configured by the user and is a value the
> > > > user can predict.  I don't have much objection to putting that check in
> > > > recoveryApplyDelay as well, but I feel it is not necessary.  Any other
> > > > thoughts on this?
> > > >
> > > > > In addition, in RecoveryRequiresIntParameter, recovery should get 
> > > > > paused
> > > > > if a parameter value has a problem.  However, 
> > > > > pg_get_wal_replay_pause_state
> > > > > will return 'pause requested' in this case. So, I think, we should 
> > > > > pass
> > > > > RECOVERY_P

Re: a curious case of force_parallel_mode = on with jit'ing

2021-02-03 Thread Tom Lane
Amit Langote  writes:
> On Thu, Feb 4, 2021 at 1:08 AM Tom Lane  wrote:
>> Works for me on HEAD (using RHEL8.3, gcc 8.3.1, LLVM 10.0.1).

> Thanks for checking.  Must be my LLVM setup I guess:

> $ llvm-config --version
> 7.0.1
> $ cat /etc/redhat-release
> CentOS Linux release 7.7.1908 (Core)
> $ gcc --version
> gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2)

Hmmm ... seems like an odd combination to have a newer gcc and an
older LLVM than what RHEL8 is shipping.  Is this really the current
recommendation on CentOS 7?

regards, tom lane




Re: pg_replication_origin_drop API potential race condition

2021-02-03 Thread Peter Smith
On Wed, Feb 3, 2021 at 11:17 PM Amit Kapila  wrote:

>
> How about if we call replorigin_by_name() inside replorigin_drop after
> acquiring the lock? Wouldn't that close this race condition? We are
> doing something similar for pg_replication_origin_advance().
>

Yes, that seems ok.

I wonder if it is better to isolate that locked portion
(replorigin_by_name + replorigin_drop) so that in addition to being
called from pg_replication_origin_drop, we can call it internally from
PG code to safely drop the origins.


Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-02-03 Thread Ajin Cherian
On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila  wrote:

> Thanks for the report. The problem here was that the error occurred
> when we were trying to copy the large data. Now, before fetching the
> entire data we issued a rollback that led to this problem. I think the
> alternative here could be to first fetch the entire data when the
> error occurred and then issue the following commands. Instead, I have
> modified the patch to perform 'drop_replication_slot' in the beginning
> if the relstate is datasync.  Do let me know if you can think of a
> better way to fix this?

I have verified that the problem is not seen after this patch. I also
agree with the approach taken for the fix.

regards,
Ajin Cherian
Fujitsu Australia




Re: New IndexAM API controlling index vacuum strategies

2021-02-03 Thread Masahiko Sawada
On Tue, Feb 2, 2021 at 12:17 PM Peter Geoghegan  wrote:
>
> On Fri, Jan 29, 2021 at 5:26 PM Peter Geoghegan  wrote:
> > It'll be essential to have good instrumentation as we do more
> > benchmarking. We're probably going to have to make subjective
> > assessments of benchmark results, based on multiple factors. That will
> > probably be the only practical way to assess how much better (or
> > worse) the patch is compared to master. This patch is more about
> > efficiency and predictability than performance per se. Which is good,
> > because that's where most of the real world problems actually are.
>
> I've been thinking about how to get this patch committed for
> PostgreSQL 14. This will probably require cutting scope, so that the
> initial commit is not so ambitious. I think that "incremental VACUUM"
> could easily take up a lot of my time for Postgres 15, and maybe even
> Postgres 16.
>
> I'm starting to think that the right short term goal should not
> directly involve bottom-up index deletion. We should instead return to
> the idea of "unifying" the vacuum_cleanup_index_scale_factor feature
> with the INDEX_CLEANUP feature, which is kind of where this whole idea
> started out at. This short term goal is much more than mere
> refactoring. It is still a whole new user-visible feature. The patch
> would teach VACUUM to skip doing any real index work within both
> ambulkdelete() and amvacuumcleanup() in many important cases.
>

I agree to cut the scope. I've also been thinking about the impact of
this patch on users.

I also think we still have a lot of things to consider. For example,
we need to consider and evaluate how incremental vacuum works for
larger tuples or a larger fillfactor, etc., and need to discuss more
whether leaving LP_DEAD items in the space left by fillfactor is a
good idea or not. Also, we need to discuss the changes this patch
makes to nbtree. Since bottom-up index deletion is new code for PG14,
if there is a problem in it, this feature could make things worse
since it builds on it. Perhaps we would need some safeguard, and that
also needs time. From that point of view, I think it's a good idea to
introduce these features in different major versions. Given the
current situation, I agree that 2 months is too short to do all of
this.

> Here is a more detailed explanation:
>
> Today we can skip all significant work in ambulkdelete() and
> amvacuumcleanup() when there are zero dead tuples in the table. But
> why is the threshold *precisely* zero? If we could treat cases that
> have "practically zero" dead tuples in the same way (or almost the
> same way) as cases with *exactly* zero dead tuple, that's still a big
> improvement. And it still sets an important precedent that is crucial
> for the wider "incremental VACUUM" project: the criteria for
> triggering index vacuuming becomes truly "fuzzy" for the first time.
> It is "fuzzy" in the sense that index vacuuming might not happen
> during VACUUM at all now, even when the user didn't explicitly use
> VACUUUM's INDEX_CLEANUP option, and even when more than *precisely*
> zero dead index tuples are involved (though not *much* more than zero,
> can't be too aggressive). That really is a big change.
>
> A recap on vacuum_cleanup_index_scale_factor, just to avoid confusion:
>
> The reader should note that this is very different to Masahiko's
> vacuum_cleanup_index_scale_factor project, which skips *cleanup* in
> VACUUM (not bulk delete), a question which only comes up when there
> are definitely zero dead index tuples. The unifying work I'm talking
> about now implies that we completely avoid scanning indexes during
> vacuum, even when they are known to have at least a few dead index
> tuples, and even when VACUUM's INDEX_CLEANUP emergency option is not
> in use. Which, as I just said, is a big change.

If vacuum skips both ambulkdelete and amvacuumcleanup in that case,
I'm concerned that this could increase the number of users affected by
the known issue of leaking deleted pages. Currently, the only users
affected by that are those who use INDEX_CLEANUP off. But if we enable
this feature by default, all users are potentially affected by that
issue.
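The "practically zero" trigger described upthread could be sketched as follows (the 2% figure is purely illustrative, not taken from the patch):

```python
def skip_index_vacuum(dead_item_pages: int, rel_pages: int,
                      bypass_frac: float = 0.02) -> bool:
    # Classic behavior skips index vacuuming only when there are exactly
    # zero dead items; the "fuzzy" variant also skips when the number of
    # pages with dead items is practically zero relative to table size.
    return dead_item_pages < max(1, int(rel_pages * bypass_frac))

print(skip_index_vacuum(0, 10_000))     # True: exactly zero, as before
print(skip_index_vacuum(150, 10_000))   # True: "practically zero"
print(skip_index_vacuum(500, 10_000))   # False: enough to justify the scan
```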

>
> Thoughts on triggering criteria for new "unified" design, ~99.9%
> append-only tables:
>
> Actually, in *one* sense the difference between "precisely zero" and
> "practically zero" here *is* small. But it's still probably going to
> result in skipping reading indexes during VACUUM in many important
> cases. Like when you must VACUUM a table that is ~99.9% append-only.
> In the real world things are rarely in exact discrete categories, even
> when we imagine that they are. It's too easy to be wrong about one
> tiny detail -- like one tiny UPDATE from 4 weeks ago, perhaps. Having
> a tiny amount of "forgiveness" here is actually a *huge* improvement
> on having precisely zero forgiveness. Small and big.
>
> This should help cases that get big surprising spikes due to
> anti-w

Re: a curious case of force_parallel_mode = on with jit'ing

2021-02-03 Thread Amit Langote
On Thu, Feb 4, 2021 at 1:08 AM Tom Lane  wrote:
>
> Amit Langote  writes:
> > Has anybody seen this:
>
> Works for me on HEAD (using RHEL8.3, gcc 8.3.1, LLVM 10.0.1).

Thanks for checking.  Must be my LLVM setup I guess:

$ llvm-config --version
7.0.1
$ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
$ gcc --version
gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2)

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Correct comment in StartupXLOG().

2021-02-03 Thread Amul Sul
On Thu, Feb 4, 2021 at 6:18 AM Kyotaro Horiguchi
 wrote:
>
> At Wed, 3 Feb 2021 16:36:13 +0530, Amul Sul  wrote in
> > On Wed, Feb 3, 2021 at 2:48 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Feb 3, 2021 at 2:28 PM Amul Sul  wrote:
> > > >
> > > > Hi,
> > > >
> > > > SharedRecoveryState member of XLogCtl is no longer a boolean flag; it
> > > > was changed to an enum in 4e87c4836ab9, but the comment referring to it
> > > > still describes it as a boolean flag, which is confusing and incorrect.
> > >
> > > +1 for the comment change
>
> Actually the "flag" has been changed to an integer (enum), so it
> needs a change. However, the current proposal:
>
>  * Now allow backends to write WAL and update the control file status in
> -* consequence.  The boolean flag allowing backends to write WAL is
> +* consequence.  The recovery state allowing backends to write WAL is
>  * updated while holding ControlFileLock to prevent other backends to look
>
> Looks somewhat strange. The old boolean had a single task to allow
> backends to write WAL but the current state has multiple states that
> control recovery progress. So I think it needs a further change.
>
> ===
>  Now allow backends to write WAL and update the control file status in
>  consequence.  The recovery state is updated to allow backends to write
>  WAL, while holding ControlFileLock to prevent other backends to look
>  at an inconsistent state of the control file in shared memory.
> ===
>

This looks more accurate; I added the same in the attached version. Thanks for
the correction.

Regards,
Amul
From e72137d632afd4c916da20d56491f782d62b605e Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 3 Feb 2021 23:03:20 -0500
Subject: [PATCH] Correct code comment in StartupXLOG

---
 src/backend/access/transam/xlog.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd473e2b..eee664597ea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7998,17 +7998,16 @@ StartupXLOG(void)
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
-	 * consequence.  The boolean flag allowing backends to write WAL is
-	 * updated while holding ControlFileLock to prevent other backends to look
+	 * consequence.  The recovery state is updated to allow backends to write
+	 * WAL while holding ControlFileLock to prevent other backends to look
 	 * at an inconsistent state of the control file in shared memory.  There
 	 * is still a small window during which backends can write WAL and the
 	 * control file is still referring to a system not in DB_IN_PRODUCTION
 	 * state while looking at the on-disk control file.
 	 *
-	 * Also, although the boolean flag to allow WAL is probably atomic in
-	 * itself, we use the info_lck here to ensure that there are no race
-	 * conditions concerning visibility of other recent updates to shared
-	 * memory.
+	 * Also, we use the info_lck to update the recovery state to ensure that
+	 * there are no race conditions concerning visibility of other recent
+	 * updates to shared memory.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 	ControlFile->state = DB_IN_PRODUCTION;
-- 
2.18.0



Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-02-03 Thread Bharath Rupireddy
On Wed, Feb 3, 2021 at 4:22 PM Fujii Masao  wrote:
> Maybe my explanation in the previous email was unclear. What I think is: if
> the server-level option is explicitly specified, its setting is used whatever
> the GUC is. On the other hand, if the server-level option is NOT specified,
> the GUC setting is used. For example, if we define the server as follows, the
> GUC setting is used because the server-level option is NOT specified.
>
>  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres;
>
> If we define the server as follows, the server-level setting is used.
>
>  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres OPTIONS 
> (keep_connections 'on');

Attaching the v20 patch set. Now, the server-level option, if provided,
overrides the GUC. The GUC is used only if the server-level option is
not provided. Also, both the server-level option and the GUC are named
the same: "keep_connections".

Please have a look.
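The precedence rule described above boils down to the following (toy model, not the patch's C code):

```python
def keep_connection(server_option, guc_setting):
    # Server-level "keep_connections", when explicitly set, always wins;
    # the postgres_fdw.keep_connections GUC is only a fallback.
    return server_option if server_option is not None else guc_setting

print(keep_connection(None, True))    # True: no server option, GUC applies
print(keep_connection(False, True))   # False: explicit server option wins
```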

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v20-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch
Description: Binary data


v20-0003-postgres_fdw-server-level-option-keep_connection.patch
Description: Binary data


typo in "Determine XID horizons" comment in procarray.c

2021-02-03 Thread Jaime Casanova
Hi,

The second paragraph of this comment (procarray.c:1604) says:
* See the definition of ComputedXidHorizonsResult for the various computed

It should say ComputeXidHorizonsResult (it has an extra "d" in "Computed").

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Multiple full page writes in a single checkpoint?

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 08:07:16PM -0500, Bruce Momjian wrote:
> > > I can try to add a hint-bit-page-write page counter, but that might
> > > overflow, and then we will need a way to change the LSN anyway.
> > 
> > That's just a question of width...
> 
> Yeah, the hint bit counter is just delaying the inevitable, plus it
> changes the page format, which I am trying to avoid.  Also, I need this
> dummy record only if the page is marked clean, meaning a write
> to the file system already happened in the current checkpoint --- should
> not be too bad.

Here is a proof-of-concept patch to do this.  Thanks for your help.
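For context on why a fresh LSN per hint-bit page write matters here (the patch's comment notes the LSN is used in the encryption nonce): with any stream-style cipher, reusing a nonce for two versions of a page leaks the XOR of the two plaintexts. A self-contained sketch with a toy hash-based keystream (not PostgreSQL's actual encryption code):

```python
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy counter-mode keystream; stands in for any stream cipher.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, page: bytes) -> bytes:
    return xor(page, keystream(key, nonce, len(page)))

# The same page encrypted twice under the SAME LSN-derived nonce,
# differing only in a hint bit:
v1 = b"page image, hint bit = 0"
v2 = b"page image, hint bit = 1"
c1 = encrypt(b"key", b"lsn-0001", v1)
c2 = encrypt(b"key", b"lsn-0001", v2)
# XORing the two ciphertexts recovers the XOR of the plaintexts,
# pinpointing exactly which bits changed -- hence a new LSN per write.
assert xor(c1, c2) == xor(v1, v2)
```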

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

>From 11cb0ca7405a74105602fd63766e88c1fea8ea30 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Wed, 3 Feb 2021 22:25:39 -0500
Subject: [PATCH] hint squash commit

---
 src/backend/access/rmgrdesc/xlogdesc.c   |  6 +-
 src/backend/access/transam/xlog.c|  4 
 src/backend/access/transam/xloginsert.c  | 26 
 src/backend/replication/logical/decode.c |  1 +
 src/backend/storage/buffer/bufmgr.c  | 18 
 src/include/access/xloginsert.h  |  2 ++
 src/include/catalog/pg_control.h |  1 +
 7 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..16a967bb71 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -80,7 +80,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 
 		appendStringInfoString(buf, xlrec->rp_name);
 	}
-	else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
+	else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
+			 info == XLOG_HINT_LSN)
 	{
 		/* no further information to print */
 	}
@@ -185,6 +186,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_HINT_LSN:
+			id = "HINT_LSN";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd473e2..f67316ee07 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10260,6 +10260,10 @@ xlog_redo(XLogReaderState *record)
 			UnlockReleaseBuffer(buffer);
 		}
 	}
+	else if (info == XLOG_HINT_LSN)
+	{
+		/* nothing to do here */
+	}
 	else if (info == XLOG_BACKUP_END)
 	{
 		XLogRecPtr	startpoint;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 7052dc245e..eebca0fc16 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -980,6 +980,32 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 	return recptr;
 }
 
+/*
+ * On the first hint bit change during a checkpoint, XLogSaveBufferForHint()
+ * writes a full page image to the WAL and returns a new LSN which can be
+ * assigned to the page.  Cluster file encryption needs a new LSN for
+ * every hint bit page write because the LSN is used in the nonce, and
+ * the nonce must be unique.  Therefore, this routine just advances the LSN
+ * so the page can be assigned a new LSN.
+ *
+ * Callable while holding just share lock on the buffer content.
+ *
+ * The LSN of the inserted wal record is returned if we had to write,
+ * InvalidXLogRecPtr otherwise.
+ *
+ * It is possible that multiple concurrent backends could attempt to write WAL
+ * records. In that case, multiple copies of the same block would be recorded
+ * in separate WAL records by different backends, though that is still OK from
+ * a correctness perspective.
+ */
+XLogRecPtr
+XLogLSNForHint(void)
+{
+	XLogBeginInsert();
+
+	return XLogInsert(RM_XLOG_ID, XLOG_HINT_LSN);
+}
+
 /*
  * Write a WAL record containing a full image of a page. Caller is responsible
  * for writing the page to disk after calling this routine.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..2ada89c6b1 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_HINT_LSN:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..fd59f9fc1d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3865,6 +3865,24 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 		{
 			dirtied = true;		/* Means "will be dirtied by this action" */
 
+			/*
+			 * If the buffer is clean, and lsn is valid, it must be the
+			 * first hint bit change of the checkpoint (the old p

Re: Recording foreign key relationships for the system catalogs

2021-02-03 Thread Joel Jacobson
On Wed, Feb 3, 2021, at 21:41, Joel Jacobson wrote:
>Otherwise I think it would be more natural to change both is_array and is_opt
>to boolean[] with the same cardinality as fkcols and pkcols,
>to allow unnest()ing of them as well.

Another option would perhaps be to add a new
system view in src/backend/catalog/system_views.sql

I see there are other cases with a slightly more complex view
using a function with a similar name, such as
the pg_stat_activity using pg_stat_get_activity().

Similar to this, maybe we could add a pg_catalog_foreign_keys view
using the output from pg_get_catalog_foreign_keys():

Example usage:

SELECT * FROM pg_catalog_foreign_keys
WHERE fktable = 'pg_constraint'::regclass
AND pktable = 'pg_attribute'::regclass;

 fkid |    fktable    |   fkcol   |   pktable    |  pkcol   | is_array | is_opt | ordinal_position
------+---------------+-----------+--------------+----------+----------+--------+------------------
   48 | pg_constraint | conkey    | pg_attribute | attnum   | t        | t      |                1
   48 | pg_constraint | conrelid  | pg_attribute | attrelid | f        | f      |                2
   49 | pg_constraint | confkey   | pg_attribute | attnum   | t        | f      |                1
   49 | pg_constraint | confrelid | pg_attribute | attrelid | f        | f      |                2
(4 rows)

The point of this would be to avoid unnecessary increase of data model 
complexity,
which I agree is not needed, since we only need single booleans as of today,
but to provide a more information_schema-like system view,
i.e. with columns on separate rows, with ordinal_position.

Since we don't have any "constraint_name" for these,
we need to enumerate the fks first, to let ordinal_position
be the position within each such fkid.

Here is my proposal on how to implement:

CREATE VIEW pg_catalog_foreign_keys AS
WITH
enumerate_fks AS (
SELECT
*,
ROW_NUMBER() OVER () AS fkid
FROM pg_catalog.pg_get_catalog_foreign_keys()
),
unnest_cols AS (
SELECT
C.fkid,
C.fktable,
unnest(C.fkcols) AS fkcol,
C.pktable,
unnest(C.pkcols) AS pkcol,
unnest(
CASE cardinality(fkcols)
WHEN 1 THEN ARRAY[C.is_array]
WHEN 2 THEN ARRAY[FALSE,C.is_array]
END
) AS is_array,
unnest(
CASE cardinality(fkcols)
WHEN 1 THEN ARRAY[C.is_opt]
WHEN 2 THEN ARRAY[FALSE,C.is_opt]
END
) AS is_opt
FROM enumerate_fks AS C
)
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY U.fkid
ORDER BY U.fkcol, U.pkcol
) AS ordinal_position
FROM unnest_cols AS U;

I think both the pg_get_catalog_foreign_keys() function
and this view are useful in different ways,
so it's good to provide both.

Only providing pg_get_catalog_foreign_keys() will
arguably mean some users of the function will need to implement
something like the same as above on their own, if they need the is_array and 
is_opt
value for a specific fkcol.

/Joel

Re: Is it useful to record whether plans are generic or custom?

2021-02-03 Thread Kyotaro Horiguchi
At Thu, 04 Feb 2021 10:16:47 +0900, torikoshia  
wrote in 
> Chengxi Sun, Yamada-san, Horiguchi-san,
> 
> Thanks for all your comments.
> Adding only the number of generic plan execution seems acceptable.
> 
> On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi
>  wrote:
> > Note that ActivePortal is the closest nested portal. So it gives the
> > wrong result for nested portals.
> 
> I may be wrong, but I thought it was ok since the closest nested
> portal is the portal to be executed.

After executing the inner-most portal, is_plan_type_generic holds the
value for the inner-most portal and it won't be changed ever after. The
ExecutorEnd of all the upper portals then sees the value left behind in
is_plan_type_generic by the inner-most portal, even though the portals
at every nest level are independent.

> ActivePortal is used in ExecutorStart hook in the patch.
> And as far as I read PortalStart(), ActivePortal is changed to the
> portal to be executed before ExecutorStart().
> 
> If possible, could you tell me the specific case which causes wrong
> results?

Running a plpgsql function that does PREPARE in a query that does
PREPARE?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Bug in COPY FROM backslash escaping multi-byte chars

2021-02-03 Thread Kyotaro Horiguchi
At Wed, 3 Feb 2021 15:46:30 +0200, Heikki Linnakangas  wrote 
in 
> On 03/02/2021 15:38, John Naylor wrote:
> > On Wed, Feb 3, 2021 at 8:08 AM Heikki Linnakangas  > > wrote:
> >  >
> >  > Hi,
> >  >
> >  > While playing with COPY FROM refactorings in another thread, I noticed
> >  > corner case where I think backslash escaping doesn't work correctly.
> >  > Consider the following input:
> >  >
> >  > \么.foo
> > I've seen multibyte delimiters in the wild, so it's not as outlandish
> > as it seems.
> 
> We don't actually support multi-byte characters as delimiters or quote
> or escape characters:
> 
> postgres=# copy copytest from 'foo' with (delimiter '么');
> ERROR:  COPY delimiter must be a single one-byte character
> 
> > The fix is simple enough, so +1.
> 
> Thanks, I'll commit and backpatch shortly.

I'm not sure the assumption in the second hunk always holds, but
that's fine at least with Shift-JIS and -2004 since they are two-byte
encoding.
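For illustration, the essence of the fix is that the escape scanner must
advance past the whole multi-byte character following a backslash, not just
one byte. A minimal stand-alone sketch (not the actual copyfromparse.c code;
utf8_mblen() is a hypothetical stand-in for pg_encoding_mblen()):

```c
#include <assert.h>

/* Hypothetical stand-in for pg_encoding_mblen(), covering UTF-8 only. */
static int
utf8_mblen(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;				/* ASCII */
	if ((*s & 0xE0) == 0xC0)
		return 2;				/* 2-byte sequence */
	if ((*s & 0xF0) == 0xE0)
		return 3;				/* 3-byte sequence */
	return 4;					/* 4-byte sequence */
}

/*
 * Number of input bytes consumed by a backslash escape "\<char>".
 * The key point: skip the full multi-byte length of the escaped
 * character, so a following '.' is not misparsed as escape data.
 */
static int
escaped_char_len(const unsigned char *s)
{
	assert(*s == '\\');
	return 1 + utf8_mblen(s + 1);
}
```

For the reported input "\么.foo" (backslash followed by the 3-byte UTF-8
character U+4E48), the scanner consumes 4 bytes, so the '.' is treated as
ordinary data.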

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


RE: libpq debug log

2021-02-03 Thread tsunakawa.ta...@fujitsu.com
From: 'Alvaro Herrera' 
> > (41)
> > +void
> > +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> > +{
> 
> I'm not really sure about making this a separate API call. We could just
> make it PQtrace() and increment the libpq so version.  I don't think
> it's a big deal, frankly.

If we change the function signature, we have to bump the so major version and 
thus soname.  The libpq's current so major version is 5, which hasn't been 
changed since 2006.  I'm hesitant to change it for this feature.  If you think 
we can bump the version to 6, I think we can go.


https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html
--
try to make sure that your libraries are either backwards-compatible or that 
you've incremented the version number in the soname every time you make an 
incompatible change.

When a new version of a library is binary-incompatible with the old one the 
soname needs to change. In C, there are four basic reasons that a library would 
cease to be binary compatible:

1. The behavior of a function changes so that it no longer meets its original 
specification,

2. Exported data items change (exception: adding optional items to the ends of 
structures is okay, as long as those structures are only allocated within the 
library).

3. An exported function is removed.

4. The interface of an exported function changes.
--



Regards
Takayuki Tsunakawa




possibly outdated pg_file_read() errhint

2021-02-03 Thread Mark Dilger
Hackers,

The following errhint in pg_read_file() makes little sense:

 errhint("Consider using %s, which is part of core, instead.",
 "pg_file_read()")));

Grep'ing through master, there is almost nothing named pg_file_read, and what 
does exist is dropped when upgrading to adminpack 2.0:

Perhaps this errhint made sense at some point in the past?  It looks like core 
only uses this C-function named "pg_read_file" by the SQL function named 
"pg_read_file_old", but adminpack-1.0 also used it for a SQL function named 
pg_file_read, which gets dropped in the adminpack--1.1--2.0.sql upgrade file.  
If you haven't upgraded adminpack, it makes little sense to call adminpack's 
pg_file_read() function and get a hint telling you to instead use 
pg_file_read().  But calling pg_read_file_old() and being told to use 
pg_file_read() instead also doesn't make sense, because it doesn't exist.

I was going to submit a patch for this, but the more I look at it the less I 
understand what is intended by this code.  Thoughts?


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: WIP: BRIN multi-range indexes

2021-02-03 Thread Zhihong Yu
Hi,
bq. Not sure why that would be an issue

Moving the (start > end) check is up to your discretion.

But the midpoint computation should follow text book :-)

Cheers

On Wed, Feb 3, 2021 at 4:59 PM Tomas Vondra 
wrote:

>
>
> On 2/4/21 1:49 AM, Zhihong Yu wrote:
> > Hi,
> > For 0007-Remove-the-special-batch-mode-use-a-larger--20210203.patch :
> >
> > +   /* same as preceding value, so store it */
> > +   if (compare_values(&range->values[start + i - 1],
> > +  &range->values[start + i],
> > +  (void *) &cxt) == 0)
> > +   continue;
> > +
> > +   range->values[start + n] = range->values[start + i];
> >
> > It seems the comment doesn't match the code: the value is stored when
> > subsequent value is different from the previous.
> >
>
> Yeah, you're right the comment is wrong - the code is doing exactly the
> opposite. I'll need to go through this more carefully.
>
> > For has_matching_range():
> > +   int midpoint = (start + end) / 2;
> >
> > I think the standard notation for midpoint is start + (end-start)/2.
> >
> > +   /* this means we ran out of ranges in the last step */
> > +   if (start > end)
> > +   return false;
> >
> > It seems the above should be ahead of computation of midpoint.
> >
>
> Not sure why that would be an issue, as we're not using the value and
> the values are just plain integers (so no overflows ...).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


RE: libpq debug log

2021-02-03 Thread tsunakawa.ta...@fujitsu.com
From: Alvaro Herrera 
> I'm pretty sure I named the flag PQTRACE_SUPPRESS_TIMESTAMP (and I
> prefer SUPPRESS to NOOUTPUT), because the idea is that the timestamp is
> printed by default.  I think that's the sensible decision: applications
> prefer to have timestamps, even if there's a tiny bit of overhead.  We
> don't want to force them to pass a flag for that.  We only want the
> no-timestamp behavior in order to be able to use it for libpq internal
> testing.

Agreed.  It makes sense to print timestamps by default because this feature 
will be used to diagnose slowness outside the database server.  (I 
misunderstood the motivation to introduce the flag).




Regards
Takayuki Tsunakawa




Re: Is it useful to record whether plans are generic or custom?

2021-02-03 Thread torikoshia

Chengxi Sun, Yamada-san, Horiguchi-san,

Thanks for all your comments.
Adding only the number of generic plan execution seems acceptable.

On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi 
 wrote:

Note that ActivePortal is the closest nested portal. So it gives the
wrong result for nested portals.


I may be wrong, but I thought it was ok since the closest nested portal 
is the portal to be executed.


ActivePortal is used in ExecutorStart hook in the patch.
And as far as I read PortalStart(), ActivePortal is changed to the 
portal to be executed before ExecutorStart().


If possible, could you tell me the specific case which causes wrong 
results?


Regards,

--
Atsushi Torikoshi




Re: Multiple full page writes in a single checkpoint?

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 05:00:19PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2021-02-03 19:21:25 -0500, Bruce Momjian wrote:
> > On Wed, Feb  3, 2021 at 03:29:13PM -0800, Andres Freund wrote:
> > > Changing this is *completely* infeasible. In a lot of workloads it'd
> > > cause a *massive* explosion of WAL volume. Like quadratically. You'll
> > > need to find another way to generate a nonce.
> >
> > Do we often do multiple writes to the file system of the same page
> > during a single checkpoint, particularly only-hint-bit-modified pages?
> > I didn't think so.
> 
> It can easily happen. Consider ringbuffer using scans (like vacuum,
> seqscan) - they'll force the buffer out to disk soon after it's been
> dirtied. And often will read the same page again a short bit later. Or
> just any workload that's a bit bigger than shared buffers (but data is
> in the OS cache).  Subsequent scans will often have new hint bits to
> set.

Oh, good point.

> > Is the logical approach here to modify XLogSaveBufferForHint() so if a
> > page write is not needed, to create a dummy WAL record that just
> > increments the WAL location and updates the page LSN?
> > (Is there a small WAL record I should reuse?)
> 
> I think an explicit record type would be better. Or a hint record
> without an associated FPW.

OK.

> > I can try to add a hint-bit-page-write page counter, but that might
> > overflow, and then we will need a way to change the LSN anyway.
> 
> That's just a question of width...

Yeah, the hint bit counter is just delaying the inevitable, plus it
changes the page format, which I am trying to avoid.  Also, I need this
dummy record only if the page is marked clean, meaning a write
to the file system already happened in the current checkpoint --- should
not be too bad.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Multiple full page writes in a single checkpoint?

2021-02-03 Thread Andres Freund
Hi,

On 2021-02-03 19:21:25 -0500, Bruce Momjian wrote:
> On Wed, Feb  3, 2021 at 03:29:13PM -0800, Andres Freund wrote:
> > Changing this is *completely* infeasible. In a lot of workloads it'd
> > cause a *massive* explosion of WAL volume. Like quadratically. You'll
> > need to find another way to generate a nonce.
>
> Do we often do multiple writes to the file system of the same page
> during a single checkpoint, particularly only-hint-bit-modified pages?
> I didn't think so.

It can easily happen. Consider ringbuffer using scans (like vacuum,
seqscan) - they'll force the buffer out to disk soon after it's been
dirtied. And often will read the same page again a short bit later. Or
just any workload that's a bit bigger than shared buffers (but data is
in the OS cache).  Subsequent scans will often have new hint bits to
set.


> Is the logical approach here to modify XLogSaveBufferForHint() so if a
> page write is not needed, to create a dummy WAL record that just
> increments the WAL location and updates the page LSN?
> (Is there a small WAL record I should reuse?)

I think an explicit record type would be better. Or a hint record
without an associated FPW.


> I can try to add a hint-bit-page-write page counter, but that might
> overflow, and then we will need a way to change the LSN anyway.

That's just a question of width...

Greetings,

Andres Freund




Re: WIP: BRIN multi-range indexes

2021-02-03 Thread Tomas Vondra



On 2/4/21 1:49 AM, Zhihong Yu wrote:
> Hi,
> For 0007-Remove-the-special-batch-mode-use-a-larger--20210203.patch :
> 
> +       /* same as preceding value, so store it */
> +       if (compare_values(&range->values[start + i - 1],
> +                          &range->values[start + i],
> +                          (void *) &cxt) == 0)
> +           continue;
> +
> +       range->values[start + n] = range->values[start + i];
> 
> It seems the comment doesn't match the code: the value is stored when
> subsequent value is different from the previous.
> 

Yeah, you're right the comment is wrong - the code is doing exactly the
opposite. I'll need to go through this more carefully.

> For has_matching_range():
> +       int     midpoint = (start + end) / 2;
> 
> I think the standard notation for midpoint is start + (end-start)/2.
> 
> +       /* this means we ran out of ranges in the last step */
> +       if (start > end)
> +           return false;
> 
> It seems the above should be ahead of computation of midpoint.
> 

Not sure why that would be an issue, as we're not using the value and
the values are just plain integers (so no overflows ...).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: Parallel INSERT (INTO ... SELECT ...)

2021-02-03 Thread tsunakawa.ta...@fujitsu.com
From: Amit Kapila 
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
>  wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >QUERY PLAN
> > --------------------------------------------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> >Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> >->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> >  Output: test_data1.a, test_data1.b
> >  Buffers: shared hit=5
> >  Planning:
> >Buffers: shared hit=27011
> >  Planning Time: 24.526 ms
> >  Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >QUERY PLAN
> > --------------------------------------------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> >Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> >->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> >  Output: test_data1.a, test_data1.b
> >  Buffers: shared hit=5
> >  Planning:
> >Buffers: shared hit=72
> >  Planning Time: 0.135 ms
> >  Execution Time: 79.058 ms
> >
> 
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is relcache and catcache.  The 
patched version loads those cached entries for all partitions of the insert 
target table during the parallel-safety check in planning, while the unpatched 
version has to gradually build those cache entries during execution.  How can 
we confirm its effect?


Regards
Takayuki Tsunakawa




Re: Correct comment in StartupXLOG().

2021-02-03 Thread Kyotaro Horiguchi
At Wed, 3 Feb 2021 16:36:13 +0530, Amul Sul  wrote in 
> On Wed, Feb 3, 2021 at 2:48 PM Dilip Kumar  wrote:
> >
> > On Wed, Feb 3, 2021 at 2:28 PM Amul Sul  wrote:
> > >
> > > Hi,
> > >
> > > SharedRecoveryState member of XLogCtl is no longer a boolean flag, got 
> > > changes
> > > in 4e87c4836ab9 to enum but, comment referring to it still referred as the
> > > boolean flag which is pretty confusing and incorrect.
> >
> > +1 for the comment change

Actually the "flag" has been changed to an integer (enum), so it
needs a change. However, the current proposal:

 * Now allow backends to write WAL and update the control file status in
-* consequence.  The boolean flag allowing backends to write WAL is
+* consequence.  The recovery state allowing backends to write WAL is
 * updated while holding ControlFileLock to prevent other backends to 
look

Looks somewhat strange. The old boolean had a single task to allow
backends to write WAL, but the current state has multiple states that
control recovery progress. So I think it needs a further change.

===
 Now allow backends to write WAL and update the control file status in
 consequence.  The recovery state is updated to allow backends to write
 WAL, while holding ControlFileLock to prevent other backends to look
 at an inconsistent state of the control file in shared memory.
===

> > > Also, the last part of the same comment is as:
> > >
> > > " .. although the boolean flag to allow WAL is probably atomic in
> > > itself, .",
> > >
> > > I am a bit confused here too about saying "atomic" to it, is that correct?
> > > I haven't done anything about it, only replaced the "boolean flag" to 
> > > "recovery
> > > state" in the attached patch.
> >
> > I don't think the atomic is correct, it's no more boolean so it is
> > better we get rid of this part of the comment
> 
> Thanks for the confirmation.  Updated that part in the attached version.

I think the original comment still holds except the data type.

-* Also, although the boolean flag to allow WAL is probably atomic in
-* itself, we use the info_lck here to ensure that there are no race
-* conditions concerning visibility of other recent updates to shared
-* memory.
+* Also, we use the info_lck to update the recovery state to ensure that
+* there are no race conditions concerning visibility of other recent
+* updates to shared memory.

The type RecoveryState is int, which is of the native machine size
that is considered to be atomic as well as boolean. However, I don't
object to removing the phrase, since that removal doesn't change the
point of the description.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: WIP: BRIN multi-range indexes

2021-02-03 Thread Zhihong Yu
Hi,
For 0007-Remove-the-special-batch-mode-use-a-larger--20210203.patch :

+   /* same as preceding value, so store it */
+   if (compare_values(&range->values[start + i - 1],
+  &range->values[start + i],
+  (void *) &cxt) == 0)
+   continue;
+
+   range->values[start + n] = range->values[start + i];

It seems the comment doesn't match the code: the value is stored when
subsequent value is different from the previous.

For has_matching_range():
+   int midpoint = (start + end) / 2;

I think the standard notation for midpoint is start + (end-start)/2.

+   /* this means we ran out of ranges in the last step */
+   if (start > end)
+   return false;

It seems the above should be ahead of computation of midpoint.

Similar comment for the code in AssertCheckRanges().
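For reference, the overflow-safe form suggested above can be sketched as
follows (illustrative code, not part of the patch):

```c
#include <assert.h>
#include <limits.h>

/*
 * Overflow-safe midpoint: with 0 <= start <= end, (end - start) cannot
 * overflow, whereas (start + end) can exceed INT_MAX once both bounds
 * are large, producing undefined behavior in a binary search.
 */
static int
safe_midpoint(int start, int end)
{
	return start + (end - start) / 2;
}
```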

Cheers

On Wed, Feb 3, 2021 at 3:55 PM Tomas Vondra 
wrote:

> Hi,
>
> Here's an updated and significantly improved version of the patch
> series, particularly the multi-minmax part. I've fixed a number of
> stupid bugs in that, discovered by either valgrind or stress-tests.
>
> I was surprised by some of the bugs, or rather that the existing
> regression tests failed to crash on them, so it's probably worth briefly
> discussing the details. There were two main classes of such bugs:
>
>
> 1) missing datumCopy
>
> AFAICS this happened because there were a couple missing datumCopy
> calls, and BRIN covers multiple pages, so with by-ref data types we
> added a pointer but the buffer might have gone away unexpectedly.
> Regular regression tests passed just fine, because brin_multi runs
> almost separately, so the chance of the buffer being evicted was low.
> Valgrind reported this (with a rather enigmatic message, as usual), and
> so did a simple stress-test creating many indexes concurrently. Anything
> causing aggressive eviction of buffer would do the trick, I think,
> triggering segfaults, asserts, etc.
>
>
> 2) bogus opclass definitions
>
> There were a couple opclasses referencing incorrect distance function,
> intended for a different data type. I was scratching my head WTH the
> regression tests pass, as there is a table to build multi-minmax index
> on all supported data types. The reason is pretty silly - the table is
> very small, just 100 rows, with very low fillfactor (so only couple
> values per page), and the index was created with pages_per_range=1. So
> the compaction was not triggered and the distance function was never
> actually called. I've decided to build the indexes on a larger data set
> first, to test this. But maybe this needs somewhat different approach.
>
>
> BLOOM
> -
>
> The attached patch series addresses comments from the last review. As
> for the size limit, I've defined a new macro
>
> #define BloomMaxFilterSize \
> MAXALIGN_DOWN(BLCKSZ - \
>   (MAXALIGN(SizeOfPageHeaderData + \
> sizeof(ItemIdData)) + \
>MAXALIGN(sizeof(BrinSpecialSpace)) + \
>SizeOfBrinTuple))
>
> and use that to determine if the bloom filter is too large. IMO that's
> close enough, considering that this is a best-effort check anyway (due
> to not being able to consider multi-column indexes).
>
>
> MINMAX-MULTI
> 
>
> As mentioned, there's a lot of fixes and improvements in this part, but
> the basic principle is still the same. I've kept it split into three
> parts with different approaches to building, so that it's possible to do
> benchmarks and comparisons, and pick the best one.
>
> a) 0005 - Aggressively compacts the summary, by always keeping it within
> the limit defined by values_per_range. So if the range contains more
> values, this may trigger compaction very often in some cases (e.g. for
> monotonic series).
>
> One drawback is that the more often the compactions happen, the less
> optimal the result is - the algorithm is kinda greedy, picking something
> like local optimums in each step.
>
> b) 0006 - Batch build, exactly the opposite of 0005. Accumulates all
> values in a buffer, then does a single round of compaction at the very
> end. This obviously doesn't have the "greediness" issues, but it may
> consume quite a bit of memory for some data types and/or indexes with
> large BRIN ranges.
>
> c) 0007 - A hybrid approach, using a buffer that is multiple of the
> user-specified value, with some safety min/max limits. IMO this is what
> we should use, although perhaps with some tuning of the exact limits.
>
>
> Attached is a spreadsheet with benchmark results for each of those 

Re: WIP: WAL prefetch (another approach)

2021-02-03 Thread Tomas Vondra
Hi,

I did a bunch of tests on v15, mostly to asses how much could the
prefetching help. The most interesting test I did was this:

1) primary instance on a box with 16/32 cores, 64GB RAM, NVMe SSD

2) replica on small box with 4 cores, 8GB RAM, SSD RAID

3) pause replication on the replica (pg_wal_replay_pause)

4) initialize pgbench scale 2000 (fits into RAM on primary, while on
replica it's about 4x RAM)

5) run 1h pgbench: pgbench -N -c 16 -j 4 -T 3600 test

6) resume replication (pg_wal_replay_resume)

7) measure how long it takes to catch up, monitor lag

This is nicely reproducible test case, it eliminates influence of
network speed and so on.

Attached is a chart showing the lag with and without the prefetching. In
both cases we start with ~140GB of redo lag, and the chart shows how
quickly the replica applies that. The "waves" are checkpoints, where
right after a checkpoint the redo gets much faster thanks to FPIs and
then slows down as it gets to parts without them (having to do
synchronous random reads).

With master, it'd take ~16000 seconds to catch up. I don't have the
exact number, because I got tired of waiting, but the estimate is likely
accurate (judging by other tests and how regular the progress is).

With WAL prefetching enabled (I bumped up the buffer to 2MB, and
prefetch limit to 500, but that was mostly just arbitrary choice), it
finishes in ~3200 seconds. This includes replication of the pgbench
initialization, which took ~200 seconds and where prefetching is mostly
useless. That's a damn pretty improvement, I guess!

In a way, this means the tiny replica would be able to keep up with a
much larger machine, where everything is in memory.


One comment about the patch - the postgresql.conf.sample change says:

#recovery_prefetch = on  # whether to prefetch pages logged with FPW
#recovery_prefetch_fpw = off # whether to prefetch pages logged with FPW

but clearly that comment is only for recovery_prefetch_fpw, the first
GUC enables prefetching in general.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Multiple full page writes in a single checkpoint?

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 03:29:13PM -0800, Andres Freund wrote:
> > Is the above case valid, and would it cause two full page writes to WAL?
> > More specifically, wouldn't it cause every write of the page to the file
> > system to use a new LSN?
> 
> No. 8) won't happen.  Look e.g. at XLogSaveBufferForHint():
> 
> /*
>  * Update RedoRecPtr so that we can make the right decision
>  */
> RedoRecPtr = GetRedoRecPtr();
> 
> /*
>  * We assume page LSN is first data on *every* page that can be passed to
>  * XLogInsert, whether it has the standard page layout or not. Since we're
>  * only holding a share-lock on the page, we must take the buffer header
>  * lock when we look at the LSN.
>  */
> lsn = BufferGetLSNAtomic(buffer);
> 
> if (lsn <= RedoRecPtr)
> /* wal log hint bit */
> 
> The RedoRecPtr is determined at 1. and doesn't change between 4) and
> 8). The LSN for 4) has to be *past* the RedoRecPtr from 1). Therefore we
> don't do another FPW.

OK, so, what is happening is that it knows the page LSN is after the
start of the current checkpoint (the redo point), so it knows not to do
a full page write again?  Smart, and makes sense.

> Changing this is *completely* infeasible. In a lot of workloads it'd
> cause a *massive* explosion of WAL volume. Like quadratically. You'll
> need to find another way to generate a nonce.

Do we often do multiple writes to the file system of the same page
during a single checkpoint, particularly only-hint-bit-modified pages?
I didn't think so.

> In the non-hint bit case you'll automatically have a higher LSN in 7/8
> though. So you won't need to do anything about getting a higher nonce.

Yes, I was counting on that.  :-)

> For the hint bit case in 8 you could consider just using any LSN generated
> after 4 (preferably already flushed to disk) - but that seems somewhat
> ugly from a debuggability POV :/. Alternatively you could just create a
> tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
> flushes when the pages are dirtied.

Yes, that would make sense.  I do need the first full page write during
a checkpoint to be sure I don't have torn pages that have some part of
the page encrypted with one LSN and a second part with a different LSN. 
You are right that I don't need a second full page write during the same
checkpoint because a torn page would just restore the first full page
write and throw away the second LSN and hint bit changes, which is fine.

I hadn't gotten to ask about that until I found if the previous
assumptions were true, which they were not.

Is the logical approach here to modify XLogSaveBufferForHint() so if a
page write is not needed, to create a dummy WAL record that just
increments the WAL location and updates the page LSN?  (Is there a small
WAL record I should reuse?)  I can try to add a hint-bit-page-write page
counter, but that might overflow, and then we will need a way to change
the LSN anyway.

I am researching this so I can give a clear report on the impact of
adding this feature.  I will update the wiki once we figure this out.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Dumping/restoring fails on inherited generated column

2021-02-03 Thread Zhihong Yu
Hi,
+   if (attribute->attgenerated && !childatt->attgenerated)
+   ereport(ERROR,
...
+   if (attribute->attgenerated && childatt->attgenerated)
+   {

Looks like for the second if statement,
checking attribute->attgenerated should be enough (due to the check from
the first if statement).

Cheers

On Wed, Feb 3, 2021 at 11:18 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2021-01-29 17:41, Tom Lane wrote:
> > Also, in the example from [2],
> >
> > d3=# create table pp1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> > CREATE TABLE
> > d3=# create table cc1 (a int, b int GENERATED ALWAYS AS (a * 3) STORED);
> > CREATE TABLE
> > d3=# alter table cc1 inherit pp1;
> > ALTER TABLE
> >
> > pg_dump now omits to dump cc1's generation expression, which seems
> > strictly worse than before.  Admittedly, the backend likely ought to
> > be rejecting this scenario, but it doesn't do so today.
> >
> > [2]
> https://www.postgresql.org/message-id/661371.1601398006%40sss.pgh.pa.us
>
> Here is a WIP patch to address this.  Probably needs another look for
> column number mapping and all the usual stuff, but the basic idea should
> be okay.
>
> --
> Peter Eisentraut
> 2ndQuadrant, an EDB company
> https://www.2ndquadrant.com/
>


Re: DROP TABLE can crash the replication sync worker

2021-02-03 Thread Peter Smith
On Wed, Feb 3, 2021 at 11:49 PM Amit Kapila  wrote:
>
> On Wed, Feb 3, 2021 at 2:53 PM Peter Smith  wrote:
> >
> > Hi Hackers.
> >
> > As discovered in another thread [master] there is an *existing* bug in
> > the PG HEAD code which can happen if a DROP TABLE is done at same time
> > a replication tablesync worker is running.
> >
> > It seems the table's relid that the sync worker is using gets ripped
> > out from underneath it and is invalidated by the DROP TABLE. Any
> > subsequent use of that relid will go wrong.
> >
>
> Where exactly did you pause the tablesync worker while dropping the
> table? We acquire the lock on the table in LogicalRepSyncTableStart
> and then keep it for the entire duration of tablesync worker so drop
> table shouldn't be allowed.
>

I have a breakpoint set on LogicalRepSyncTableStart. The DROP TABLE is
done while paused on that breakpoint, so no code of
LogicalRepSyncTableStart has even executed yet.


Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Typo in tablesync comment

2021-02-03 Thread Peter Smith
On Wed, Feb 3, 2021 at 11:55 PM Amit Kapila  wrote:
>
> On Wed, Feb 3, 2021 at 1:35 PM Michael Paquier  wrote:
> >
> > On Wed, Feb 03, 2021 at 06:52:56PM +1100, Peter Smith wrote:
> >
> > > Maybe better to rewrite it more drastically?
> > >
> > > e.g
> > > -
> > >  *The catalog pg_subscription_rel is used to keep information about
> > >  *subscribed tables and their state. The catalog holds all states
> > >  *except SYNCWAIT and CATCHUP which are only in shared memory.
> > > -
> >
> > Fine by me.
> >
>
> +1.
>

OK. I attached an updated patch using this new text.


Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-Fix-tablesync-comment-typo.patch
Description: Binary data


Re: logical replication worker accesses catalogs in error context callback

2021-02-03 Thread Andres Freund
Hi,

On 2021-02-03 16:35:29 +0530, Bharath Rupireddy wrote:
> On Sat, Jan 30, 2021 at 8:23 AM Andres Freund  wrote:
> > On 2021-01-28 11:14:03 +0530, Amit Kapila wrote:
> > > On Wed, Jan 27, 2021 at 9:38 AM Bharath Rupireddy
> > >  wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 7:48 AM Zhihong Yu  wrote:
> > > >
> > > > Thanks for pointing to the changes in the commit message. I corrected
> > > > them. Attaching v4 patch set, consider it for further review.
> > > >
> > >
> > > About 0001, have we tried to reproduce the actual bug here which means
> > > when the error_callback is called we should face some problem? I feel
> > > with the correct testcase we should hit the Assert
> > > (Assert(IsTransactionState());) in SearchCatCacheInternal because
> > > there we expect the transaction to be in a valid state. I understand
> > > that the transaction is in a broken state at that time but having a
> > > testcase to hit the actual bug makes it easy to test the fix.
> >
> > I think it's easy to hit - add a trivial DEBUG message to XLogWrite(),
> > and run anything involving logical rep using a low enough log
> > level. There might even be enough messages with a low enough level
> > already somewhere in the relevant paths.
> 
> I'm not sure how adding a DEBUG message to XLogWrite() would ensure
> the logical replication worker on the subscriber system enters
> slot_store_error_callback and hits the Assert(IsTransactionState());?
> Could you please elaborate? Or I may be missing something here to
> understand.

XLogWrite() is in a critical section; the DEBUG message causes the error
context callbacks to be called, the callbacks allocate memory, and
allocating memory inside a critical section triggers an assertion
failure.

Greetings,

Andres Freund




Re: Multiple full page writes in a single checkpoint?

2021-02-03 Thread Andres Freund
Hi,

On 2021-02-03 18:05:56 -0500, Bruce Momjian wrote:
> wal_log_hints already gives us a unique nonce for the first hint bit
> change on a page during a checkpoint, but we only encrypt on page write
> to the file system, so I am researching whether wal_log_hints will already
> generate a unique LSN for every page write to the file system, even if
> there are multiple hint-bit-caused page writes to the file system during
> a single checkpoint.  (We already know this works for multiple
> checkpoints.)

No, it won't:

> However, imagine these steps:
> 
> 1.  checkpoint starts
> 2.  page is modified by row or hint bit change
> 3.  page gets a new LSN and is marked as dirty
> 4.  page image is flushed to WAL
> 5.  page is written to disk and marked as clean
> 6.  page is modified by data or hint bit change
> 7.  page gets a new LSN and is marked as dirty
> 8.  page image is flushed to WAL
> 9.  checkpoint completes
> 10. page is written to disk and marked as clean
> 
> Is the above case valid, and would it cause two full page writes to WAL?
> More specifically, wouldn't it cause every write of the page to the file
> system to use a new LSN?

No. 8) won't happen.  Look e.g. at XLogSaveBufferForHint():

/*
 * Update RedoRecPtr so that we can make the right decision
 */
RedoRecPtr = GetRedoRecPtr();

/*
 * We assume page LSN is first data on *every* page that can be passed to
 * XLogInsert, whether it has the standard page layout or not. Since we're
 * only holding a share-lock on the page, we must take the buffer header
 * lock when we look at the LSN.
 */
lsn = BufferGetLSNAtomic(buffer);

if (lsn <= RedoRecPtr)
/* wal log hint bit */

The RedoRecPtr is determined at 1. and doesn't change between 4) and
8). The LSN for 4) has to be *past* the RedoRecPtr from 1). Therefore we
don't do another FPW.


Changing this is *completely* infeasible. In a lot of workloads it'd
cause a *massive* explosion of WAL volume. Like quadratically. You'll
need to find another way to generate a nonce.

In the non-hint bit case you'll automatically have a higher LSN in 7/8
though. So you won't need to do anything about getting a higher nonce.

For the hint bit case in 8 you could consider just using any LSN generated
after 4 (preferably already flushed to disk) - but that seems somewhat
ugly from a debuggability POV :/. Alternatively you could just create a
tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
flushes when the pages are dirtied.

Greetings,

Andres Freund




Multiple full page writes in a single checkpoint?

2021-02-03 Thread Bruce Momjian
Cluster file encryption plans to use the LSN and page number as the
nonce for heap/index pages.  I am looking into the use of a unique nonce
during hint bit changes.  (You need to use a new nonce for re-encrypting
a page that changes.)

wal_log_hints already gives us a unique nonce for the first hint bit
change on a page during a checkpoint, but we only encrypt on page write
to the file system, so I am researching whether wal_log_hints will already
generate a unique LSN for every page write to the file system, even if
there are multiple hint-bit-caused page writes to the file system during
a single checkpoint.  (We already know this works for multiple
checkpoints.)

Our docs on full_page_writes states:

When this parameter is on, the
PostgreSQL server writes the entire
content of each disk page to WAL during the first modification
of that page after a checkpoint.

and wal_log_hints states:

When this parameter is on, the
PostgreSQL server writes the entire
content of each disk page to WAL during the first modification of
that page after a checkpoint, even for non-critical modifications
of so-called hint bits.

However, imagine these steps:

1.  checkpoint starts
2.  page is modified by row or hint bit change
3.  page gets a new LSN and is marked as dirty
4.  page image is flushed to WAL
5.  page is written to disk and marked as clean
6.  page is modified by data or hint bit change
7.  page gets a new LSN and is marked as dirty
8.  page image is flushed to WAL
9.  checkpoint completes
10. page is written to disk and marked as clean

Is the above case valid, and would it cause two full page writes to WAL?
More specifically, wouldn't it cause every write of the page to the file
system to use a new LSN?

If so, this means wal_log_hints is sufficient to guarantee a new nonce
for every page image, even for multiple hint bit changes and page writes
during a single checkpoint, and there is then no need for a hint bit
counter on the page --- the unique LSN does that for us.  I know
wal_log_hints was designed to fix torn pages, but it seems to also do
exactly what cluster file encryption needs.

If the above is all true, should we update the docs, READMEs, or C
comments about this?  I think the cluster file encryption patch would at
least need to document that we need to keep this behavior, because I
don't think wal_log_hints needs to behave this way for checksum
purposes because of the way full page writes are processed during crash
recovery.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: new heapcheck contrib module

2021-02-03 Thread Robert Haas
On Tue, Feb 2, 2021 at 6:10 PM Mark Dilger  wrote:
> 0001 -- no changes

Committed.

> 0002 -- fixing omissions in @pgfeutilsfiles in file 
> src/tools/msvc/Mkvcbuild.pm

Here are a few minor cosmetic issues with this patch:

- connect_utils.c lacks a file header comment.
- Some or perhaps all of the other file header comments need an update for 2021.
- There are bogus hunks in the diff for string_utils.c.

I think the rest of this looks good. I spent a long time puzzling over
whether consumeQueryResult() and processQueryResult() needed to be
moved, but then I realized that this patch actually makes them into
static functions inside parallel_slot.c, rather than public functions
as they were before. I like that. The only reason those functions need
to be moved at all is so that the scripts_parallel/parallel_slot stuff
can continue to do its thing, so this is actually a better way of
grouping things together than what we have now.

> 0003 -- no changes

I think it would be better if there were no handler by default, and
failing to set one leads to an assertion failure when we get to the
point where one would be called.

I don't think I understand the point of renaming processQueryResult
and consumeQueryResult. Isn't that just code churn for its own sake?

PGresultHandler seems too generic. How about ParallelSlotHandler or
ParallelSlotResultHandler?

I'm somewhat inclined to propose s/ParallelSlot/ConnectionSlot/g but I
guess it's better not to get sucked into renaming things.

It's a little strange that we end up with mutators to set the slot's
handler and handler context when we elsewhere feel free to monkey with
a slot's connection directly, but it's not a perfect world and I can't
think of anything I'd like better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Tom Lane
Tomas Vondra  writes:
> On 2/3/21 4:08 PM, Tom Lane wrote:
>> I'm disinclined to think that this is a good idea from a security
>> perspective.  Maybe if it's superuser-only it'd be ok (since a
>> superuser would have other routes to discovering the value anyway).

> Is the postmaster PID really sensitive? Users with OS access can just
> list the processes, and for users without OS access / privileges it's
> mostly useless, no?

We disallow ordinary users from finding out the data directory location,
even though that should be equally useless to unprivileged users.  The
postmaster PID seems like the same sort of information.  It does not
seem like a non-administrator could have any but nefarious use for that
value.  (Admittedly, this argument is somewhat weakened by exposing
child processes' PIDs ... but you can't take down the whole installation
by zapping a child process.)

I'm basically in the same place you are in your other response: the
question to ask is not "why not allow this?", but "why SHOULD we allow
this?"

regards, tom lane




Re: Tid scan improvements

2021-02-03 Thread David Rowley
Thanks for looking at this.

On Thu, 4 Feb 2021 at 10:19, Andres Freund  wrote:
> Perhaps something like
>
> typedef struct TableScanTidRange TableScanTidRange;
>
> TableScanTidRange *table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
> bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
> void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange *range);
>
> would work better? That'd allow an AM to have arbitrarily large state
> for a tid range scan, would make it clear what the lifetime of the
> ItemPointer mintid, ItemPointer maxtid are etc.  Wouldn't, on the API
> level, prevent multiple tid range scans from being in progress at the
> same times though :(. Perhaps we could add a TableScanTidRange* pointer
> to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?

Maybe the TableScanTidRange can just have a field to store the
TableScanDesc. That way table_scan_tid_range_nextslot and
table_scan_tid_range_end can just pass the TableScanTidRange pointer.

That way it seems like it would be ok for multiple scans to be going
on concurrently as nobody should be reusing the TableScanDesc.

Does that seem ok?

David




Re: Tid scan improvements

2021-02-03 Thread Andres Freund
Hi,

On 2021-01-26 14:22:42 +1300, David Rowley wrote:
> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 33bffb6815..d1c608b176 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -325,6 +325,26 @@ typedef struct TableAmRoutine
>  											ScanDirection direction,
>  											TupleTableSlot *slot);
>  
> +	/*
> +	 * Return next tuple from `scan` where TID is within the defined range.
> +	 * This behaves like scan_getnextslot but only returns tuples from the
> +	 * given range of TIDs.  Ranges are inclusive.  This function is optional
> +	 * and may be set to NULL if TID range scans are not supported by the AM.
> +	 *
> +	 * Implementations of this function must themselves handle ItemPointers
> +	 * of any value. i.e, they must handle each of the following:
> +	 *
> +	 * 1) mintid or maxtid is beyond the end of the table; and
> +	 * 2) mintid is above maxtid; and
> +	 * 3) item offset for mintid or maxtid is beyond the maximum offset
> +	 *    allowed by the AM.
> +	 */
> +	bool		(*scan_getnextslot_inrange) (TableScanDesc scan,
> +											 ScanDirection direction,
> +											 TupleTableSlot *slot,
> +											 ItemPointer mintid,
> +											 ItemPointer maxtid);
> +
>  
>  	/* ------------------------------------------------------------------------
>  	 * Parallel table scan related functions.
> @@ -1015,6 +1035,36 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
>  	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
>  }
>  
> +/*
> + * Return next tuple from defined TID range from `scan` and store in slot.
> + */
> +static inline bool
> +table_scan_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
> +							   TupleTableSlot *slot, ItemPointer mintid,
> +							   ItemPointer maxtid)
> +{
> +	/*
> +	 * The planner should never make a plan which uses this function when the
> +	 * table AM has not defined any function for this callback.
> +	 */
> +	Assert(sscan->rs_rd->rd_tableam->scan_getnextslot_inrange != NULL);
> +
> +	slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
> +
> +	/*
> +	 * We don't expect direct calls to table_scan_getnextslot_inrange with
> +	 * valid CheckXidAlive for catalog or regular tables.  See detailed
> +	 * comments in xact.c where these variables are declared.
> +	 */
> +	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
> +		elog(ERROR, "unexpected table_scan_getnextslot_inrange call during logical decoding");
> +
> +	return sscan->rs_rd->rd_tableam->scan_getnextslot_inrange(sscan,
> +															  direction,
> +															  slot,
> +															  mintid,
> +															  maxtid);
> +}


I don't really like that this API relies on mintid/maxtid to stay the
same across multiple scan_getnextslot_inrange() calls. I think we'd at
least need to document that that's required and what needs to be done to
scan a different set of min/max tid (or re-scan the same min/max from
scratch).

Perhaps something like

typedef struct TableScanTidRange TableScanTidRange;

TableScanTidRange *table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange *range);

would work better? That'd allow an AM to have arbitrarily large state
for a tid range scan, would make it clear what the lifetime of the
ItemPointer mintid, ItemPointer maxtid are etc.  Wouldn't, on the API
level, prevent multiple tid range scans from being in progress at the
same times though :(. Perhaps we could add a TableScanTidRange* pointer
to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?

Greetings,

Andres Freund




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Tomas Vondra
On 2/3/21 4:08 PM, Tom Lane wrote:
> Bharath Rupireddy  writes:
>> Can we have a new function, say pg_postgres_pid(), to return
>> postmaster PID similar to pg_backend_pid()?
> 
> I'm disinclined to think that this is a good idea from a security
> perspective.  Maybe if it's superuser-only it'd be ok (since a
> superuser would have other routes to discovering the value anyway).
> 

Is the postmaster PID really sensitive? Users with OS access can just
list the processes, and for users without OS access / privileges it's
mostly useless, no?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Tomas Vondra
On 2/3/21 7:12 AM, Bharath Rupireddy wrote:
> Hi,
> 
> Can we have a new function, say pg_postgres_pid(), to return
> postmaster PID similar to pg_backend_pid()? At times, it will be
> difficult to use OS level commands to get the postmaster pid of a
> backend to which it is connected. It's even worse if we have multiple
> postgres server instances running on the same system. I'm not sure
> whether it's safe to expose postmaster pid this way, but it will be
> useful at least for debugging purposes on say Windows or other
> non-Linux platforms where it's a bit difficult to get process id.
> Users can also look at the postmaster.pid file to figure out what's
> the current postmaster pid, if not using OS level commands, but having
> a SQL callable function makes life easier.
> 
> The function can look like this:
> Datum
> pg_postgres_pid(PG_FUNCTION_ARGS)
> {
> PG_RETURN_INT32(PostmasterPid);
> }
> 
> Thoughts?
> 

Curious question - why do you actually need PID of the postmaster? For
debugging, I'd say it's not quite necessary - you can just attach a
debugger to the backend and print the PostmasterPid directly. Or am I
missing something?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Next Commitfest Manager.

2021-02-03 Thread David Steele

On 2/3/21 3:13 PM, Stephen Frost wrote:

Greetings,

* Ibrar Ahmed (ibrar.ah...@gmail.com) wrote:

Has anyone else already volunteered for that? It is my first time, so I
will need some access, if all agree.


Thanks for volunteering!

That said, our last commitfest tends to be the most difficult as it's
the last opportunity for features to land in time for the next major
release and, for my part at least, I think it'd be best to have
someone who has experience running a CF previously manage it.

To that end, I've talked to David Steele, who has run this last CF for
the past few years and we're in agreement that he's willing to run this
CF again this year, assuming there's no objections.  What we've thought
to suggest is that you follow along with David as he runs this CF and
then offer to run the July CF.  Of course, we would encourage you and
David to communicate and for you to ask David any questions you have
about how he handles things as part of the CF.  This is in line with how
other CF managers have started out also.

Open to your thoughts, as well as those of anyone else who wishes to
comment.


+1. This all sounds good to me!

--
-David
da...@pgmasters.net




Re: Recording foreign key relationships for the system catalogs

2021-02-03 Thread Joel Jacobson
On Mon, Feb 1, 2021, at 21:03, Tom Lane wrote:
>"Joel Jacobson"  writes:
>> The is_array OUT parameter doesn't say which of the possibly many fkcols 
>> that is the array column.
>
>Yeah, I didn't write the sgml docs yet, but the comments explain that
>the array is always the last fkcol.  Maybe someday that won't be
>general enough, but we can cross that bridge when we come to it.

I've now fully migrated to using pg_get_catalog_foreign_keys()
instead of my own lookup tables, and have some additional hands-on experiences
to share with you.

I struggle to come up with a clean way to make use of is_array
without being forced to introduce some CASE logic to figure out
whether the fkcol is an array or not.

The alternative of joining information_schema.columns and checking
data_type = 'ARRAY' is almost simpler, but that seems wrong, since we
now have is_array, and using it should be simpler than joining
information_schema.columns.

The best approach I've come up with so far is the CASE logic below:

WITH
foreign_keys AS
(
  SELECT
fktable::text AS table_name,
unnest(fkcols) AS column_name,
pktable::text AS ref_table_name,
unnest(pkcols) AS ref_column_name,
--
-- is_array refers to the last fkcols column
--
unnest
(
  CASE cardinality(fkcols)
  WHEN 1 THEN ARRAY[is_array]
  WHEN 2 THEN ARRAY[FALSE,is_array]
  END
) AS is_array
  FROM pg_get_catalog_foreign_keys()
)

If is_array would instead have been an boolean[], the query could have been 
written:

WITH
foreign_keys AS
(
  SELECT
fktable::text AS table_name,
unnest(fkcols) AS column_name,
pktable::text AS ref_table_name,
unnest(pkcols) AS ref_column_name,
unnest(is_array) AS is_array
  FROM pg_get_catalog_foreign_keys()
)

Maybe this can be written in a simpler way already.

Otherwise I think it would be more natural to change both is_array and is_opt
to boolean[] with the same cardinality as fkcols and pkcols,
to allow unnest()ing of them as well.

This would also be a more future proof solution,
and wouldn't require a code change to code using pg_get_catalog_foreign_keys(),
if we would ever add more complex cases in the future.

But even without increased future complexity,
I think the example above demonstrates a problem already today.

Maybe there is a simpler way to achieve what I'm trying to do,
i.e. to figure out if a specific fkcol is an array or not,
using some other simpler clever trick than the CASE variant above?
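For what it's worth, one possible simplification of the CASE variant (an
untested sketch) is to unnest with ordinality and flag only the last
element, since is_array always refers to the last fkcols column:

```sql
SELECT fktable::text AS table_name,
       f.col         AS column_name,
       (f.ord = cardinality(fkcols) AND is_array) AS col_is_array
FROM pg_get_catalog_foreign_keys(),
     unnest(fkcols) WITH ORDINALITY AS f(col, ord);
```

The same pattern extends to pkcols, though it still isn't as tidy as
is_array being a boolean[], which would unnest directly.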

/Joel



Re: Next Commitfest Manager.

2021-02-03 Thread Stephen Frost
Greetings,

* Ibrar Ahmed (ibrar.ah...@gmail.com) wrote:
> Has anyone else already volunteered for that? It is my first time, so I
> will need some access, if all agree.

Thanks for volunteering!

That said, our last commitfest tends to be the most difficult as it's
the last opportunity for features to land in time for the next major
release and, for my part at least, I think it'd be best to have
someone who has experience running a CF previously manage it.

To that end, I've talked to David Steele, who has run this last CF for
the past few years and we're in agreement that he's willing to run this
CF again this year, assuming there's no objections.  What we've thought
to suggest is that you follow along with David as he runs this CF and
then offer to run the July CF.  Of course, we would encourage you and
David to communicate and for you to ask David any questions you have
about how he handles things as part of the CF.  This is in line with how
other CF managers have started out also.

Open to your thoughts, as well as those of anyone else who wishes to
comment.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Dumping/restoring fails on inherited generated column

2021-02-03 Thread Peter Eisentraut

On 2021-01-29 17:41, Tom Lane wrote:

Also, in the example from [2],

d3=# create table pp1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
d3=# create table cc1 (a int, b int GENERATED ALWAYS AS (a * 3) STORED);
CREATE TABLE
d3=# alter table cc1 inherit pp1;
ALTER TABLE

pg_dump now omits to dump cc1's generation expression, which seems
strictly worse than before.  Admittedly, the backend likely ought to
be rejecting this scenario, but it doesn't do so today.

[2]https://www.postgresql.org/message-id/661371.1601398006%40sss.pgh.pa.us


Here is a WIP patch to address this.  Probably needs another look for 
column number mapping and all the usual stuff, but the basic idea should 
be okay.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
From bd6ee462e5ad4a6514940e7e78d40c4f28f3dc3f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 3 Feb 2021 20:14:05 +0100
Subject: [PATCH] WIP: Fix ALTER TABLE / INHERIT with generated columns

When running ALTER TABLE t2 INHERIT t1, we must check that columns in
t2 that correspond to a generated column in t1 are also generated and
have the same generation expression.  Otherwise, this would allow
creating setups that a normal CREATE TABLE sequence would not allow.
---
 src/backend/commands/tablecmds.c        | 49 ++++++++++++++++++++++++++++++++
 src/test/regress/expected/generated.out | 19 ++++++++++++++
 src/test/regress/sql/generated.sql      | 13 +++++++++++
 3 files changed, 81 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 420991e315..3d4fb4ce7e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13963,6 +13963,55 @@ MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel)
 						 errmsg("column \"%s\" in child table must be marked NOT NULL",
 								attributeName)));
 
+			/*
+			 * If parent column generated, child column must be, too.
+			 */
+			if (attribute->attgenerated && !childatt->attgenerated)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("column \"%s\" in child table must be a generated column",
+								attributeName)));
+
+			/*
+			 * Check that both generation expressions match.
+			 */
+			if (attribute->attgenerated && childatt->attgenerated)
+			{
+				TupleConstr *child_constr = child_rel->rd_att->constr;
+				TupleConstr *parent_constr = parent_rel->rd_att->constr;
+				Node	   *child_expr = NULL;
+				Node	   *parent_expr = NULL;
+
+				Assert(child_constr != NULL);
+				Assert(parent_constr != NULL);
+
+				for (int i = 0; i < child_constr->num_defval; i++)
+				{
+					if (child_constr->defval[i].adnum == childatt->attnum)
+					{
+						child_expr = stringToNode(child_constr->defval[i].adbin);
+						break;
+					}
+				}
+				Assert(child_expr != NULL);
+
+				for (int i = 0; i < parent_constr->num_defval; i++)
+				{
+					if (parent_constr->defval[i].adnum == attribute->attnum)
+					{
+						parent_expr = stringToNode(parent_constr->defval[i].adbin);
+						break;
+					}
+				}
+				Assert(parent_expr != NULL);
+
+				if (!equal(child_expr, parent_expr))
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("column \"%s\" in child table has a conflicting generation expression",
+									attributeName)));
+			}
+
 			/*
 			 * OK, bump the child column's inheritance count.  (If we fail
 			 * later on, this change will just roll back.)
diff --git a/src/test/regress/expected/gene

Re: Key management with tests

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 10:33:57AM -0500, Stephen Frost wrote:
> > I am thinking group read-access might be a requirement for cluster file
> > encryption to be effective.
> 
> People certainly do use group read-access, but I don't see that as being
> a requirement for cluster file encryption to be effective, it's just one
> thing TDE can address, among others, as discussed.

Agreed.

> > This also does not protect against users who have read access to
> > database process memory — all in-memory data pages and data
> > encryption keys are stored unencrypted in memory, so an attacker who
> > --> is able to read memory can decrypt the entire cluster.  The Postgres
> > --> operating system user and the operating system administrator, e.g.,
> > --> the root user, have such access.
> 
> That's helpful, +1.

Good.

> > > Uh, well, they could modify postgresql.conf to change the script to save
> > > the secret returned by the script before returning it to the PG server. 
> > > We could require postgresql.conf to be somewhere secure, but then how do
> > > we know that is secure?  I just don't see a clean solution here, but the
> > > idea that you write and then wait for the key to show up seems like a
> > > very valid way of attack, and it took me a while to be able to
> > > articulate it.
> 
> postgresql.conf isn't always writable by the postgres user, though
> postgresql.auto.conf is likely to always be.  I'm not sure how much of a
> concern that is, but if we wanted to take steps to explicitly address
> this issue, we could have some kind of 'secure' postgresql.conf file
> which we would encourage users to make owned by root and whose values
> wouldn't be allowed to be overridden once set.

Well, I think there is a lot more than postgresql.conf to worry about ---
see below.

> > Let's suppose you lock down your cluster --- the non-PGDATA files are
> > owned by root, postgresql.conf and pg_hba.conf are moved out of PGDATA
> > and are not writable by the database OS user, or we have the PGDATA
> > directory on another server, so the adversary can only write to the
> > remote PGDATA directory.
> > 
> > What can they do?  Well, they can't modify pg_proc to add a shared
> > library since pg_proc is encrypted, so we have to focus on files needed
> > before encryption starts or files that can't be easily encrypted.
> 
> This isn't accurate- just because it's encrypted doesn't mean they can't
> modify it.  That's exactly why integrity is important, because an
> attacker absolutely could modify the files directly and potentially
> exploit the system through those modifications.

They can't easily modify it to inject a shared object reference into a
system column, was my point --- also see below.

> > They could create postgresql.conf.auto in PGDATA, and modify
> > cluster_key_command to capture the key, or they could modify preload
> > libraries or archive command to call a command to read memory as the PG
> > OS user and write the key out somewhere, or use the key to rewrite the
> > database files --- those wouldn't even need a database restart, just a
> > reload.
> 
> They would need to actually be able to effect that reload though.  This
> is where the question comes up as to just what attack vector we're
> trying to address.  It's certainly possible that an attacker has only
> access to the stored data in an off-line fashion (eg: a hard drive that
> was mistakenly thrown away without being properly wiped) and that's one
> of the cases which is addressed by cluster encryption.  An attacker
> might have access to the LUN that PG is running on but not to the
> running server itself, which it seems like is what you're contemplating
> here.  That's a much harder attack vector to fully protect against and
> we might need to do more than we're currently contemplating to address
> it- but I don't think we necessarily must solve for all cases in the
> first pass at this.

See below.

> > They could also modify pg_xact files so that, even though the heap/index
> > files are encrypted, how the contents of those files are interpreted
> > would change.
> 
> Yes, ideally, we'd encrypt/integrity check just about every part of the
> running system and that's one area the patch doesn't address- things
> like temporary files and other parts.

It is worse than that --- see below.

> > In summary, to detect malicious user writes, you would need to protect
> > the files used before encryption starts (root owned or owned by another
> > user?), and encrypt all files after encryption starts --- any other
> > approach would probably leave open attack vectors, and I don't think
> > there is sufficient community desire to add such boundaries.
> 
> There's going to be some attack vectors that TDE doesn't address.  We
> should identify and document those where we're able to.  We could offer
> up some mitigations (eg: strongly suggest monitoring of key utilization
> such that if the KEK is used without a reboot of the system or simil

Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Tom Lane
Heikki Linnakangas  writes:
> Since we're on a removal-spree, it'd also be nice to get rid of the 
> "fast-path" function call interface, PQfn(). However, libpq is using it 
> internally in the lo_*() functions, so if we remove it from the server, 
> lo_*() will stop working with old libpq versions. It would be good to 
> change those functions now to use PQexecParams() instead, so that we 
> could remove the fast-path server support in the future.

I'm disinclined to touch that.  It is considered part of protocol v3,
and there is no very good reason to suppose that nothing but libpq
is using it.  Besides, what would it really save?  fastpath.c has
not been a source of maintenance problems.

regards, tom lane




Re: Extensions not dumped when --schema is used

2021-02-03 Thread Guillaume Lelarge
Hi,

Thanks for the review.

On Wed, Feb 3, 2021 at 18:33, Asif Rehman wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> The patch applies cleanly and looks fine to me. However consider this
> scenario.
>
> - CREATE SCHEMA foo;
> - CREATE EXTENSION file_fdw WITH SCHEMA foo;
> - pg_dump  --file=/tmp/test.sql --exclude-schema=foo postgres
>
> This will still include the extension 'file_fdw' in the backup script.
> Shouldn't it be excluded as well?
>
> The new status of this patch is: Waiting on Author
>

This behaviour is already there without my patch, and I think it's a valid
behaviour. An extension doesn't belong to a schema. Its objects do, but the
extension doesn't.


-- 
Guillaume.


Re: Extensions not dumped when --schema is used

2021-02-03 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch applies cleanly and looks fine to me. However consider this scenario.

- CREATE SCHEMA foo;
- CREATE EXTENSION file_fdw WITH SCHEMA foo;
- pg_dump  --file=/tmp/test.sql --exclude-schema=foo postgres

This will still include the extension 'file_fdw' in the backup script. 
Shouldn't it be excluded as well?

The new status of this patch is: Waiting on Author


Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Heikki Linnakangas

On 03/02/2021 18:29, Tom Lane wrote:
> Alvaro Herrera  writes:
>> On 2021-Feb-03, Tom Lane wrote:
>>> I have a vague recollection that JDBC users still like to use
>>> protocol 2 for some reason --- is that out of date?
>
>> [ yes, since 2016 ]
>
> Then let's kill it dead, server and libpq both.

Ok, works for me. I'll prepare a larger patch to do that.

Since we're on a removal-spree, it'd also be nice to get rid of the 
"fast-path" function call interface, PQfn(). However, libpq is using it 
internally in the lo_*() functions, so if we remove it from the server, 
lo_*() will stop working with old libpq versions. It would be good to 
change those functions now to use PQexecParams() instead, so that we 
could remove the fast-path server support in the future.


- Heikki




Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Feb-03, Tom Lane wrote:
>> I have a vague recollection that JDBC users still like to use
>> protocol 2 for some reason --- is that out of date?

> [ yes, since 2016 ]

Then let's kill it dead, server and libpq both.

regards, tom lane




Re: join plan with unexpected var clauses

2021-02-03 Thread Tom Lane
Luc Vlaming  writes:
> Given the testcase we see that the outer semi join tries to join the 
> outer with the inner table id columns, even though the middle table id 
> column is also there. Is this expected behavior?

I don't see anything greatly wrong with it.  The planner has concluded
that _inner.id2 and middle.id1 are part of an equivalence class, so it
can form the top-level join by equating _outer.id3 to either of them.
AFAIR that choice is made at random --- there's certainly not any logic
that thinks about "well, the intermediate join output could be a bit
narrower if we choose this one instead of that one".

I think "made at random" actually boils down to "take the first usable
member of the equivalence class".  If I switch around the wording of
the first equality condition:

   ... select 1 from _inner where middle.id1 = _inner.id2

then I get a plan where the top join uses middle.id1.  However,
it's still propagating both middle.id1 and _inner.id2 up through
the bottom join, so that isn't buying anything efficiency-wise.
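The "first usable member" behavior can be illustrated with a toy model (a hypothetical function, not the planner's actual data structures):

```python
def pick_join_clause_member(eclass, available_cols):
    # Toy model: the planner equates the outer column to the first
    # equivalence-class member its join input can supply; nothing in this
    # choice prefers the member that would narrow the intermediate row.
    for member in eclass:
        if member in available_cols:
            return member
    return None

# With _inner.id2 listed first (as in the original query), it is chosen:
assert pick_join_clause_member(["_inner.id2", "middle.id1"],
                               {"middle.id1", "_inner.id2"}) == "_inner.id2"
# Reordering the class, as the reworded equality does, makes middle.id1 win:
assert pick_join_clause_member(["middle.id1", "_inner.id2"],
                               {"middle.id1", "_inner.id2"}) == "middle.id1"
```

Either answer is a correct join clause, which is why the planner sees no reason to prefer one over the other.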

regards, tom lane




Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Alvaro Herrera
On 2021-Feb-03, Tom Lane wrote:

> I have a vague recollection that JDBC users still like to use
> protocol 2 for some reason --- is that out of date?

2016:

commit c3d8571e53cc5b702dae2f832b02c872ad44c3b7
Author: Vladimir Sitnikov 
AuthorDate: Sat Aug 6 12:22:17 2016 +0300
CommitDate: Sat Aug 13 11:27:16 2016 +0300

fix: support cases when user-provided queries have 'returning'

This change includes: drop v2 protocol support, and query parsing refactoring.
Currently query parse cache is still per-connection, however "returningColumNames"
are part of cache key, thus the parse cache can be made global.

This fixes #488 (see org.postgresql.test.jdbc3.GeneratedKeysTest)

This commit does remove all files in
pgjdbc/src/main/java/org/postgresql/core/v2/, leaving only "v3/".

-- 
Álvaro Herrera   Valdivia, Chile
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)




Re: a curious case of force_parallel_mode = on with jit'ing

2021-02-03 Thread Tom Lane
Amit Langote  writes:
> Has anybody seen this:

Works for me on HEAD (using RHEL8.3, gcc 8.3.1, LLVM 10.0.1).

regards, tom lane




Re: Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Tom Lane
Heikki Linnakangas  writes:
> I propose that we remove server support for COPY FROM STDIN with 
> protocol version 2, per attached patch. Even if we could still support 
> it, it would be a very rarely used and tested codepath, prone to bugs. 
> Perhaps we could remove support for the old protocol altogether, but I'm 
> not proposing that we go that far just yet.

I'm not really on board with half-baked removal of protocol 2.
If we're going to kill it we should just kill it altogether.
(The argument that it's untested surely applies to the rest
of the P2 code as well.)

I have a vague recollection that JDBC users still like to use
protocol 2 for some reason --- is that out of date?

regards, tom lane




Removing support for COPY FROM STDIN in protocol version 2

2021-02-03 Thread Heikki Linnakangas

Hi,

The server still supports the old protocol version 2. Protocol version 3 
was introduced in PostgreSQL 7.4, so there shouldn't be many clients 
around anymore that don't support it.


COPY FROM STDIN is particularly problematic with the old protocol, 
because the end-of-copy can only be detected by the \. marker. So the 
server has to read the input one byte at a time, and check for \. as it 
goes. At [1], I'm working on a patch to change the way the encoding 
conversion is performed in COPY FROM, so that we convert the data in 
larger chunks, before scanning the input for line boundaries. We can't 
do that safely in the old protocol.
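A toy Python sketch (not the server's C code) of the constraint described above: with only the \. marker separating COPY data from what follows, the reader must stop exactly at the marker, so it cannot safely convert or scan large chunks that might extend past it.

```python
import io

def read_copy_data_v2(stream):
    # Toy model of old-protocol COPY FROM: the end of the data can only be
    # detected by the "\." marker, so the reader must stop exactly there.
    # Any bytes after it belong to ordinary protocol messages and must be
    # left unread, ruling out aggressive read-ahead.
    data = []
    for line in iter(stream.readline, b""):
        if line.rstrip(b"\r\n") == b"\\.":
            break
        data.append(line)
    return b"".join(data)

s = io.BytesIO(b"1\tfoo\n2\tbar\n\\.\nREADY")
assert read_copy_data_v2(s) == b"1\tfoo\n2\tbar\n"
assert s.read() == b"READY"  # protocol bytes after the marker stay unread
```

In protocol 3, COPY data arrives in its own length-prefixed messages, so the server can slurp arbitrarily large buffers without this hazard.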


I propose that we remove server support for COPY FROM STDIN with 
protocol version 2, per attached patch. Even if we could still support 
it, it would be a very rarely used and tested codepath, prone to bugs. 
Perhaps we could remove support for the old protocol altogether, but I'm 
not proposing that we go that far just yet.


[1] 
https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi


- Heikki
>From 21ce10dd2bf43c6924645bff4d40d18663835dcf Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 3 Feb 2021 17:40:47 +0200
Subject: [PATCH 1/1] Remove support for COPY FROM with protocol version 2.

I'm working on a patch to refactor the way the encoding conversion is
performed, so that we convert the data in larger chunks, before scanning
the input for line boundaries. We can't do that, if we cannot safely try
to read ahead data past the end-of-copy marker. With the old protocol
gone, we can safely read as much as we want.
---
 src/backend/commands/copyfrom.c  |   7 -
 src/backend/commands/copyfromparse.c | 162 +++
 src/backend/commands/copyto.c|   2 +-
 src/include/commands/copyfrom_internal.h |   5 +-
 4 files changed, 50 insertions(+), 126 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c39cc736ed2..6d43d056cca 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1125,13 +1125,6 @@ CopyFrom(CopyFromState cstate)
 
 	MemoryContextSwitchTo(oldcontext);
 
-	/*
-	 * In the old protocol, tell pqcomm that we can process normal protocol
-	 * messages again.
-	 */
-	if (cstate->copy_src == COPY_OLD_FE)
-		pq_endmsgread();
-
 	/* Execute AFTER STATEMENT insertion triggers */
 	ExecASInsertTriggers(estate, target_resultRelInfo, cstate->transition_capture);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 4c74067f849..e8497cbdf00 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -46,21 +46,6 @@
  * empty statements.  See http://www.cit.gu.edu.au/~anthony/info/C/C.macros.
  */
 
-/*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
- */
-#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
-if (1) \
-{ \
-	if (raw_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
-	{ \
-		raw_buf_ptr = prev_raw_ptr; /* undo fetch */ \
-		need_data = true; \
-		continue; \
-	} \
-} else ((void) 0)
-
 /* This consumes the remainder of the buffer and breaks */
 #define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
 if (1) \
@@ -118,7 +103,7 @@ static int	CopyGetData(CopyFromState cstate, void *databuf,
 		int minread, int maxread);
 static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
 static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
-static bool CopyLoadRawBuf(CopyFromState cstate);
+static bool CopyLoadRawBuf(CopyFromState cstate, int minread);
 static int	CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes);
 
 void
@@ -144,14 +129,9 @@ ReceiveCopyBegin(CopyFromState cstate)
 	else
 	{
 		/* old way */
-		if (cstate->opts.binary)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("COPY BINARY is not supported to stdout or from stdin")));
-		pq_putemptymessage('G');
-		/* any error in old protocol will make us lose sync */
-		pq_startmsgread();
-		cstate->copy_src = COPY_OLD_FE;
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FROM STDIN is not supported in protocol version 2")));
 	}
 	/* We *must* flush here to ensure FE knows it can send. */
 	pq_flush();
@@ -225,27 +205,9 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not read from COPY file: %m")));
-			if (bytesread == 0)
+			if (bytesread < maxread)
 cstate->reached_eof = true;
 			break;
-		case COPY_OLD_FE:
-
-			/*
-			 * We cannot read more than minread bytes (which in practice is 1)
-			 * because old protocol doesn't have any clear way of separating
-			 * the COPY stream from following data.  This is slow, but not any
-			 * slower than the code path was originally, and we don't care
-			 * m

Re: Key management with tests

2021-02-03 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Feb  1, 2021 at 07:47:57PM -0500, Bruce Momjian wrote:
> > On Mon, Feb  1, 2021 at 06:31:32PM -0500, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > >   The purpose of cluster file encryption is to prevent users with read
> > > >   access to the directories used to store database files and write-ahead
> > > >   log files from being able to access the data stored in those files.
> > > >   For example, when using cluster file encryption, users who have read
> > > >   access to the cluster directories for backup purposes will not be able
> > > >   to decrypt the data stored in these files.  It also protects against
> > > >   decrypted data access after media theft.
> > > 
> > > That's one valid use-case and it particularly makes sense to consider,
> > > now that we support group read-access to the data cluster.  The last
> > 
> > Do enough people use group read-access to be useful?
> 
> I am thinking group read-access might be a requirement for cluster file
> encryption to be effective.

People certainly do use group read-access, but I don't see that as being
a requirement for cluster file encryption to be effective, it's just one
thing TDE can address, among others, as discussed.

> > > line seems a bit unclear- I would update it to say:
> > > Cluster file encryption also provides data-at-rest security, protecting
> > > users from data loss should the physical media on which the cluster is
> > > stored be stolen, improperly deprovisioned (not wiped or destroyed), or
> > > otherwise ends up in the hands of an attacker.
> > 
> > I have split the section into three paragraphs, trimmed down some of the
> > suggested text, and added it.  Full version below.
> 
> Here is an updated doc description of memory reading:
> 
>   This also does not protect against users who have read access to
>   database process memory — all in-memory data pages and data
>   encryption keys are stored unencrypted in memory, so an attacker who
> -->   is able to read memory can decrypt the entire cluster.  The Postgres
> -->   operating system user and the operating system administrator, e.g.,
> -->   the root user, have such access.

That's helpful, +1.

> > > >   File system write access can allow for unauthorized file system data
> > > >   decryption if the writes can be used to weaken the system's security
> > > >   and this weakened system is later supplied with externally-stored 
> > > > keys.
> > > 
> > > This isn't very clear as to exactly what the concern is or how an
> > > attacker would be able to thwart the system if they had write access to
> > > it.  An attacker with write access could possibly attempt to replace the
> > > existing keys, but with the key wrapping that we're using, that should
> > > result in just a decryption failure (unless, of course, the attacker has
> > > the actual KEK that was used, but that's not terribly interesting to
> > > worry about since then they could just go access the files directly).
> > 
> > Uh, well, they could modify postgresql.conf to change the script to save
> > the secret returned by the script before returning it to the PG server. 
> > We could require postgresql.conf to be somewhere secure, but then how do
> > we know that is secure?  I just don't see a clean solution here, but the
> > idea that you write and then wait for the key to show up seems like a
> > very valid way of attack, and it took me a while to be able to
> > articulate it.

postgresql.conf isn't always writable by the postgres user, though
postgresql.auto.conf is likely to always be.  I'm not sure how much of a
concern that is, but if we wanted to take steps to explicitly address
this issue, we could have some kind of 'secure' postgresql.conf file
which we would encourage users to make owned by root and whose values
wouldn't be allowed to be overridden once set.

> Let's suppose you lock down your cluster --- the non-PGDATA files are
> owned by root, postgresql.conf and pg_hba.conf are moved out of PGDATA
> and are not writable by the database OS user, or we have the PGDATA
> directory on another server, so the adversary can only write to the
> remote PGDATA directory.
> 
> What can they do?  Well, they can't modify pg_proc to add a shared
> library since pg_proc is encrypted, so we have to focus on files needed
> before encryption starts or files that can't be easily encrypted.

This isn't accurate- just because it's encrypted doesn't mean they can't
modify it.  That's exactly why integrity is important, because an
attacker absolutely could modify the files directly and potentially
exploit the system through those modifications.
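As a toy illustration of why integrity matters here (a plain HMAC over a page image in Python, not what any proposed TDE patch actually implements), an attacker who can write the file but lacks the key cannot produce a modification that still verifies:

```python
import hashlib
import hmac
import os

def page_tag(key, page):
    # Toy integrity tag over a page image; a real design would also bind
    # the page's identity (relfilenode, block number) into the MAC input
    # so pages cannot be swapped between locations.
    return hmac.new(key, page, hashlib.sha256).digest()

key = os.urandom(32)
page = b"encrypted heap page bytes"
tag = page_tag(key, page)

# The intact page verifies.
assert hmac.compare_digest(tag, page_tag(key, page))
# A single flipped bit is detected, even though encryption alone would
# have happily "decrypted" the tampered bytes to garbage.
tampered = bytes([page[0] ^ 1]) + page[1:]
assert not hmac.compare_digest(tag, page_tag(key, tampered))
```

Encryption without such a check provides confidentiality but says nothing about whether the ciphertext was altered.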

> They could create postgresql.conf.auto in PGDATA, and modify
> cluster_key_command to capture the key, or they could modify preload
> libraries or archive command to call a command to read memory as the PG
> OS user and write the key out somewhere, or use the key to rewrite the
> data

Re: Dumping/restoring fails on inherited generated column

2021-02-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 2021-01-29 17:41, Tom Lane wrote:
>> However ... this doesn't solve all the cases noted in this thread.
>> In the first example I gave at [1],
>> d3=# create table parent (f1 int default 2);
>> CREATE TABLE
>> d3=# create table child (f1 int default 3) inherits(parent);
>> NOTICE:  merging column "f1" with inherited definition
>> CREATE TABLE
>> d3=# create table child2() inherits(parent);
>> CREATE TABLE
>> d3=# alter table child2 alter column f1 set default 42;
>> ALTER TABLE
>> 
>> pg_dump still fails to restore child2.f1's non-inherited default.

> I can't tell what the problem is in this example.  I tried with PG11, 
> 12, and master, and the schema dump comes out with those same four 
> commands and they restore correctly AFAICT.

Oh!  Trying it now, I see that the child2 default does get restored
as a "separate default" object:

ALTER TABLE ONLY public.child2 ALTER COLUMN f1 SET DEFAULT 42;

This is a bit weird, because you'd think it would be handled
the same as the other child's default, but it isn't; that
one comes out as

CREATE TABLE public.child (
f1 integer DEFAULT 3
)
INHERITS (public.parent);

while child2 looks like

CREATE TABLE public.child2 (
)
INHERITS (public.parent);


I now suspect that I'd seen this dump of "child2" and missed the later
ALTER.  So no bug here, just pilot error.  Sorry for the noise.

regards, tom lane




Re: Dumping/restoring fails on inherited generated column

2021-02-03 Thread Peter Eisentraut

On 2021-01-29 17:41, Tom Lane wrote:

However ... this doesn't solve all the cases noted in this thread.
In the first example I gave at [1],

d3=# create table parent (f1 int default 2);
CREATE TABLE
d3=# create table child (f1 int default 3) inherits(parent);
NOTICE:  merging column "f1" with inherited definition
CREATE TABLE
d3=# create table child2() inherits(parent);
CREATE TABLE
d3=# alter table child2 alter column f1 set default 42;
ALTER TABLE

pg_dump still fails to restore child2.f1's non-inherited default.
That's probably a pre-existing problem, since it doesn't involve
GENERATED at all, but we shouldn't forget about it.



[1] https://www.postgresql.org/message-id/660925.1601397436%40sss.pgh.pa.us


I can't tell what the problem is in this example.  I tried with PG11, 
12, and master, and the schema dump comes out with those same four 
commands and they restore correctly AFAICT.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2021-02-03 Thread Peter Eisentraut

On 2021-01-17 14:38, Magnus Hagander wrote:

On Thu, Jan 7, 2021 at 11:53 AM Peter Eisentraut
 wrote:


After pondering this again, I think we can go with initdb
--no-instructions, as in your patch.

As a minor nitpick, I would leave out the

  else
  printf(_("\nSuccess.\n"));

in the --no-instructions case.


OK, thanks. I have applied it as such, with that message moved inside
the if statement.


It appears that there is an extra blank line in the initdb output before 
"Success" now.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Tom Lane
Bharath Rupireddy  writes:
> Can we have a new function, say pg_postgres_pid(), to return
> postmaster PID similar to pg_backend_pid()?

I'm disinclined to think that this is a good idea from a security
perspective.  Maybe if it's superuser-only it'd be ok (since a
superuser would have other routes to discovering the value anyway).

regards, tom lane




Re: Improve new hash partition bound check error messages

2021-02-03 Thread Peter Eisentraut

On 2021-02-02 13:26, Heikki Linnakangas wrote:

How about this?

CREATE TABLE fail_part PARTITION OF hash_parted FOR VALUES WITH (MODULUS
25, REMAINDER 3);
ERROR:  every hash partition modulus must be a factor of the next larger
modulus
DETAIL:  25 is not divisible by 10, the modulus of existing partition
"hpart_1"


I don't know if we can easily get the name of the existing partition. 
I'll have to check that.


I'm worried that this phrasing requires the user to understand that 
"divisible by" is related to "factor of", which is of course correct but 
introduces yet more complexity into this.
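For reference, the rule behind the error message is purely arithmetic; a small sketch of the check (not the server's implementation):

```python
def moduli_compatible(existing, new_modulus):
    # The hash-partition rule under discussion: when all moduli in use are
    # sorted, every modulus must be a factor of the next larger one, so
    # the partitions' remainder sets nest cleanly.
    moduli = sorted(set(existing) | {new_modulus})
    return all(larger % smaller == 0
               for smaller, larger in zip(moduli, moduli[1:]))

assert moduli_compatible([10, 50], 100)   # 10 | 50 | 100
assert not moduli_compatible([10], 25)    # 25 is not divisible by 10
```

The error-message question is exactly how to present that divisibility failure to a user who has not internalized the nesting rule.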


I'll play around with this a bit more.





Re: Can we have a new SQL callable function to get Postmaster PID?

2021-02-03 Thread Euler Taveira
On Wed, Feb 3, 2021, at 3:12 AM, Bharath Rupireddy wrote:
> Can we have a new function, say pg_postgres_pid(), to return
> postmaster PID similar to pg_backend_pid()?
It is not that difficult to read the postmaster PID using existing functions.

postgres=# SELECT (regexp_match(pg_read_file('postmaster.pid'), '\d+'))[1];
regexp_match
--
13496
(1 row)

While investigating an issue, you are probably interested in a backend PID or
one of the auxiliary processes. In both cases, it is easier to obtain the PIDs.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


a curious case of force_parallel_mode = on with jit'ing

2021-02-03 Thread Amit Langote
Hi,

Has anybody seen this:

$ psql regression
Timing is on.
psql (14devel)
Type "help" for help.

regression=# set force_parallel_mode to on;
SET
Time: 0.888 ms
regression=# set jit to on;
SET
Time: 0.487 ms
regression=# set jit_above_cost to 1;
SET
Time: 0.476 ms
regression=# SELECT p1.f1, p2.f1, p1.f1 * p2.f1 FROM POINT_TBL p1,
POINT_TBL p2 WHERE p1.f1[0] < 1;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
ERROR:  value out of range: underflow
CONTEXT:  parallel worker
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 670.801 ms
!?> \q

(gdb) bt
#0  0x7f57ac6508cd in std::_Function_handler >
()>)::{lambda(unsigned long, llvm::object::ObjectFile
const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long,
llvm::object::ObjectFile const&) () from
/opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#1  0x7f57ac652578 in
llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject
>::~ConcreteLinkedObject() () from
/opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#2  0x7f57ac6527aa in std::_Rb_tree
> >, std::_Select1st
> > >, std::less, std::allocator
> > > >::_M_erase(std::_Rb_tree_node
> > >*) () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#3  0x7f57ac65ec91 in
llvm::OrcCBindingsStack::~OrcCBindingsStack() () from
/opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#4  0x7f57ac65efaa in LLVMOrcDisposeInstance () from
/opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#5  0x7f57ae62d7bf in llvm_shutdown (code=1, arg=0) at llvmjit.c:926
#6  0x00916d00 in proc_exit_prepare (code=1) at ipc.c:209
#7  0x00916bdb in proc_exit (code=1) at ipc.c:107
#8  0x0087e8d6 in StartBackgroundWorker () at bgworker.c:832
#9  0x00892fa7 in do_start_bgworker (rw=0x2dae8a0) at postmaster.c:5833
#10 0x00893355 in maybe_start_bgworkers () at postmaster.c:6058
#11 0x00892390 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5215
#12 
#13 0x7f57d4e20933 in __select_nocancel () from /lib64/libc.so.6
#14 0x0088e00e in ServerLoop () at postmaster.c:1694
#15 0x0088d9fd in PostmasterMain (argc=5, argv=0x2d863f0) at
postmaster.c:1402
#16 0x00791197 in main (argc=5, argv=0x2d863f0) at main.c:209
(gdb)

I can make this happen with PG > v12.  Maybe there's something wrong
with my LLVM installation but just thought to ask here just in case.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: libpq debug log

2021-02-03 Thread 'Alvaro Herrera'
On 2021-Feb-03, tsunakawa.ta...@fujitsu.com wrote:

> From: 'Alvaro Herrera' 
> > > + conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
> 
> > The rationale for that second condition is this: if the memory allocated
> > is the initial size, we don't free memory, because it would just be
> > allocated of the same size next time, and that size is not very big, so
> > it's not a big deal if we just let it be, so that it is reused if we
> > call PQtrace() again later.  However, if the allocated size is larger
> > than default, then it is possible that some previous tracing run has
> > enlarged the trace struct to a very large amount of memory, and we don't
> > want to leave that in place.
> 
> Ah, understood.  In that case, num_fields should be max_fields.

Oh, of course.
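Restated, the policy under discussion is: keep a default-sized buffer around for reuse, but free one that a previous tracing run enlarged. A toy sketch of that policy (the names are stand-ins, not libpq's actual struct):

```python
DEF_FE_MSGFIELDS = 8  # stand-in for libpq's default field count

class FieldBuf:
    def __init__(self):
        self.max_fields = DEF_FE_MSGFIELDS

    def grow(self, needed):
        # Enlarge to hold an unusually wide message.
        if needed > self.max_fields:
            self.max_fields = needed

    def untrace(self):
        # Keep a default-sized allocation for the next PQtrace() call,
        # since it is small and would be reallocated anyway; but release
        # an enlarged one so a huge message does not pin memory forever.
        if self.max_fields != DEF_FE_MSGFIELDS:
            self.max_fields = DEF_FE_MSGFIELDS

buf = FieldBuf()
buf.grow(4096)
buf.untrace()
assert buf.max_fields == DEF_FE_MSGFIELDS
```

Hence the comparison must be against max_fields (the allocated size), not num_fields (the fields currently in use).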


-- 
Álvaro Herrera39°49'30"S 73°17'W
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."(Michael Brusser)




Re: libpq debug log

2021-02-03 Thread 'Alvaro Herrera'
On 2021-Feb-03, tsunakawa.ta...@fujitsu.com wrote:

> (39)
> +  of tracing.  If (flags & PQTRACE_OUTPUT_TIMESTAMPS) is
> +  true, then timestamp is not printed with each message.
> 
> The flag name (OUTPUT) and its description (not printed) doesn't match.
> 
> I think you can use less programmatical expression like "If flags contains PQTRACE_OUTPUT_TIMESTAMPS".

I copied the original style from elsewhere in the manual.

> (41)
> +void
> +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> +{

I'm not really sure about making this a separate API call. We could just
make it PQtrace() and increment the libpq so version.  I don't think
it's a big deal, frankly.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"El conflicto es el camino real hacia la unión"




Re: libpq debug log

2021-02-03 Thread Alvaro Herrera
On 2021-Feb-03, Kyotaro Horiguchi wrote:

> Looking the doc mentioned in the comment #39:
> 
> +  flags contains flag bits describing the operating mode
> +  of tracing.  If (flags & PQTRACE_OUTPUT_TIMESTAMPS) is
> +  true, then timestamp is not printed with each message.
> 
> PQTRACE_OUTPUT_TIMESTAMPS is designed to *inhibit* timestamps from
> being prepended. If that is actually intended, the symbol name should
> be PQTRACE_NOOUTPUT_TIMESTAMP. Otherwise, the doc need to be fixed.

I'm pretty sure I named the flag PQTRACE_SUPPRESS_TIMESTAMP (and I
prefer SUPPRESS to NOOUTPUT), because the idea is that the timestamp is
printed by default.  I think that's the sensible decision: applications
prefer to have timestamps, even if there's a tiny bit of overhead.  We
don't want to force them to pass a flag for that.  We only want the
no-timestamp behavior in order to be able to use it for libpq internal
testing.
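The flag test being discussed is an ordinary bitmask check. A toy sketch — plain Python with names mirroring the proposal, not libpq code — of the suppress-by-default semantics, where timestamps are printed unless the flag is set:

```python
# Illustrative only: a SUPPRESS flag (rather than an OUTPUT flag) turns the
# default-on timestamp off. The timestamp string here is a fixed stand-in.
PQTRACE_SUPPRESS_TIMESTAMPS = 1 << 0

def trace_line(msg, flags=0):
    if flags & PQTRACE_SUPPRESS_TIMESTAMPS:
        return msg
    return "2021-02-03 12:00:00.000 " + msg  # timestamp prepended by default

print(trace_line("F 4 Sync"))                                # with timestamp
print(trace_line("F 4 Sync", PQTRACE_SUPPRESS_TIMESTAMPS))   # -> F 4 Sync
```

This matches the stated intent: applications get timestamps without passing any flag, and only the internal test suite needs to suppress them.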

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."  (Brian Kernighan)




Re: Bug in COPY FROM backslash escaping multi-byte chars

2021-02-03 Thread Heikki Linnakangas

On 03/02/2021 15:38, John Naylor wrote:
On Wed, Feb 3, 2021 at 8:08 AM Heikki Linnakangas wrote:

 >
 > Hi,
 >
 > While playing with COPY FROM refactorings in another thread, I noticed
 > corner case where I think backslash escaping doesn't work correctly.
 > Consider the following input:
 >
 > \么.foo

I've seen multibyte delimiters in the wild, so it's not as outlandish as 
it seems.


We don't actually support multi-byte characters as delimiters or quote 
or escape characters:


postgres=# copy copytest from 'foo' with (delimiter '么');
ERROR:  COPY delimiter must be a single one-byte character


The fix is simple enough, so +1.


Thanks, I'll commit and backpatch shortly.

- Heikki




Re: Bug in COPY FROM backslash escaping multi-byte chars

2021-02-03 Thread John Naylor
On Wed, Feb 3, 2021 at 8:08 AM Heikki Linnakangas  wrote:
>
> Hi,
>
> While playing with COPY FROM refactorings in another thread, I noticed
> corner case where I think backslash escaping doesn't work correctly.
> Consider the following input:
>
> \么.foo

I've seen multibyte delimiters in the wild, so it's not as outlandish as it
seems. The fix is simple enough, so +1.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Typo in tablesync comment

2021-02-03 Thread Amit Kapila
On Wed, Feb 3, 2021 at 1:35 PM Michael Paquier  wrote:
>
> On Wed, Feb 03, 2021 at 06:52:56PM +1100, Peter Smith wrote:
>
> > Maybe better to rewrite it more drastically?
> >
> > e.g
> > -
> >  *The catalog pg_subscription_rel is used to keep information about
> >  *subscribed tables and their state. The catalog holds all states
> >  *except SYNCWAIT and CATCHUP which are only in shared memory.
> > -
>
> Fine by me.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: DROP TABLE can crash the replication sync worker

2021-02-03 Thread Amit Kapila
On Wed, Feb 3, 2021 at 2:53 PM Peter Smith  wrote:
>
> Hi Hackers.
>
> As discovered in another thread [master] there is an *existing* bug in
> the PG HEAD code which can happen if a DROP TABLE is done at same time
> a replication tablesync worker is running.
>
> It seems the table's relid that the sync worker is using gets ripped
> out from underneath it and is invalidated by the DROP TABLE. Any
> subsequent use of that relid will go wrong.
>

Where exactly did you pause the tablesync worker while dropping the
table? We acquire the lock on the table in LogicalRepSyncTableStart
and then keep it for the entire duration of tablesync worker so drop
table shouldn't be allowed.

-- 
With Regards,
Amit Kapila.




Re: Next Commitfest Manager.

2021-02-03 Thread Ibrar Ahmed
Hi,
Anyone else already volunteers that? It is my first time so need some
access, if all agree.

On Mon, Feb 1, 2021 at 10:17 PM Ibrar Ahmed  wrote:

> As Commitfest 2021-01 is now closed. I am volunteering to manage next
> commitfest.
>
>
> --
> Ibrar Ahmed
>


-- 
Ibrar Ahmed


Re: Single transaction in the tablesync worker?

2021-02-03 Thread Amit Kapila
On Wed, Feb 3, 2021 at 1:28 PM Ajin Cherian  wrote:
>
> On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian  wrote:
>
> > I am sorry, my above steps were not correct. I think the reason for
> > the failure I was seeing were some other steps I did prior to this. I
> > will recreate this and update you with the appropriate steps.
>
> The correct steps are as follows:
>
> Publisher:
>
> postgres=# CREATE TABLE tab_rep (a int primary key);
> CREATE TABLE
> postgres=# INSERT INTO tab_rep SELECT generate_series(1,100);
> INSERT 0 100
> postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
> CREATE PUBLICATION
>
> Subscriber:
> postgres=# CREATE TABLE tab_rep (a int primary key);
> CREATE TABLE
> postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
> dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
> NOTICE:  created replication slot "tap_sub" on publisher
> CREATE SUBSCRIPTION
> postgres=# ALTER SUBSCRIPTION tap_sub enable;
> ALTER SUBSCRIPTION
>
> Allow the tablesync to complete and then drop the subscription, the
> table remains full and restarting the subscription should fail with a
> constraint violation during tablesync but it does not.
>
>
> Subscriber:
> postgres=# drop subscription tap_sub ;
> NOTICE:  dropped replication slot "tap_sub" on publisher
> DROP SUBSCRIPTION
> postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
> dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
> NOTICE:  created replication slot "tap_sub" on publisher
> CREATE SUBSCRIPTION
> postgres=# ALTER SUBSCRIPTION tap_sub enable;
> ALTER SUBSCRIPTION
>
> This takes the subscriber into an error loop but no mention of what
> the error was:
>

Thanks for the report. The problem here was that the error occurred
when we were trying to copy the large data. Now, before fetching the
entire data we issued a rollback that led to this problem. I think the
alternative here could be to first fetch the entire data when the
error occurred then issue the following commands. Instead, I have
modified the patch to perform 'drop_replication_slot' in the beginning
if the relstate is datasync.  Do let me know if you can think of a
better way to fix this?

-- 
With Regards,
Amit Kapila.


v26-0001-Tablesync-Solution1.patch
Description: Binary data


Re: pg_replication_origin_drop API potential race condition

2021-02-03 Thread Amit Kapila
On Wed, Jan 27, 2021 at 4:58 AM Peter Smith  wrote:
>
> Hi Hackers.
>
> As discovered elsewhere [ak0125] there is a potential race condition
> in the pg_replication_origin_drop API
>
> The current code of pg_replication_origin_drop looks like:
> 
> roident = replorigin_by_name(name, false);
> Assert(OidIsValid(roident));
>
> replorigin_drop(roident, true);
> 
>
> Users cannot deliberately drop a non-existent origin
> (replorigin_by_name passes missing_ok = false) but there is still a
> small window where concurrent processes may be able to call
> replorigin_drop for the same valid roident.
>
> Locking within replorigin_drop guards against concurrent drops so the
> 1st execution will succeed, but then the 2nd execution would give
> internal cache error: elog(ERROR, "cache lookup failed for replication
> origin with oid %u", roident);
>
> Some ideas to fix this include:
> 1. Do nothing except write a comment about this in the code. The
> internal ERROR is not ideal for a user API there is no great harm
> done.
> 2. Change the behavior of replorigin_drop to be like
> replorigin_drop_IF_EXISTS, so the 2nd execution of this race would
> silently do nothing when it finds the roident is already gone.
> 3. Same as 2, but make the NOP behavior more explicit by introducing a
> new "missing_ok" parameter for replorigin_drop.
>

How about if we call replorigin_by_name() inside replorigin_drop after
acquiring the lock? Wouldn't that close this race condition? We are
doing something similar for pg_replication_origin_advance().
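The lookup-under-lock idea can be sketched outside the server. This is a hedged illustration in plain Python — not PostgreSQL source; the dictionary, lock, and function names are all invented stand-ins for the origin catalog, the drop-serializing lock, and replorigin_drop:

```python
import threading

# name -> roident stand-in for the replication-origin catalog
origins = {"origin_a": 1001}
origin_lock = threading.Lock()   # stands in for the lock serializing drops

def drop_origin(name, missing_ok=True):
    with origin_lock:             # take the lock first ...
        if name not in origins:   # ... then resolve the name under it
            if missing_ok:
                return            # a concurrent drop already removed it: no-op
            raise RuntimeError(f'replication origin "{name}" does not exist')
        del origins[name]

drop_origin("origin_a")   # first "session" drops it
drop_origin("origin_a")   # racing "session" finds it gone and quietly no-ops
print(origins)            # -> {}
```

Because the name is resolved only after the lock is held, the second drop reports "does not exist" (or no-ops) instead of failing on a stale identifier — the user-facing behavior options 2 and 3 above describe.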

-- 
With Regards,
Amit Kapila.




pg_dump: Add const decorations

2021-02-03 Thread Peter Eisentraut
Something that came out of work on pg_dump recently.  I added const 
decorations to the *info arguments of the dump* functions, to clarify 
that they don't modify that argument.  Many other nearby functions 
modify their arguments, so this can help clarify these different APIs a bit.
From a2ec95a397a1e8a075a56079c6928ba3b1d2b6ce Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 3 Feb 2021 13:10:24 +0100
Subject: [PATCH] pg_dump: Add const decorations

Add const decorations to the *info arguments of the dump* functions,
to clarify that they don't modify that argument.  Many other nearby
functions modify their arguments, so this can help clarify these
different APIs a bit.
---
 src/bin/pg_dump/pg_backup.h  |   2 +-
 src/bin/pg_dump/pg_backup_archiver.h |   4 +-
 src/bin/pg_dump/pg_dump.c| 344 +--
 src/bin/pg_dump/pg_dump.h|   2 +-
 4 files changed, 176 insertions(+), 176 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 9d0056a569..eea9f30a79 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -248,7 +248,7 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
-typedef int (*DataDumperPtr) (Archive *AH, void *userArg);
+typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a8ea5c7eae..9d0f03d562 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -382,7 +382,7 @@ struct _tocEntry
 	int			nDeps;			/* number of dependencies */
 
 	DataDumperPtr dataDumper;	/* Routine to dump data for object */
-	void	   *dataDumperArg;	/* Arg for above routine */
+	const void *dataDumperArg;	/* Arg for above routine */
 	void	   *formatData;		/* TOC Entry data specific to file format */
 
/* working state while dumping/restoring */
@@ -421,7 +421,7 @@ typedef struct _archiveOpts
const DumpId *deps;
int nDeps;
DataDumperPtr dumpFn;
-   void   *dumpArg;
+   const void *dumpArg;
 } ArchiveOpts;
 #define ARCHIVE_OPTS(...) &(ArchiveOpts){__VA_ARGS__}
 /* Called to add a TOC entry */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 39da742e32..3f1f404e4c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -159,8 +159,8 @@ static void expand_table_name_patterns(Archive *fout,
 											   SimpleOidList *oids,
 											   bool strict_names);
 static NamespaceInfo *findNamespace(Oid nsoid);
-static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
-static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
+static void dumpTableData(Archive *fout, const TableDataInfo *tdinfo);
+static void refreshMatViewData(Archive *fout, const TableDataInfo *tdinfo);
 static void guessConstraintInheritance(TableInfo *tblinfo, int numTables);
static void dumpComment(Archive *fout, const char *type, const char *name,
						const char *namespace, const char *owner,
@@ -174,53 +174,53 @@ static void dumpSecLabel(Archive *fout, const char *type, const char *name,
 static int findSecLabels(Archive *fout, Oid classoid, Oid objoid,
  SecLabelItem **items);
 static int collectSecLabels(Archive *fout, SecLabelItem **items);
-static void dumpDumpableObject(Archive *fout, DumpableObject *dobj);
-static void dumpNamespace(Archive *fout, NamespaceInfo *nspinfo);
-static void dumpExtension(Archive *fout, ExtensionInfo *extinfo);
-static void dumpType(Archive *fout, TypeInfo *tyinfo);
-static void dumpBaseType(Archive *fout, TypeInfo *tyinfo);
-static void dumpEnumType(Archive *fout, TypeInfo *tyinfo);
-static void dumpRangeType(Archive *fout, TypeInfo *tyinfo);
-static void dumpUndefinedType(Archive *fout, TypeInfo *tyinfo);
-static void dumpDomain(Archive *fout, TypeInfo *tyinfo);
-static void dumpCompositeType(Archive *fout, TypeInfo *tyinfo);
-static void dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo);
-static void dumpShellType(Archive *fout, ShellTypeInfo *stinfo);
-static void dumpProcLang(Archive *fout, ProcLangInfo *plang);
-static void dumpFunc(Archive *fout, FuncInfo *finfo);
-static void dumpCast(Archive *fout, CastInfo *cast);
-static void dumpTransform(Archive *fout, TransformInfo *transform);
-static void dumpOpr(Archive *fout, OprInfo *oprinfo);
-static void dumpAccessMethod(Archive *fout, AccessMethodInfo *oprinfo);
-static void dumpOpclass(Archive *fout, OpclassInfo *opcinfo);
-static void dumpOpfamily(Archive *fout, Opf

Bug in COPY FROM backslash escaping multi-byte chars

2021-02-03 Thread Heikki Linnakangas

Hi,

While playing with COPY FROM refactorings in another thread, I noticed 
corner case where I think backslash escaping doesn't work correctly. 
Consider the following input:


\么.foo

I hope that came through in this email correctly as UTF-8. The string 
contains a sequence of: backslash, multibyte-character and a dot.


The documentation says:


Backslash characters (\) can be used in the COPY data to quote data
characters that might otherwise be taken as row or column delimiters


So I believe escaping multi-byte characters is supposed to work, and it 
usually does.


However, let's consider the same string in Big5 encoding (in hex escaped 
format):


\x5ca45c2e666f6f

The first byte 0x5c, is the backslash. The multi-byte character consists 
of two bytes: 0xa4 0x5c. Note that the second byte is equal to a backslash.


That confuses the parser in CopyReadLineText, so that you get an error:

postgres=# create table copytest (t text);
CREATE TABLE
postgres=# \copy copytest from 'big5-skip-test.data' with (encoding 'big5');
ERROR:  end-of-copy marker corrupt
CONTEXT:  COPY copytest, line 1

What happens is that when the parser sees the backslash, it looks ahead 
at the next byte, and when it's not a dot, it skips over it:



			else if (!cstate->opts.csv_mode)

				/*
				 * If we are here, it means we found a backslash followed by
				 * something other than a period.  In non-CSV mode, anything
				 * after a backslash is special, so we skip over that second
				 * character too.  If we didn't do that \\. would be
				 * considered an eof-of copy, while in non-CSV mode it is a
				 * literal backslash followed by a period.  In CSV mode,
				 * backslashes are not special, so we want to process the
				 * character after the backslash just like a normal character,
				 * so we don't increment in those cases.
				 */
				raw_buf_ptr++;


However, in a multi-byte encoding that might "embed" ascii characters, 
it should skip over the next *character*, not byte.
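To make the byte-level confusion concrete, here is a small self-contained sketch — plain Python, not PostgreSQL's parser; `naive_scan` and `char_scan` are deliberately simplified stand-ins for CopyReadLineText's skipping logic — showing how skipping one *byte* after a backslash makes the embedded 0x5c look like the start of an end-of-copy marker:

```python
# '\' + the Big5 character 0xa4 0x5c + '.' + 'foo' (the example string above)
data = b"\x5c\xa4\x5c\x2e\x66\x6f\x6f"

def naive_scan(buf):
    """Skip one *byte* after a backslash, as the buggy logic did."""
    i, markers = 0, []
    while i < len(buf):
        if buf[i] == 0x5C:                         # backslash
            if i + 1 < len(buf) and buf[i + 1] == 0x2E:
                markers.append(i)                  # saw "\." end-of-copy marker
                i += 2
                continue
            i += 2                                 # skip only the next byte
        else:
            i += 1
    return markers

def big5_mblen(b):
    """Big5 lead bytes 0x81-0xFE start a two-byte character."""
    return 2 if 0x81 <= b <= 0xFE else 1

def char_scan(buf):
    """Skip one *character* after a backslash (the fix's idea)."""
    i, markers = 0, []
    while i < len(buf):
        if buf[i] == 0x5C:
            if i + 1 < len(buf) and buf[i + 1] == 0x2E:
                markers.append(i)
                i += 2
                continue
            i += 1
            if i < len(buf):
                i += big5_mblen(buf[i])            # skip the whole character
        else:
            i += 1
    return markers

# The naive scan skips only 0xa4, then sees the embedded 0x5c + '.' as "\.".
print(naive_scan(data))   # -> [2]: a bogus end-of-copy marker is detected
print(char_scan(data))    # -> []: the escape covers the whole character
```

The bogus marker found by the byte-wise scan is exactly what produces the "end-of-copy marker corrupt" error shown above.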


Attached is a pretty straightforward patch to fix that. Anyone see a 
problem with this?


- Heikki
From 632549e79dc98cb338d7fc22888d8b6237a47d10 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 3 Feb 2021 14:00:32 +0200
Subject: [PATCH 1/1] Fix a corner-case in COPY FROM backslash processing.

If a multi-byte character is escaped with a backslash in TEXT mode input,
we didn't always treat the escape correctly. If:

- a multi-byte character is escaped with a backslash, and
- the second byte of the character is 0x5C, i.e. the ASCII code of a
  backslash (\), and
- the next character is a dot (.), then

CopyReadLineText() would incorrectly interpret the sequence as an end-of-copy
marker (\.). This can only happen in encodings that can "embed" ASCII
characters as the second byte. One example of such a sequence is
'\x5ca45c2e666f6f' in Big5 encoding. If you put that in a file, and
load it with COPY FROM, you'd incorrectly get an "end-of-copy marker
corrupt" error.
---
 src/backend/commands/copyfromparse.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 798e18e013..c20ec482db 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1084,7 +1084,7 @@ CopyReadLineText(CopyFromState cstate)
 break;
 			}
 			else if (!cstate->opts.csv_mode)
-
+			{
 /*
  * If we are here, it means we found a backslash followed by
  * something other than a period.  In non-CSV mode, anything
@@ -1095,8 +1095,17 @@ CopyReadLineText(CopyFromState cstate)
  * backslashes are not special, so we want to process the
  * character after the backslash just like a normal character,
  * so we don't increment in those cases.
+ *
+ * In principle, we would need to use pg_encoding_mblen() to
+ * skip over the character in some encodings, like at the end
+ * of the loop.  But if it's a multi-byte character, it cannot
+ * have any special meaning and skipping isn't necessary for
+ * correctness.  We can fall through to process it normally
+ * instead.
  */
-raw_buf_ptr++;
+if (!IS_HIGHBIT_SET(c2))
+	raw_buf_ptr++;
+			}
 		}
 
 		/*
-- 
2.30.0



Re: Dumping/restoring fails on inherited generated column

2021-02-03 Thread Peter Eisentraut

On 2021-01-29 17:41, Tom Lane wrote:

Peter Eisentraut  writes:

I've had another go at this, and I've found a solution that appears to
address all the issues I'm aware of.  It's all very similar to the
previously discussed patches.  The main difference is that previous
patches had attempted to use something like tbinfo->attislocal to
determine whether a column was inherited, but that's not correct.  This
patch uses the existing logic in flagInhAttrs() to find whether there is
a matching parent column with a generation expression.  I've added
pg_dump test cases here to check the different variations that the code
addresses.


This is a clear improvement on the current situation, and given that
this issue is over a year old, I think we should push and back-patch
this in time for February's releases.


done

I will continue working on the other issues that we have been discussing.


However ... this doesn't solve all the cases noted in this thread.
In the first example I gave at [1],

d3=# create table parent (f1 int default 2);
CREATE TABLE
d3=# create table child (f1 int default 3) inherits(parent);
NOTICE:  merging column "f1" with inherited definition
CREATE TABLE
d3=# create table child2() inherits(parent);
CREATE TABLE
d3=# alter table child2 alter column f1 set default 42;
ALTER TABLE

pg_dump still fails to restore child2.f1's non-inherited default.
That's probably a pre-existing problem, since it doesn't involve
GENERATED at all, but we shouldn't forget about it.

Also, in the example from [2],

d3=# create table pp1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
d3=# create table cc1 (a int, b int GENERATED ALWAYS AS (a * 3) STORED);
CREATE TABLE
d3=# alter table cc1 inherit pp1;
ALTER TABLE

pg_dump now omits to dump cc1's generation expression, which seems
strictly worse than before.  Admittedly, the backend likely ought to
be rejecting this scenario, but it doesn't do so today.

Neither of these points seem like a reason to reject this patch,
they're just adjacent work that remains to be done.

regards, tom lane

[1] https://www.postgresql.org/message-id/660925.1601397436%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/661371.1601398006%40sss.pgh.pa.us





--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Correct comment in StartupXLOG().

2021-02-03 Thread Amul Sul
On Wed, Feb 3, 2021 at 2:48 PM Dilip Kumar  wrote:
>
> On Wed, Feb 3, 2021 at 2:28 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > SharedRecoveryState member of XLogCtl is no longer a boolean flag, got 
> > changes
> > in 4e87c4836ab9 to enum but, comment referring to it still referred as the
> > boolean flag which is pretty confusing and incorrect.
>
> +1 for the comment change
>
> > Also, the last part of the same comment is as:
> >
> > " .. although the boolean flag to allow WAL is probably atomic in
> > itself, .",
> >
> > I am a bit confused here too about saying "atomic" to it, is that correct?
> > I haven't done anything about it, only replaced the "boolean flag" to 
> > "recovery
> > state" in the attached patch.
>
> I don't think the atomic is correct, it's no more boolean so it is
> better we get rid of this part of the comment

Thanks for the confirmation.  Updated that part in the attached version.

Regards,
Amul


v2_fix_comment_in_StartupXLOG.patch
Description: Binary data


Re: logical replication worker accesses catalogs in error context callback

2021-02-03 Thread Bharath Rupireddy
On Sat, Jan 30, 2021 at 8:23 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-01-28 11:14:03 +0530, Amit Kapila wrote:
> > On Wed, Jan 27, 2021 at 9:38 AM Bharath Rupireddy
> >  wrote:
> > >
> > > On Wed, Jan 27, 2021 at 7:48 AM Zhihong Yu  wrote:
> > >
> > > Thanks for pointing to the changes in the commit message. I corrected
> > > them. Attaching v4 patch set, consider it for further review.
> > >
> >
> > About 0001, have we tried to reproduce the actual bug here which means
> > when the error_callback is called we should face some problem? I feel
> > with the correct testcase we should hit the Assert
> > (Assert(IsTransactionState());) in SearchCatCacheInternal because
> > there we expect the transaction to be in a valid state. I understand
> > that the transaction is in a broken state at that time but having a
> > testcase to hit the actual bug makes it easy to test the fix.
>
> I think it's easy to hit - add a trivial DEBUG message to XLogWrite(),
> and run anything involving logical rep using a low enough log
> level. There might even be enough messages with a low enough level
> already somewhere in the relevant paths.

I'm not sure how adding a DEBUG message to XLogWrite() would ensure
the logical replication worker on the subscriber system enters
slot_store_error_callback and hits the Assert(IsTransactionState());?
Could you please elaborate? Or I may be missing something here to
understand.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-03 Thread Michael Paquier
On Wed, Feb 03, 2021 at 01:35:26PM +0300, Alexey Kondratov wrote:
> This check for OidIsValid() seems to be excessive, since you moved the whole
> ACL check under 'if (tablespacename != NULL)' here.

That's more consistent with ATPrepSetTableSpace().

> +SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_reindex_part_index')
> +  ORDER BY relid, level;
> +SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_reindex_part_index')
> +  ORDER BY relid, level;
> 
> Why do you do the same twice in a row? It looks like a typo. Maybe it was
> intended to be called for partitioned table AND index.

Yes, my intention was to show the tree of the set of tables.  It is
not really interesting for this test anyway, so let's just remove this
extra query.
--
Michael


signature.asc
Description: PGP signature


Re: logical replication worker accesses catalogs in error context callback

2021-02-03 Thread Bharath Rupireddy
On Thu, Jan 28, 2021 at 11:14 AM Amit Kapila  wrote:
>
> On Wed, Jan 27, 2021 at 9:38 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 27, 2021 at 7:48 AM Zhihong Yu  wrote:
> >
> > Thanks for pointing to the changes in the commit message. I corrected
> > them. Attaching v4 patch set, consider it for further review.
> >
>
> About 0001, have we tried to reproduce the actual bug here which means
> when the error_callback is called we should face some problem? I feel
> with the correct testcase we should hit the Assert
> (Assert(IsTransactionState());) in SearchCatCacheInternal because
> there we expect the transaction to be in a valid state. I understand
> that the transaction is in a broken state at that time but having a
> testcase to hit the actual bug makes it easy to test the fix.

I have not tried hitting the Assert(IsTransactionState() in
SearchCatCacheInternal. To do that, I need to figure out hitting
"incorrect binary data format in logical replication column" error in
either slot_modify_data or slot_store_data so that we will enter the
error callback slot_store_error_callback and then IsTransactionState()
should return false i.e. txn shouldn't be in TRANS_INPROGRESS. I'm not
sure how to do these things. I also looked at the commits 42c80c696e9
and ddfc9cb05 that added Assert(IsTransactionState()); in
SearchCatCacheInternal and RelationIdGetRelation for any relevant
tests or discussions, but I couldn't find anything.

Any further ideas in reproducing the issue?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-03 Thread Michael Paquier
On Wed, Feb 03, 2021 at 12:53:42AM -0600, Justin Pryzby wrote:
> On Wed, Feb 03, 2021 at 03:37:39PM +0900, Michael Paquier wrote:
>> index 627b36300c..4ee3951ca0 100644
>> --- a/doc/src/sgml/ref/reindex.sgml
>> +++ b/doc/src/sgml/ref/reindex.sgml
>> @@ -293,8 +311,30 @@ REINDEX [ ( > class="parameter">option [, ...] ) ] { IN
>> respectively. Each partition of the specified partitioned relation is
>> reindexed in a separate transaction. Those commands cannot be used inside
>> a transaction block when working on a partitioned table or index.
>> +   If a REINDEX command fails when run on a partitioned
>> +   relation, and TABLESPACE was specified, then it may not
>> +   have moved all indexes to the new tablespace. Re-running the command
>> +   will rebuild again all the partitions and move previously-unprocessed
> 
> remove "again"

Okay.

>> +   indexes to the new tablespace.
>> +  
>> +  
>> +  
>> +   When using the TABLESPACE clause with
>> +   REINDEX on a partitioned index or table, only the
>> +   tablespace references of the partitions are updated. As partitioned indexes
> 
> I think you should say "of the LEAF partitions ..".  The intermediate,
> partitioned tables are also "partitions" (partitioned partitions if you like).

Indeed, I can see how that's confusing.

>> +   are not updated, it is recommended to separately use
>> +   ALTER TABLE ONLY on them to achieve that.
> 
> Maybe say: "..to set the default tablespace of any new partitions created in
> the future".

Not sure I like that.  Here is a proposal:
"it is recommended to separately use ALTER TABLE ONLY on them so as
any new partitions attached inherit the new tablespace value."
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-02-03 Thread Fujii Masao




On 2021/02/03 13:56, Bharath Rupireddy wrote:

On Tue, Feb 2, 2021 at 9:45 AM Fujii Masao  wrote:

One merit of keep_connections that I found is that we can use it even
when connecting to the older PostgreSQL that doesn't support
idle_session_timeout. Also it seems simpler to use keep_connections
rather than setting idle_session_timeout in multiple remote servers.
So I'm inclined to add this feature, but I'd like to hear more opinions.


Thanks.


And, just using idle_session_timeout on a remote server may not help
us completely. Because the remote session may go away, while we are
still using that cached connection in an explicit txn on the local
session. Our connection retry will also not work because we are in the
middle of an xact, so the local explicit txn gets aborted.


Regarding idle_in_transaction_session_timeout, this seems true. But
I was thinking that idle_session_timeout doesn't cause this issue because
it doesn't close the connection in the middle of transaction. No?


You are right. idle_session_timeout doesn't take effect when in the
middle of an explicit txn. I missed this point.


Here are some review comments.

-   (used_in_current_xact && !keep_connections))
+   (used_in_current_xact &&
+   (!keep_connections || !entry->keep_connection)))

The names of GUC and server-level option should be the same,
to make the thing less confusing?


We can have GUC name keep_connections as there can be multiple
connections within a local session and I can change the server level
option keep_connection to keep_connections because a single foreign
server can have multiple connections as we have seen that in the use
case identified by you. I will change that in the next patch set.


IMO the server-level option should override GUC. IOW, GUC setting
should be used only when the server-level option is not specified.
But the above code doesn't seem to do that. Thought?


Note that default values for GUC and server level option are on i.e.
connections are cached.

The main intention of the GUC is to not set server level options to
false for all the foreign servers in case users don't want to keep any
foreign server connections. If the server level option overrides GUC,
then even if users set GUC to off, they have to set the server level
option to false for all the foreign servers.


Maybe my explanation in the previous email was unclear. What I think is; If the 
server-level option is explicitly specified, its setting is used whatever GUC 
is. On the other hand, if the server-level option is NOT specified, GUC setting 
is used. For example, if we define the server as follows, GUC setting is used 
because the server-level option is NOT specified.

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres;

If we define the server as follows, the server-level setting is used.

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres OPTIONS 
(keep_connections 'on');


For example, the log_autovacuum_min_duration GUC and reloption work in a similar
way. That is, the reloption setting overrides the GUC. If the reloption is not
specified, the GUC is used.
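The precedence being proposed can be sketched in a few lines. This is an illustrative stand-in with invented names, not postgres_fdw's actual code: an explicitly specified server-level option wins, and the GUC default applies only when the server-level option is absent.

```python
GUC_KEEP_CONNECTIONS = True  # stands in for the proposed GUC, default on

def keep_connection(server_options):
    """server_options: options given via CREATE SERVER ... OPTIONS (...)."""
    if "keep_connections" in server_options:
        return server_options["keep_connections"] == "on"  # option overrides
    return GUC_KEEP_CONNECTIONS                            # fall back to GUC

print(keep_connection({}))                           # no option -> GUC: True
print(keep_connection({"keep_connections": "off"}))  # option wins -> False
```

Under this scheme, turning the GUC off still disables caching for every server that has not explicitly opted in, which addresses the "don't set it on every server" concern while keeping per-server control.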

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-03 Thread Alexey Kondratov

On 2021-02-03 09:37, Michael Paquier wrote:

On Tue, Feb 02, 2021 at 10:32:19AM +0900, Michael Paquier wrote:

On Mon, Feb 01, 2021 at 06:28:57PM +0300, Alexey Kondratov wrote:
> Hm, IIUC, REINDEX CONCURRENTLY doesn't use either of them. It directly uses
> index_create() with a proper tablespaceOid instead of
> SetRelationTableSpace(). And its checks structure is more restrictive even
> without tablespace change, so it doesn't use CheckRelationTableSpaceMove().

Sure.  I have not checked the patch in details, but even with that it
would be much safer to me if we apply the same sanity checks
everywhere.  That's less potential holes to worry about.


Thanks Alexey for the new patch.  I have been looking at the main
patch in details.

 	/*
-	 * Don't allow reindex on temp tables of other backends ... their local
-	 * buffer manager is not going to cope.
+	 * We don't support moving system relations into different tablespaces
+	 * unless allow_system_table_mods=1.
 	 */
If you remove the check on RELATION_IS_OTHER_TEMP() in
reindex_index(), you would allow the reindex of a temp relation owned
by a different session if its tablespace is not changed, so this
cannot be removed.

+		!allowSystemTableMods && IsSystemRelation(iRel))
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot reindex temporary tables of other sessions")));
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied: \"%s\" is a system catalog",
+						RelationGetRelationName(iRel))));
Indeed, a system relation with a relfilenode should be allowed to move
under allow_system_table_mods.  I think that we had better move this
check into CheckRelationTableSpaceMove() instead of reindex_index() to
centralize the logic.  ALTER TABLE does this business in
RangeVarCallbackForAlterRelation(), but our code path opening the
relation is different for the non-concurrent case.

+   if (OidIsValid(params->tablespaceOid) &&
+   IsSystemClass(relid, classtuple))
+   {
+   if (!allowSystemTableMods)
+   {
+   /* Skip all system relations, if not allowSystemTableMods */

I don't see the need for having two warnings here to say the same
thing if a relation is mapped or not mapped, so let's keep it simple.



Yeah, I just wanted to separate mapped and system relations, but 
probably it is too complicated.




I have found that the test suite was rather messy in its
organization.  Table creations were done first with a set of tests not
really ordered, so that was really hard to follow.  This has also led
to a set of tests that were duplicated, while other tests have been
missed, mainly some cross checks for the concurrent and non-concurrent
behaviors.  I have reordered the whole so as tests on catalogs, normal
tables and partitions are done separately with relations created and
dropped for each set.  Partitions use a global check for tablespaces
and relfilenodes after one concurrent reindex (didn't see the point in
doubling with the non-concurrent case as the same code path to select
the relations from the partition tree is taken).  An ACL test has been
added at the end.

The case of partitioned indexes was kind of interesting and I thought
about it for a couple of days, and I took the decision to ignore
relations that have no storage as you did, documenting that ALTER
TABLE can be used to update the references of the partitioned
relations.  The command is still useful with this behavior, and the
tests I have added track that.

Finally, I have reworked the docs, separating the limitations related
to system catalogs and partitioned relations, to be more consistent
with the notes at the end of the page.



Thanks for working on this.

+   if (tablespacename != NULL)
+   {
+   params.tablespaceOid = get_tablespace_oid(tablespacename, false);
+
+   /* Check permissions except when moving to database's default */
+   if (OidIsValid(params.tablespaceOid) &&

This check for OidIsValid() seems to be excessive, since you moved the 
whole ACL check under 'if (tablespacename != NULL)' here.


+   params.tablespaceOid != MyDatabaseTableSpace)
+   {
+   AclResult   aclresult;


+CREATE INDEX regress_tblspace_test_tbl_idx ON regress_tblspace_test_tbl (num1);

+-- move to global tablespace move fails

Maybe 'move to global tablespace, fail', just to match the style of the 
previous comments.


+REINDEX (TABLESPACE pg_global) INDEX regress_tblspace_test_tbl_idx;


+SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_reindex_part_index')
+  ORDER BY relid, level;
+SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_reindex_part_index')
+  ORDER BY relid, level;

Why do you do the same twice in a row? It looks like a typo. Maybe it 
was intended to be called for pa

Re: POC: Cleaning up orphaned files using undo logs

2021-02-03 Thread Amit Kapila
On Wed, Feb 3, 2021 at 2:45 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Antonin Houska 
> > not really an "ordinary patch".
> >
> > [1]
> > https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhR
> > q-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com
>
> I'm a bit interested in zheap-related topics.  I'm reading this discussion to 
> see what I can do.  (But this thread is too long... there are still 13,000 
> lines out of 45,000 lines.)
>
> What's the latest patch set to look at to achieve the undo infrastructure and 
> its would-be first user, orphan file cleanup?  As far as I've read, multiple 
> people posted multiple patch sets, and I don't see how they are related.
>

I feel it is good to start with the latest patch-set posted by Antonin [1].

[1] - https://www.postgresql.org/message-id/87363.1611941415%40antos

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-02-03 Thread Peter Smith
On Wed, Feb 3, 2021 at 2:51 PM Peter Smith  wrote:
>
> On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila  wrote:
> >
> > On Wed, Feb 3, 2021 at 6:38 AM Peter Smith  wrote:
> > >
> > > On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Feb 2, 2021 at 3:31 PM Peter Smith  
> > > > wrote:
> > > > >
> > > > > After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
> > > > > tried a simple test where I do a DROP TABLE with very bad timing for
> > > > > the tablesync worker. It seems that doing this can cause the sync
> > > > > worker's MyLogicalRepWorker->relid to become invalid.
> > > > >
> > > >
> > > > I think this should be fixed by latest patch because I have disallowed
> > > > drop of a table when its synchronization is in progress. You can check
> > > > once and let me know if the issue still exists?
> > > >
> > >
> > > FYI - I confirmed that the problem scenario that I reported yesterday
> > > is no longer possible because now the V25 patch is disallowing the
> > > DROP TABLE while the tablesync is still running.
> > >
> >
> > Thanks for the confirmation. BTW, can you please check if we can
> > reproduce that problem without this patch? If so, we might want to
> > apply this fix irrespective of this patch. If not, why not?
> >
>
> Yes, this was an existing postgres bug. It is independent of the patch.
>
> I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb.
>
> PSA my test logs showing the details.
>

Since this is an existing PG bug independent of this patch, I spawned
a new thread [ps0202] to deal with this problem.


[ps0202] 
https://www.postgresql.org/message-id/CAHut%2BPu7Z4a%3Domo%2BTvK5Gub2hxcJ7-3%2BBu1FO_%2B%2BfpFTW0oQfQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Printing backtrace of postgres processes

2021-02-03 Thread Bharath Rupireddy
On Wed, Feb 3, 2021 at 1:49 PM vignesh C  wrote:
>
> On Wed, Feb 3, 2021 at 1:00 PM Tom Lane  wrote:
> >
> > vignesh C  writes:
> > > On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
> > >  wrote:
> > >> Are these superuser and permission checks enough from a security
> > >> standpoint that we don't expose some sensitive information to the
> > >> user?
> >
> > > This will just print the backtrace of the current backend. Users
> > > cannot get password information from this.
> >
> > Really?
> >
> > A backtrace normally exposes the text of the current query, for
> > instance, which could contain very sensitive data (passwords in ALTER
> > USER, customer credit card numbers in ordinary data, etc etc).  We
> > don't allow the postmaster log to be seen by any but very privileged
> > users; it's not sane to think that this data is any less
> > security-critical than the postmaster log.
> >
> > This point is entirely separate from the question of whether
> > triggering stack traces at inopportune moments could cause system
> > malfunctions, but that question is also not to be ignored.
> >
> > TBH, I'm leaning to the position that this should be superuser
> > only.  I do NOT agree with the idea that ordinary users should
> > be able to trigger it, even against backends theoretically
> > belonging to their own userid.  (Do I need to point out that
> > some levels of the call stack might be from security-definer
> > functions with more privilege than the session's nominal user?)
> >
>
> I had seen that the logged output will be something like:
> postgres: test postgres [local]
> idle(ProcessClientReadInterrupt+0x3a) [0x9500ec]
> postgres: test postgres [local] idle(secure_read+0x183) [0x787f43]
> postgres: test postgres [local] idle() [0x7919de]
> postgres: test postgres [local] idle(pq_getbyte+0x32) [0x791a8e]
> postgres: test postgres [local] idle() [0x94fc16]
> postgres: test postgres [local] idle() [0x950099]
> postgres: test postgres [local] idle(PostgresMain+0x6d5) [0x954bd5]
> postgres: test postgres [local] idle() [0x898a09]
> postgres: test postgres [local] idle() [0x89838f]
> postgres: test postgres [local] idle() [0x894953]
> postgres: test postgres [local] idle(PostmasterMain+0x116b) [0x89422a]
> postgres: test postgres [local] idle() [0x79725b]
> /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f6e68d7]
> postgres: test postgres [local] idle() [0x484249]
> I was not sure if we would be able to get any sensitive information
> from this. I did not notice the function arguments being printed; I
> felt that only the function name, offset, and return address will be
> logged. I might be missing something here.
> Thoughts?

First of all, we need to see whether the output of pg_print_backtrace
shows function parameter addresses or only function start addresses
(along with line and file information when attached with gdb). In
either case, IMO, it will be easy for experienced hackers (I'm not one,
though) to calculate and fetch the query string, other function
parameters, or the variables inside the functions from the stack by
looking at the code (which is openly available, of course).

Say a backend is in a long-running scan or insert operation and
pg_print_backtrace is issued from another session: the
exec_simple_query frame exposes query_string. The trace below was
captured from an attached gdb session, though, so I'm not sure whether
the logged backtrace will contain function addresses or the addresses
of function parameters. I think we can check that by running a long
query which frequently checks for interrupts, issuing
pg_print_backtrace from another session against that backend, then
attaching gdb to the backend running the query, taking bt, and seeing
whether the logged backtrace and the gdb bt show the same or nearby
addresses.

#13 0x5644f4320729 in exec_simple_query (
query_string=0x5644f6771bf0 "select pg_backend_pid();") at postgres.c:1240
#14 0x5644f4324ff4 in PostgresMain (argc=1, argv=0x7ffd819bd5e0,
dbname=0x5644f679d2b8 "postgres", username=0x5644f679d298 "bharath")
at postgres.c:4394
#15 0x5644f4256f9d in BackendRun (port=0x5644f67935c0) at postmaster.c:4484
#16 0x5644f4256856 in BackendStartup (port=0x5644f67935c0) at
postmaster.c:4206
#17 0x5644f4252a11 in ServerLoop () at postmaster.c:1730
#18 0x5644f42521aa in PostmasterMain (argc=3, argv=0x5644f676b1f0)
at postmaster.c:1402
#19 0x5644f4148789 in main (argc=3, argv=0x5644f676b1f0) at main.c:209
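
For what it's worth, the logged lines shown earlier have the shape that glibc's backtrace_symbols() produces, i.e. "module(symbol+offset) [address]". A small self-contained sketch (assuming a glibc system providing <execinfo.h>; symbol names appear only when linked with -rdynamic) shows that only return addresses are captured, not function arguments themselves, though, as discussed, the addresses could still help someone locate such data on the stack:

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Capture and print the current call stack in the same shape as the
 * logged lines above: "module(symbol+offset) [address]", one per frame.
 * Returns the number of frames captured. */
static int print_trace(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);      /* records return addresses only */
    char **symbols = backtrace_symbols(frames, n);

    if (symbols != NULL)
    {
        for (int i = 0; i < n; i++)
            printf("%s\n", symbols[i]);
        free(symbols);
    }
    return n;
}

static int run_query(const char *query_string)
{
    /* query_string itself is not recorded by backtrace(); only this
     * frame's return address is.  The concern raised in this thread is
     * that the addresses still help locate such data on the stack. */
    (void) query_string;
    return print_trace();
}
```

This is only an illustration of the output format; the proposed pg_print_backtrace would of course go through the server's own logging machinery.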

As suggested by Tom, I'm okay if this function is callable only by the
superusers. In that case, the superusers can fetch the backtrace and
send it for further analysis in case of any hangs or issues.

Others may have better thoughts.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




DROP TABLE can crash the replication sync worker

2021-02-03 Thread Peter Smith
Hi Hackers.

As discovered in another thread [master] there is an *existing* bug in
the PG HEAD code which can happen if a DROP TABLE is done at same time
a replication tablesync worker is running.

It seems the table's relid that the sync worker is using gets ripped
out from underneath it and is invalidated by the DROP TABLE. Any
subsequent use of that relid will go wrong. In the particular test
case which found this, the result was a stack trace when a LOG message
tried to display the table name of the bad relid.

PSA the patch code to fix this. The patch disallows DROP TABLE while
any associated tablesync worker is still running. This fix was already
confirmed OK in the other thread [v25].
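
To illustrate the failure mode abstractly: the crash comes from formatting a relation name obtained from a relid that was invalidated underneath the worker. The sketch below is invented for illustration (the Oid typedef and the get_relation_name/log_sync_progress functions are not the actual server code, and the patch itself takes the different route of disallowing the DROP while the sync worker runs); it only shows the defensive shape a lookup-then-log sequence needs when the lookup can fail:

```c
#include <stdio.h>

/* Illustrative stand-in: a relid-to-name lookup that returns NULL once
 * the relation has been dropped, which is the situation the tablesync
 * worker ran into. */
typedef unsigned int Oid;

static const char *get_relation_name(Oid relid)
{
    /* Pretend relid 16384 exists and everything else was dropped. */
    return relid == 16384 ? "tab1" : NULL;
}

/* Safe pattern: check for concurrent invalidation before formatting,
 * instead of passing a possibly-NULL name straight to "%s". */
static int log_sync_progress(Oid relid, char *buf, size_t buflen)
{
    const char *relname = get_relation_name(relid);

    if (relname == NULL)
        return snprintf(buf, buflen,
                        "relation %u was concurrently dropped", relid);
    return snprintf(buf, buflen,
                    "synchronizing relation \"%s\"", relname);
}
```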


[master] 
https://www.postgresql.org/message-id/CAHut%2BPtSO4WsZwx8z%3D%2BYp_OWpxFmmFi5WX6OmYJzULNa2NV89g%40mail.gmail.com
[v25] 
https://www.postgresql.org/message-id/CAHut%2BPtAKP1FoHbUEWN%2Ba%3D8Pg_njsJKc9Zoz05A_ewJSvjX2MQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-DROP-TABLE-breaks-sync-worker-relid.patch
Description: Binary data


Re: Correct comment in StartupXLOG().

2021-02-03 Thread Dilip Kumar
On Wed, Feb 3, 2021 at 2:28 PM Amul Sul  wrote:
>
> Hi,
>
> SharedRecoveryState member of XLogCtl is no longer a boolean flag, got changes
> in 4e87c4836ab9 to enum but, comment referring to it still referred as the
> boolean flag which is pretty confusing and incorrect.

+1 for the comment change

> Also, the last part of the same comment is as:
>
> " .. although the boolean flag to allow WAL is probably atomic in
> itself, .",
>
> I am a bit confused here too about saying "atomic" to it, is that correct?
> I haven't done anything about it, only replaced the "boolean flag" to 
> "recovery
> state" in the attached patch.

I don't think "atomic" is correct; it's no longer a boolean, so it is
better we get rid of that part of the comment.
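
To illustrate the distinction in a standalone sketch (this is not PostgreSQL code): a single word-sized store may well be atomic on common platforms, but once the value is a multi-valued state that must stay consistent with other shared data, what matters is a lock such as ControlFileLock, or at least explicit memory ordering, which C11 atomics make visible in the source:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a multi-valued shared state like
 * SharedRecoveryState.  Readers need a consistent view relative to
 * other shared writes (in the server, the control file status), which
 * is why the update happens while holding ControlFileLock; acquire and
 * release ordering below states that intent explicitly. */
typedef enum
{
    RECOVERY_STATE_CRASH,
    RECOVERY_STATE_ARCHIVE,
    RECOVERY_STATE_DONE
} RecoveryState;

static _Atomic RecoveryState shared_recovery_state = RECOVERY_STATE_CRASH;

static void set_recovery_done(void)
{
    /* Release ordering: writes made before this store (e.g. the control
     * file update) become visible before any reader observes DONE. */
    atomic_store_explicit(&shared_recovery_state, RECOVERY_STATE_DONE,
                          memory_order_release);
}

static bool recovery_in_progress(void)
{
    return atomic_load_explicit(&shared_recovery_state,
                                memory_order_acquire) != RECOVERY_STATE_DONE;
}
```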

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: POC: Cleaning up orphaned files using undo logs

2021-02-03 Thread tsunakawa.ta...@fujitsu.com
From: Antonin Houska 
> not really an "ordinary patch".
> 
> [1]
> https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhR
> q-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com

I'm a bit interested in zheap-related topics.  I'm reading this discussion to 
see what I can do.  (But this thread is too long... there are still 13,000 
lines out of 45,000 lines.)

What's the latest patch set to look at to achieve the undo infrastructure and 
its would-be first user, orphan file cleanup?  As far as I've read, multiple 
people posted multiple patch sets, and I don't see how they are related.


Regards
Takayuki Tsunakawa






Re: POC: Cleaning up orphaned files using undo logs

2021-02-03 Thread Antonin Houska
Bruce Momjian  wrote:

> On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
> > Antonin Houska  wrote:
> > > Well, on repeated run of the test I could also hit the first one. I could 
> > > fix
> > > it and will post a new version of the patch (along with some other small
> > > changes) this week.
> > 
> > Attached is the next version. Changes done:
> 
> Yikes, this patch is 23k lines, and most of it looks like added lines of
> code.  Is this size expected?

Yes, earlier versions of this patch, e.g. [1], were of comparable size. It's
not really an "ordinary patch".

[1] 
https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhRq-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



