Re: A comment fix
On Mon, May 11, 2020 at 02:22:36PM +0900, Michael Paquier wrote: > Looks right to me, so will fix if there are no objections. > read_local_xlog_page() uses the replay location when in recovery. Done this part now. -- Michael signature.asc Description: PGP signature
Re: refactoring basebackup.c
Hi Robert, Please see my comments inline below. On Tue, May 12, 2020 at 12:33 AM Robert Haas wrote: > Yeah, that needs to be tested. Right now the chunk size is 32kB but it > might be a good idea to go larger. Another thing is that right now the > chunk size is tied to the protocol message size, and I'm not sure > whether the size that's optimal for disk reads is also optimal for > protocol messages.

A balance is needed in the number of packets sent across the network, so if the disk read size and the network packet size could be unified, it might provide a better optimization.

> > I don't think it's a particularly bad thing that we include a small > amount of progress for sending an empty file, a directory, or a > symlink. That could make the results more meaningful if you have a > database with lots of empty relations in it. However, I agree that the > effect of compression shouldn't be included. To get there, I think we > need to redesign the wire protocol. Right now, the server has no way > of letting the client know how many uncompressed bytes it's sent, and > the client has no way of figuring it out without uncompressing, which > seems like something we want to avoid. > >

I agree here too, except that if we have too many tar files one might cringe, but sending the extra amount from these tar files looks okay to me.

> There are some other problems with the current wire protocol, too: > > 1. The syntax for the BASE_BACKUP command is large and unwieldy. We > really ought to adopt an extensible options syntax, like COPY, VACUUM, > EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each > with bespoke lexer and parser support. > > 2. The client is sent a list of tablespaces and is supposed to use > that to expect an equal number of archives, computing the name for > each one on the client side from the tablespace info.
However, I think > we should be able to support modes like "put all the tablespaces in a > single archive" or "send a separate archive for every 256GB" or "ship > it all to the cloud and don't send me any archives". To get there, I > think we should have the server send the archive name to the clients, > and the client should just keep receiving the next archive until it's > told that there are no more. Then if there's one archive or ten > archives or no archives, the client doesn't have to care. It just > receives what the server sends until it hears that there are no more. > It also doesn't know how the server is naming the archives; the server > can, for example, adjust the archive names based on which compression > format is being chosen, without knowledge of the server's naming > conventions needing to exist on the client side. > >

One thing to remember here is that a trade-off would need to be made between the number of options we provide and people coming back to report which combinations do not work. For example, if a user script has "put all the tablespaces in a single archive" and later on somebody makes some script changes to break it down at "256 GB" and there is a conflict, then which one takes precedence needs to be chosen. When the number of options like this becomes very large, this could lead to some complications.

> I think we should keep support for the current BASE_BACKUP command but > either add a new variant using extensible options, or else invent a > whole new command with a different name (BACKUP, SEND_BACKUP, > whatever) that takes extensible options. This command should send back > all the archives and the backup manifest using a single COPY stream > rather than multiple COPY streams. Within the COPY stream, we'll > invent a sub-protocol, e.g. based on the first letter of the message, > e.g.: > > t = Tablespace boundary. No further message payload.
Indicates, for > progress reporting purposes, that we are advancing to the next > tablespace. > f = Filename. The remainder of the message payload is the name of the > next file that will be transferred. > d = Data. The next four bytes contain the number of uncompressed bytes > covered by this message, for progress reporting purposes. The rest of > the message is payload, possibly compressed. Could be empty, if the > data is being shipped elsewhere, and these messages are only being > sent to update the client's notion of progress. >

I support this.

> I thought about that a bit, too. There might be some way to unify that > by having some common context object that's defined by basebackup.c > and all archivers get it, so that they have some commonly-desired > details without needing bespoke code, but I'm not sure at this point > whether that will actually produce a nicer result. Even if we don't > have it initially, it seems like it wouldn't be very hard to add it > later, so I'm not too stressed about it.

-- Sumanta Mukherjee EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
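To make the proposed sub-protocol concrete, here is a stand-alone sketch of a client-side parser for the 't'/'f'/'d' messages described above. The exact framing is my assumption, not something the thread pins down: I NUL-terminate file names and, since the sketch carries uncompressed payloads, let the wire payload length equal the 4-byte uncompressed count (a compressed stream would need its own length field).

```python
import struct

def process_messages(buf):
    """Parse a stream of the proposed COPY sub-protocol messages:
    't' = tablespace boundary (no payload),
    'f' = file name (NUL-terminated here -- an assumption),
    'd' = 4-byte big-endian uncompressed byte count, then payload.
    Returns the parsed events and the running progress counter."""
    pos = 0
    progress = 0
    events = []
    while pos < len(buf):
        msg_type = buf[pos:pos + 1]
        pos += 1
        if msg_type == b't':
            events.append(('tablespace',))
        elif msg_type == b'f':
            end = buf.index(b'\x00', pos)
            events.append(('file', buf[pos:end].decode()))
            pos = end + 1
        elif msg_type == b'd':
            (n,) = struct.unpack_from('>I', buf, pos)
            pos += 4
            # In this sketch the payload is uncompressed, so its wire
            # length equals n; with compression the message would have
            # to carry the compressed length separately.
            pos += n
            progress += n
            events.append(('data', n))
        else:
            raise ValueError('unknown message type: %r' % msg_type)
    return events, progress
```

Note how the client needs no knowledge of archive names or counts: it just consumes messages until the stream ends, which is the point of the redesign.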
Re: Parallel copy
On Mon, May 11, 2020 at 11:52 PM Robert Haas wrote: > > > Case-3: > > In copy command, for performing foreign key checks, we take KEY SHARE > > lock on primary key table rows which in turn will increment the command > > counter and update the snapshot. Now, as we share the snapshots at > > the beginning of the command, we can't allow it to be changed later. > > So, unless we do something special for it, I think we can't allow > > parallelism in such cases. > > This sounds like much more of a problem to me; it'd be a significant > restriction that would kick in routine cases where the user isn't > doing anything particularly exciting. The command counter presumably > only needs to be updated once per command, so maybe we could do that > before we start parallelism. However, I think we would need to have > some kind of dynamic memory structure to which new combo CIDs can be > added by any member of the group, and then discovered by other members > of the group later. At the end of the parallel operation, the leader > must discover any combo CIDs added by others to that table before > destroying it, even if it has no immediate use for the information. We > can't allow a situation where the group members have inconsistent > notions of which combo CIDs exist or what their mappings are, and if > KEY SHARE locks are being taken, new combo CIDs could be created. >

AFAIU, we don't generate combo CIDs for this case. See below code in heap_lock_tuple():

    /*
     * Store transaction information of xact locking the tuple.
     *
     * Note: Cmax is meaningless in this context, so don't set it; this avoids
     * possibly generating a useless combo CID. Moreover, if we're locking a
     * previously updated tuple, it's important to preserve the Cmax.
     *
     * Also reset the HOT UPDATE bit, but only if there's no update; otherwise
     * we would break the HOT chain.
     */
    tuple->t_data->t_infomask &= ~HEAP_XMAX_BITS;
    tuple->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
    tuple->t_data->t_infomask |= new_infomask;
    tuple->t_data->t_infomask2 |= new_infomask2;

I don't understand why we need to do something special for combo CIDs if they are not generated during this operation. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] hs_standby_disallowed test fix
On 2020/05/12 12:05, Fujii Masao wrote: On 2020/05/12 8:03, Michail Nikolaev wrote: Hello. There is a recent commit about changes in the way read-only commands are prevented from being executed [1]. It seems like the hs_standby_disallowed test is broken now. So, a simple patch to fix the test is attached.

Thanks for the report and patch! LGTM. Pushed. Thanks!

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Is it useful to record whether plans are generic or custom?
Hi, When we run prepared statements, as far as I know, running EXPLAIN is the only way to know whether the executed plans are generic or custom. There seems to be no way to know how many times a prepared statement was executed as generic and custom. I think it may be useful to record the number of generic and custom plans, mainly to ensure that prepared statements are executed with the expected plan type. If people also feel it's useful, I'm going to think about adding columns such as 'generic plans' and 'custom plans' to pg_stat_statements. As you know, pg_stat_statements can now track not only 'calls' but also 'plans', so we can infer which plan type was executed from them. When both 'calls' and 'plans' were incremented, the plan type would be custom. When only 'calls' was incremented, it would be generic. But considering cases such as when only the plan phase has succeeded and the execution phase has failed, this inference can be wrong. Thoughts? Regards, -- Atsushi Torikoshi
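The inference described in the mail above can be written as a small decision rule. This is only an illustration of the heuristic (and of its failure mode), not code from pg_stat_statements; the function name and the 'unknown'/'idle' labels are made up for the sketch.

```python
def infer_plan_type(calls_delta, plans_delta):
    """Guess a prepared statement's plan type from the change in
    pg_stat_statements' 'calls' and 'plans' counters between two
    samples.  The 'unknown' branch is the caveat from the mail: when
    planning succeeds but execution fails, 'plans' advances while
    'calls' does not, so the heuristic loses track."""
    if calls_delta > 0 and plans_delta > 0:
        return 'custom'    # planned and executed in the same interval
    if calls_delta > 0:
        return 'generic'   # executed without fresh planning
    if plans_delta > 0:
        return 'unknown'   # planned but not executed (e.g. execution error)
    return 'idle'
```

Dedicated 'generic plans' / 'custom plans' counters would make this guesswork unnecessary, which is the point of the proposal.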
Re: Parallel copy
On Mon, May 11, 2020 at 11:52 PM Robert Haas wrote: > > I wonder why you're still looking at this instead of looking at just > speeding up the current code, especially the line splitting, >

Because the line splitting is just 1-2% of the overall work in common cases. See the data shared by Vignesh for various workloads [1]. The time it takes is in the range of 0.5-12% approximately, and for cases like a table with a few indexes, it is not more than 1-2%.

[1] - https://www.postgresql.org/message-id/CALDaNm3r8cPsk0Vo_-6AXipTrVwd0o9U2S0nCmRdku1Dn-Tpqg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: PG 13 release notes, first draft
At Mon, 11 May 2020 20:12:04 -0400, Bruce Momjian wrote in > On Thu, May 7, 2020 at 09:22:02PM -0700, Noah Misch wrote: > > On Thu, May 07, 2020 at 09:38:34AM -0400, Bruce Momjian wrote: > > > > > > - Crash recovery was losing tuples written via COPY TO. This fixes > > > > > > the bug. > > > > > > > > > > This was not backpatched? > > > > > > > > Right. > > > > > > Oh. So you are saying we could lose COPY data on a crash, even after a > > > commit. That seems bad. Can you show me the commit info? I can't find > > > it. > > > > commit c6b9204 > > Author: Noah Misch > > AuthorDate: Sat Apr 4 12:25:34 2020 -0700 > > Commit: Noah Misch > > CommitDate: Sat Apr 4 12:25:34 2020 -0700 > > > > Skip WAL for new relfilenodes, under wal_level=minimal. > > > > Until now, only selected bulk operations (e.g. COPY) did this. If a > > given relfilenode received both a WAL-skipping COPY and a WAL-logged > > operation (e.g. INSERT), recovery could lose tuples from the COPY. See > > src/backend/access/transam/README section "Skipping WAL for New > > RelFileNode" for the new coding rules. Maintainers of table access > > methods should examine that section. > > OK, so how do we want to document this? Do I mention in the text below > the WAL skipping item that this fixes a bug where a mix of simultaneous > COPY and INSERT into a table could lose rows during crash recovery, or > create a new item?

FWIW, as discussed upthread, I suppose that the API change is not going to be in the relnotes. Something like this?

- Fix bug in WAL-skipping optimization

Previously, a transaction doing both COPY and WAL-logged operations like INSERT while wal_level=minimal could lead to loss of COPYed rows through crash recovery. Also, this fix extends the WAL-skipping optimization to all kinds of bulk insert operations.

regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Two fsync related performance issues?
On 2020/05/12 9:42, Paul Guo wrote: Hello hackers, 1. StartupXLOG() does fsync on the whole data directory early in the crash recovery. I'm wondering if we could skip some directories (at least the pg_log/, table directories) since WAL, etc. could ensure consistency.

I agree that we can skip the log directory, but I'm not sure if skipping the table directories is really safe. Also ISTM that we can skip the directories whose contents are removed or zeroed during recovery, for example, pg_snapshots, pg_subtrans, etc.

Here is the related code:

    if (ControlFile->state != DB_SHUTDOWNED &&
        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
    {
        RemoveTempXlogFiles();
        SyncDataDirectory();
    }

I have this concern since I saw an issue in a real product environment where the startup process needed 10+ seconds to start WAL replay after a relaunch due to elog(PANIC) (it was seen on the Postgres-based product Greenplum, but it is a common issue in Postgres also). I highly suspect the delay was mostly due to this. It has also been noticed that on public clouds fsync is much slower than on local storage, so the slowness should be more severe in the cloud. If we at least disable fsync on the table directories we could skip a lot of file fsyncs - this may save a lot of seconds during crash recovery.

2. CheckPointTwoPhase() This may be a small issue. See the code below:

    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
        RecreateTwoPhaseFile(gxact->xid, buf, len);

RecreateTwoPhaseFile() writes a state file for a prepared transaction and does fsync. It might be good to do the fsync for all files once after writing them, given that the kernel is able to do asynchronous flushing while those file contents are being written. If TwoPhaseState->numPrepXacts is large we could do batching to avoid the fd resource limit. I did not test them yet, but this should be able to speed up checkpoint/restartpoint a bit. Any thoughts?

It seems worth making the patch and measuring the performance improvement.
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
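The batching idea for CheckPointTwoPhase() boils down to a two-pass structure: write every state file first, then fsync them all. A minimal stand-alone sketch (the function name and the (name, data) entry format are invented for illustration; real code would also cap the number of concurrently open fds, as the mail notes):

```python
import os

def write_state_files_batched(entries, dirpath):
    """Write every two-phase state file first, then fsync them in a
    second pass.  Separating the writes from the fsyncs lets the kernel
    start writeback asynchronously while later files are still being
    written, instead of stalling on one fsync per file."""
    fds = []
    try:
        for name, data in entries:
            fd = os.open(os.path.join(dirpath, name),
                         os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
            os.write(fd, data)
            fds.append(fd)          # defer the fsync
        for fd in fds:              # second pass: flush everything
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

Whether this wins in practice depends on how much writeback the kernel actually overlaps, which is exactly what the proposed measurement would show.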
Re: PG 13 release notes, first draft
Bruce Momjian writes: > On Mon, May 11, 2020 at 11:08:35PM -0400, Chapman Flack wrote: >> 'specify' ? > I like that word if Tom prefers it. 'specify' works for me. regards, tom lane
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 11:08:35PM -0400, Chapman Flack wrote: > On 05/11/20 22:48, Bruce Momjian wrote: > > On Mon, May 11, 2020 at 10:07:23PM -0400, Tom Lane wrote: > >> Bruce Momjian writes: > >>> Allow Unicode escapes, e.g., E'\u', U&'\', to represent any > >>> character available in the database encoding, even when the database > >>> encoding is not UTF-8 (Tom Lane) > >> > >> How about "to be used for" instead of "to represent"? "Represent" kind > >> of sounds like we're using these on output, which we aren't. > > > > Uh, I think "used for" is worse though, since we are not using it. I > > don't get the "output" feel of the word at all. > > 'specify' ? I like that word if Tom prefers it. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On 05/11/20 22:48, Bruce Momjian wrote: > On Mon, May 11, 2020 at 10:07:23PM -0400, Tom Lane wrote: >> Bruce Momjian writes: >>> Allow Unicode escapes, e.g., E'\u', U&'\', to represent any >>> character available in the database encoding, even when the database >>> encoding is not UTF-8 (Tom Lane) >> >> How about "to be used for" instead of "to represent"? "Represent" kind >> of sounds like we're using these on output, which we aren't. > > Uh, I think "used for" is worse though, since we are not using it. I > don't get the "output" feel of the word at all. 'specify' ? -Chap
Re: PG 13 release notes, first draft
On Tue, May 12, 2020 at 6:11 AM Justin Pryzby wrote: > > On Mon, May 11, 2020 at 10:52:41AM +0530, Amit Kapila wrote: > > > 1. We have allowed an (auto)vacuum to display additional information > > > about heap or index in case of an error in commit b61d161c14 [1]. > > > Now, in general, it might not be worth saying much about error > > > information but I think this one could help users in case they have > > > some corruption. For example, if one of the indexes on a relation has > > > some corrupted data (due to bad hardware or some bug), it will let the > > > user know the index information, and the user can take appropriate > > > action like either Reindex or maybe drop and recreate the index to > > > overcome the problem. > > I'm not opposed to including it, but I think it's still true that the user > doesn't need to know in advance that the error message will be additionally > helpful in the event of corruption. If we were to include more "error" items, > we might also include these: > > 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error > reporting > 17a28b03645e27d73bf69a95d7569b61e58f06eb Improve the internal implementation > of ereport(). > 05f18c6b6b6e4b44302ee20a042cedc664532aa2 Added relation name in error > messages for constraint checks. > 33753ac9d7bc83dd9ccee9d5e678ed78a0725b4e Add object names to partition > integrity violations. >

I think the first one (Add backtrace support for error reporting) seems quite useful, as it can help to detect problems faster. Having a simple rule like Bruce's w.r.t. "error messages" makes it easier to decide whether to take a particular commit or not, but I feel some of these could help users learn about the new functionality and might encourage them to upgrade to the new version. Sure, nobody is going to upgrade for these features alone, but along with other things, improved error handling is a good thing to know about. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] hs_standby_disallowed test fix
On 2020/05/12 8:03, Michail Nikolaev wrote: Hello. There is a recent commit about changes in the way read-only commands are prevented from being executed [1]. It seems like the hs_standby_disallowed test is broken now. So, a simple patch to fix the test is attached.

Thanks for the report and patch! LGTM. I just wonder why the standbycheck regression test isn't run by default in the buildfarm, which caused us not to notice this issue for a long time. Maybe it's because it's difficult to set up a hot-standby environment in the regression tests? If so, we might need to merge the standbycheck test into the TAP tests for recovery.

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: PG 13 release notes, first draft
On 2020/05/12 9:34, Bruce Momjian wrote: Could you add "Vinayak Pokale" as a co-author of the following feature, since I sometimes referred to his old patch while creating mine [1]? === E.1.3.1.6. System Views - Add system view pg_stat_progress_analyze to report analyze progress (Álvaro Herrera, Tatsuro Yamada) + Add system view pg_stat_progress_analyze to report analyze progress (Álvaro Herrera, Tatsuro Yamada, Vinayak Pokale) === Done.

Hi Bruce, Thank you! Regards, Tatsuro Yamada
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 10:23:53PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > Well, are you suggesting a new section because the glossary shouldn't be > > listed under source code, or because you want the function reformatting > > added? We just need to understand what the purpose is. We already have > > the glossary listed, since that is new and user-visible. > > The implication of what you say here is that "is it user-visible?" > is a criterion for whether to release-note something. By that logic > we probably *should* relnote the function table layout changes, because > they sure as heck are user-visible. People might or might not notice > addition of a glossary, but I think just about every user consults > the function/operator tables regularly. > > I concur with Alvaro's position that if we are listing documentation > changes, pushing them under "Source Code" is not the way to do it. > That subsection has always been understood to be "stuff you don't > care about if you're not a hacker". > > So that sort of leads me to the conclusion that "major documentation > changes" might be a reasonable sub-heading for the release notes. OK, section and item added, patch attached, -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml index 83470e9ba1..34211e2b68 100644 --- a/doc/src/sgml/release-13.sgml +++ b/doc/src/sgml/release-13.sgml @@ -2331,7 +2331,7 @@ Use the directory of the pg_upgrade binary as the default new 'bindir' location -Source Code +Documentation @@ -2347,6 +2347,27 @@ Add a glossary to the documentation (Corey Huinker, Jürgen Purtz, Roger Har + + + + +Reformat tables containing function information for better clarity (Tom Lane) + + + + + + + + + +Source Code + + +
Re: PG 13 release notes, first draft
On Tue, May 12, 2020 at 6:36 AM Bruce Momjian wrote: > > On Mon, May 11, 2020 at 07:41:55PM -0500, Justin Pryzby wrote: > > On Mon, May 11, 2020 at 10:52:41AM +0530, Amit Kapila wrote: > > > One more observation: > > > > > > Allow inserts to trigger autovacuum activity (Laurenz Albe, Darafei > > > Praliaskouski) > > > This new behavior allows pages to be set as all-visible, which then > > > allows index-only scans, ... > > > > > The above sentence sounds to mean that this feature allows index-only > > > scans in more number of cases after this feature. Is that what you > > > intend to say? If so, is that correct? Because I think this will > > > allow index-only scans to skip "Heap Fetches" in more cases. > > > > I think what you mean is that the autovacuum feature, in addition to > > encouraging the *planner* to choose an indexonly scan, will *also* allow (at > > execution time) fewer heap fetches for a plan which would have > > already/previously used IOS. Right ? So maybe it should say "allows OR > > IMPROVES index-only scans" or "allows plans which use IOS to run more > > efficiently". > > Yes, I see your point now. New text is: > > This new behavior reduces the work necessary when the table > needs to be frozen and allows pages to be set as all-visible. > All-visible pages allow index-only scans to access fewer heap rows. >

The new text LGTM. Thanks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: No core file generated after PostgresNode->start
Robert Haas writes: > On Mon, May 11, 2020 at 4:24 PM Antonin Houska wrote: >> Could "sysctl kernel.core_pattern" be the problem? I discovered this setting >> sometime when I also couldn't find the core dump on linux. > Well, I'm running on macOS and the core files normally show up in > /cores, but in this case they didn't. I have a standing note to check the permissions on /cores after any macOS upgrade, because every so often Apple decides that that directory ought to be read-only. regards, tom lane
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 10:07:23PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > I like your wording, but the "that encoding" wasn't clear enough for me, > > so I reworded it to: > > > Allow Unicode escapes, e.g., E'\u', U&'\', to represent any > > character available in the database encoding, even when the database > > encoding is not UTF-8 (Tom Lane) > > How about "to be used for" instead of "to represent"? "Represent" kind > of sounds like we're using these on output, which we aren't. Uh, I think "used for" is worse though, since we are not using it. I don't get the "output" feel of the word at all. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Tue, May 12, 2020 at 10:54:52AM +0900, Michael Paquier wrote: > On Mon, May 11, 2020 at 07:18:56PM -0400, Bruce Momjian wrote: > > On Fri, May 8, 2020 at 11:55:33AM +0900, Michael Paquier wrote: > >> Should e2e02191 be added to the notes? This commit means that we > >> actually dropped support for Windows 2000 (finally) at run-time. > > > > Oh, yes. This is much more important than the removal of support for > > non-ELF BSD systems, which I already listed. The new text is: > > > > Remove support for Windows 2000 (Michael Paquier) > > Sounds fine to me. > > >> At the same time I see no mention of 79dfa8af, which added better > >> error handling when backends the SSL context with incorrect bounds. > > > > I skipped that commit since people don't normally care about better > > error messages until they see the error message, and then they are happy > > it is there, unless this is some chronic error message problem we are > > fixing. > > Okay. > > > I thought this fell into the previous category about error messages, but > > coloring is different. Can we say these utilities now honor the color > > environment variables? > > Exactly, I actually became aware of that possibility after plugging > in the common logging APIs to oid2name and vacuumlo as of fc8cb94b so > this was not mentioned in the log message. And anything using > src/common/logging.c can make use of the colorized output with > PG_COLOR[S] set. > > > Are these the only new ones? > > I can recall an extra one in this case: pgbench as of 30a3e77. And I > don't see any new callers of pg_logging_init() in the stuff that > already existed in ~12. I am not sure we even mentioned this in 12. Should we document this somewhere? Maybe a blog posting? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: No core file generated after PostgresNode->start
On Mon, May 11, 2020 at 4:24 PM Antonin Houska wrote: > Could "sysctl kernel.core_pattern" be the problem? I discovered this setting > sometime when I also couldn't find the core dump on linux. Well, I'm running on macOS and the core files normally show up in /cores, but in this case they didn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PG 13 release notes, first draft
Bruce Momjian writes: > Well, are you suggesting a new section because the glossary shouldn't be > listed under source code, or because you want the function reformatting > added? We just need to understand what the purpose is. We already have > the glossary listed, since that is new and user-visible. The implication of what you say here is that "is it user-visible?" is a criterion for whether to release-note something. By that logic we probably *should* relnote the function table layout changes, because they sure as heck are user-visible. People might or might not notice addition of a glossary, but I think just about every user consults the function/operator tables regularly. I concur with Alvaro's position that if we are listing documentation changes, pushing them under "Source Code" is not the way to do it. That subsection has always been understood to be "stuff you don't care about if you're not a hacker". So that sort of leads me to the conclusion that "major documentation changes" might be a reasonable sub-heading for the release notes. regards, tom lane
Re: PG 13 release notes, first draft
On 2020-May-11, Bruce Momjian wrote: > On Mon, May 11, 2020 at 08:34:56PM -0400, Alvaro Herrera wrote: > > Yes, we had to touch the source code in order to add documentation; but > > so what? Everything touches the source code, but that doesn't mean it > > should be listed there. I don't see what's the problem with having a > > new subsection in the relnotes entitled "Documentation" where these two > > items (glossary + new doc table format) are listed. It's not > > like it's going to cost us a lot of space or anything. > Well, are you suggesting a new section because the glossary shouldn't be > listed under source code, or because you want the function reformatting > added? We just need to understand what the purpose is. We already have > the glossary listed, since that is new and user-visible.

IMO the table reformatting change is significant enough to be noteworthy. I'm suggesting that a new Documentation subsection would list both that and the glossary, separately from Source Code -- so it'd be E.1.3.10 and Source Code would be E.1.3.11.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
Bruce Momjian writes: > I like your wording, but the "that encoding" wasn't clear enough for me, > so I reworded it to: > Allow Unicode escapes, e.g., E'\u', U&'\', to represent any > character available in the database encoding, even when the database > encoding is not UTF-8 (Tom Lane) How about "to be used for" instead of "to represent"? "Represent" kind of sounds like we're using these on output, which we aren't. regards, tom lane
Re: pg13: xlogreader API adjust
At Mon, 11 May 2020 16:33:36 -0400, Alvaro Herrera wrote in > Hello > > Per discussion in thread [1], I propose the following patch to give > another adjustment to the xlogreader API. This results in a small but > not insignificant net reduction of lines of code. What this patch does > is adjust the signature of these new xlogreader callbacks, making the > API simpler. The changes are: > > * the segment_open callback installs the FD in xlogreader state itself, > instead of passing the FD back. This was suggested by Kyotaro > Horiguchi in that thread[2]. > > * We no longer pass segcxt to segment_open; it's in XLogReaderState, > which is already an argument. > > * We no longer pass seg/segcxt to WALRead; instead, that function takes > them from XLogReaderState, which is already an argument. > (This means XLogSendPhysical has to drink more of the fake_xlogreader > kool-aid.) > > I claim the reason to do it now instead of pg14 is to make it simpler > for third-party xlogreader callers to adjust. > > (Some might be thinking that I do this to avoid an API change later, but > my guts tell me that we'll adjust xlogreader again in pg14 for the > encryption stuff and other reasons, so.) > > > [1] https://postgr.es/m/20200406025651.fpzdb5yyb7qyh...@alap3.anarazel.de > [2] > https://postgr.es/m/20200508.114228.963995144765118400.horikyota@gmail.com

The simplified interface of WALRead looks far better to me since it no longer has unreasonable duplicates of parameters. I agree with the discussion about third-party xlogreader callers, but I'm not sure about the back-patching burden. I'm not sure why wal_segment_open and WalSndSegmentOpen were modified in different ways regarding error handling of BasicOpenFile; I prefer the WalSndSegmentOpen way. However, that difference doesn't harm anything, so I'm fine with the current patch.

+	fake_xlogreader.seg = *sendSeg;
+	fake_xlogreader.segcxt = *sendCxt;

fake_xlogreader.seg is a different instance from *sendSeg.
WALRead modifies fake_xlogreader.seg but does not modify *sendSeg, so the change doesn't persist. On the other hand, WalSndSegmentOpen reads *sendSeg, which is not under the control of WALRead. Maybe we had better make fake_xlogreader a global variable of walsender.c that covers the current sendSeg and sendCxt. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
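The persistence problem Horiguchi points out is plain by-value copy semantics: `fake_xlogreader.seg = *sendSeg` copies the struct, so later updates to the copy never reach the original. A tiny analogue (the struct and field names mirror the xlogreader code, but the block itself is only an illustration, not PostgreSQL code):

```python
from dataclasses import dataclass

@dataclass
class WALOpenSegment:
    ws_file: int = -1      # stand-in for the segment's file descriptor

send_seg = WALOpenSegment()
# Like `fake_xlogreader.seg = *sendSeg` in C: a by-value copy.
fake_seg = WALOpenSegment(ws_file=send_seg.ws_file)
# WALRead opens a segment and records it in the copy...
fake_seg.ws_file = 42
# ...but the original, which WalSndSegmentOpen consults, never sees it.
assert send_seg.ws_file == -1
```

This is why making the walsender keep its state inside one shared xlogreader (rather than copying between the two) resolves the inconsistency.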
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 07:18:56PM -0400, Bruce Momjian wrote: > On Fri, May 8, 2020 at 11:55:33AM +0900, Michael Paquier wrote: >> Should e2e02191 be added to the notes? This commit means that we >> actually dropped support for Windows 2000 (finally) at run-time. > > Oh, yes. This is much more important than the removal of support for > non-ELF BSD systems, which I already listed. The new text is: > > Remove support for Windows 2000 (Michael Paquier) Sounds fine to me. >> At the same time I see no mention of 79dfa8af, which added better >> error handling when backends the SSL context with incorrect bounds. > > I skipped that commit since people don't normally care about better > error messages until they see the error message, and then they are happy > it is there, unless this is some chronic error message problem we are > fixing. Okay. > I thought this fell into the previous category about error messages, but > coloring is different. Can we say these utilities now honor the color > environment variables? Exactly, I actually became aware of that possibility after plugging in the common logging APIs to oid2name and vacuumlo as of fc8cb94b so this was not mentioned in the log message. And anything using src/common/logging.c can make use of the colorized output with PG_COLOR[S] set. > Are these the only new ones? I can recall an extra one in this case: pgbench as of 30a3e77. And I don't see any new callers of pg_logging_init() in the stuff that already existed in ~12. -- Michael signature.asc Description: PGP signature
Re: A patch for get origin from commit_ts.
On Mon, May 11, 2020 at 04:43:11PM +0800, movead...@highgo.ca wrote: > But I cannot find any code that gets 'origin' from commit_ts; it looks like it > is producing data which nobody cares about. Can I ask what the purpose > of the 'origin' in commit_ts is? Do you think we should add some support > for this otherwise-unused data? I have not thought about this matter, but it seems to me that you should add this patch to the upcoming commitfest for evaluation: https://commitfest.postgresql.org/28/ This is going to take a couple of months, though, as the main focus lately is the stability of 13. -- Michael signature.asc Description: PGP signature
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 6:23 PM Bruce Momjian wrote: > OK, I was able to add some of it cleanly: > > This allows efficient btree indexing of low cardinality columns by > storing duplicate keys only once. Users upgrading with pg_upgrade > will need to use REINDEX to make use of this feature. That seems like a reasonable compromise. Thanks! -- Peter Geoghegan
Re: PG 13 release notes, first draft
On Tue, May 12, 2020 at 8:51 AM Bruce Momjian wrote: > On Fri, May 8, 2020 at 12:07:09PM +0900, Amit Langote wrote: > > I have attached a patch with my suggestions above. > > OK, I slightly modified the wording of your first change, patch > attached. Thanks. I checked that what you committed looks fine. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 09:15:43PM -0400, Bruce Momjian wrote: > On Mon, May 11, 2020 at 05:33:40PM -0700, Peter Geoghegan wrote: > > On Mon, May 11, 2020 at 5:14 PM Bruce Momjian wrote: > > > Agreed. How is this? > > > > > > This allows efficient btree indexing of low cardinality columns. > > > Users upgrading with pg_upgrade will need to use REINDEX to make > > > use of > > > this feature. > > > > I still think that the release notes should say that the key is only > > stored once, while TIDs that identify table rows are stored together > > as an array. Everything that's helpful (or harmful) about the feature > > happens as a consequence of that. This isn't hard to grasp > > intuitively, and is totally in line with how things like Oracle bitmap > > indexes are presented to ordinary users. > > I still don't think these details belong in the release notes. OK, I was able to add some of it cleanly: This allows efficient btree indexing of low cardinality columns by storing duplicate keys only once. Users upgrading with pg_upgrade will need to use REINDEX to make use of this feature. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 05:33:40PM -0700, Peter Geoghegan wrote: > On Mon, May 11, 2020 at 5:14 PM Bruce Momjian wrote: > > Agreed. How is this? > > > > This allows efficient btree indexing of low cardinality columns. > > Users upgrading with pg_upgrade will need to use REINDEX to make > > use of > > this feature. > > I still think that the release notes should say that the key is only > stored once, while TIDs that identify table rows are stored together > as an array. Everything that's helpful (or harmful) about the feature > happens as a consequence of that. This isn't hard to grasp > intuitively, and is totally in line with how things like Oracle bitmap > indexes are presented to ordinary users. I still don't think these details belong in the release notes. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 07:41:01PM -0500, Justin Pryzby wrote: > |Allow function call backtraces of errors to be logged (Peter Eisentraut, > Álvaro Herrera) > |Server variable backtrace_functions specifies which C functions should > generate backtraces on error. > > I think the details in the description are eclipsing the most important thing: > backtraces on Assert(). I would say "Support for showing backtraces on > error". Uh, you mean this adds backtraces for errors and asserts? Are non-developers running assert builds? > Regarding this one: > |Add system view pg_shmem_allocations to display shared memory usage (Andres > Freund, Robert Haas) > |WHAT IS THE ENTRY WITH NO NAME? > > There seem to be two special, "unnamed" cases: > src/backend/storage/ipc/shmem.c-/* output shared memory allocated but > not counted via the shmem index */ > src/backend/storage/ipc/shmem.c:values[0] = > CStringGetTextDatum(""); > ... > src/backend/storage/ipc/shmem.c-/* output as-of-yet unused shared > memory */ > src/backend/storage/ipc/shmem.c-nulls[0] = true; > > That seems to be adequately documented: > https://www.postgresql.org/docs/devel/view-pg-shmem-allocations.html > |NULL for unused memory and for anonymous allocations. OK, thanks. Comment removed. > I would remove this part: > "Previously, this could only be set at server start." OK, you are saying it is already clear; agreed, removed. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 05:09:54PM -0400, Alvaro Herrera wrote: > On 2020-May-11, Alvaro Herrera wrote: > > > Hello > > > > > + > > > + > > > + > > > +Add psql commands to report operator classes and operator families > > > (Sergey Cherkashin, Nikita Glukhov, Alexander Korotkov) > > > + > > > > I think this item should list the commands in question: > > \dA, \dAc, \dAf, \dAo, \dAp > > (All the other psql entries in the relnotes do that). > > Sorry, it's the last four only -- \dA is an older command. OK, confirmed, thanks. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 07:41:55PM -0500, Justin Pryzby wrote: > On Mon, May 11, 2020 at 10:52:41AM +0530, Amit Kapila wrote: > > One more observation: > > > > Allow inserts to trigger autovacuum activity (Laurenz Albe, Darafei > > Praliaskouski) > > This new behavior allows pages to be set as all-visible, which then > > allows index-only scans, ... > > > The above sentence sounds to mean that this feature allows index-only > > scans in more number of cases after this feature. Is that what you > > intend to say? If so, is that correct? Because I think this will > > allow index-only scans to skip "Heap Fetches" in more cases. > > I think what you mean is that the autovacuum feature, in addition to > encouraging the *planner* to choose an indexonly scan, will *also* allow (at > execution time) fewer heap fetches for a plan which would have > already/previously used IOS. Right ? So maybe it should say "allows OR > IMPROVES index-only scans" or "allows plans which use IOS to run more > efficiently". Yes, I see your point now. New text is: This new behavior reduces the work necessary when the table needs to be frozen and allows pages to be set as all-visible. All-visible pages allow index-only scans to access fewer heap rows. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 08:34:56PM -0400, Alvaro Herrera wrote: > On 2020-May-11, Bruce Momjian wrote: > > > We have long discussed how much of the release notes is to reward > > behavior, and we have settled on having the names on the items, and the > > Acknowledgments section at the bottom. > > Yes, we had to touch the source code in order to add documentation; but > so what? Everything touches the source code, but that doesn't mean it > should be listed there. I don't see what's the problem with having a > new subsection in the relnotes entitled "Documentation" where these two > items appears (glossary + new doc table format) are listed. It's not > like it's going to cost us a lot of space or anything. > > I don't think there is any circularity argument BTW -- we're not going > to document that we added release notes. And changing the table format > is not entirely pointless, given that we've historically had trouble > with these tables (read: they're totally unusable in PDF). Well, are you suggesting a new section because the glossary shouldn't be listed under source code, or because you want the function reformatting added? We just need to understand what the purpose is. We already have the glossary listed, since that is new and user-visible. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 04:50:50PM -0400, Alvaro Herrera wrote: > Hello > > > + > > + > > + > > +Add psql commands to report operator classes and operator families (Sergey > > Cherkashin, Nikita Glukhov, Alexander Korotkov) > > + > > I think this item should list the commands in question: > \dA, \dAc, \dAf, \dAo, \dAp > (All the other psql entries in the relnotes do that). Good idea. I added this paragraph: The new commands are \dAc, \dAf, \dAo, and \dAp. I didn't see any changes to \dA except regression tests. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Two fsync related performance issues?
Hello hackers, 1. StartupXLOG() does an fsync on the whole data directory early in crash recovery. I'm wondering if we could skip some directories (at least pg_log/ and the table directories), since WAL replay, etc., could ensure consistency. Here is the related code:

if (ControlFile->state != DB_SHUTDOWNED &&
    ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
{
    RemoveTempXlogFiles();
    SyncDataDirectory();
}

I have this concern because I saw an issue in a real production environment where the startup process needed 10+ seconds to start WAL replay after a relaunch due to elog(PANIC) (it was seen on the Postgres-based product Greenplum, but it is a common issue in Postgres as well). I highly suspect the delay was mostly due to this. It has also been observed that fsync on public clouds is much slower than on local storage, so the slowness should be more severe in the cloud. If we at least skipped the fsyncs on the table directories we could avoid a lot of per-file fsyncs, which could save many seconds during crash recovery.

2. CheckPointTwoPhase() This may be a small issue. See the code below:

for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    RecreateTwoPhaseFile(gxact->xid, buf, len);

RecreateTwoPhaseFile() writes a state file for a prepared transaction and does an fsync. It might be better to write all the files first and then fsync them once at the end, given that the kernel is able to flush asynchronously while those file contents are being written. If TwoPhaseState->numPrepXacts is large, we could batch the fsyncs to avoid the fd resource limit. I have not tested these yet, but they should be able to speed up checkpoint/restartpoint a bit. Any thoughts? Regards.
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 10:52:41AM +0530, Amit Kapila wrote: > > 1. We have allowed an (auto)vacuum to display additional information > > about heap or index in case of an error in commit b61d161c14 [1]. > > Now, in general, it might not be worth saying much about error > > information but I think this one could help users in case they have > > some corruption. For example, if one of the indexes on a relation has > > some corrupted data (due to bad hardware or some bug), it will let the > > user know the index information, and the user can take appropriate > > action like either Reindex or maybe drop and recreate the index to > > overcome the problem. I'm not opposed to including it, but I think it's still true that the user doesn't need to know in advance that the error message will be additionally helpful in the event of corruption. If we were to include more "error" items, we might also include these: 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error reporting 17a28b03645e27d73bf69a95d7569b61e58f06eb Improve the internal implementation of ereport(). 05f18c6b6b6e4b44302ee20a042cedc664532aa2 Added relation name in error messages for constraint checks. 33753ac9d7bc83dd9ccee9d5e678ed78a0725b4e Add object names to partition integrity violations. > One more observation: > > Allow inserts to trigger autovacuum activity (Laurenz Albe, Darafei > Praliaskouski) > This new behavior allows pages to be set as all-visible, which then > allows index-only scans, ... > The above sentence sounds to mean that this feature allows index-only > scans in more number of cases after this feature. Is that what you > intend to say? If so, is that correct? Because I think this will > allow index-only scans to skip "Heap Fetches" in more cases. 
I think what you mean is that the autovacuum feature, in addition to encouraging the *planner* to choose an indexonly scan, will *also* allow (at execution time) fewer heap fetches for a plan which would have already/previously used IOS. Right ? So maybe it should say "allows OR IMPROVES index-only scans" or "allows plans which use IOS to run more efficiently". Separate from Amit's comment, I suggest to say: | This new behavior allows autovacuum to set pages as all-visible, which then | allows index-only scans, ... ..otherwise it sounds like this feature implemented the concept of "all-visible". -- Justin
Re: PG 13 release notes, first draft
|Allow function call backtraces of errors to be logged (Peter Eisentraut, Álvaro Herrera) |Server variable backtrace_functions specifies which C functions should generate backtraces on error. I think the details in the description are eclipsing the most important thing: backtraces on Assert(). I would say "Support for showing backtraces on error". Regarding this one: |Add system view pg_shmem_allocations to display shared memory usage (Andres Freund, Robert Haas) |WHAT IS THE ENTRY WITH NO NAME? There seems to be two special, "unnamed" cases: src/backend/storage/ipc/shmem.c-/* output shared memory allocated but not counted via the shmem index */ src/backend/storage/ipc/shmem.c:values[0] = CStringGetTextDatum(""); ... src/backend/storage/ipc/shmem.c-/* output as-of-yet unused shared memory */ src/backend/storage/ipc/shmem.c-nulls[0] = true; That seems to be adequately documented: https://www.postgresql.org/docs/devel/view-pg-shmem-allocations.html |NULL for unused memory and for anonymous allocations. I would remove this part: "Previously, this could only be set at server start." -- Justin
Re: PG 13 release notes, first draft
On 2020-May-11, Bruce Momjian wrote: > We have long discussed how much of the release notes is to reward > behavior, and we have settled on having the names on the items, and the > Acknowledgments section at the bottom. Yes, we had to touch the source code in order to add documentation; but so what? Everything touches the source code, but that doesn't mean it should be listed there. I don't see what the problem is with having a new subsection in the relnotes entitled "Documentation" where these two items (glossary + new doc table format) are listed. It's not like it's going to cost us a lot of space or anything. I don't think there is any circularity argument BTW -- we're not going to document that we added release notes. And changing the table format is not entirely pointless, given that we've historically had trouble with these tables (read: they're totally unusable in PDF). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 03:19:50PM +0900, Tatsuro Yamada wrote: > Hi Bruce, > > On 2020/05/05 12:16, Bruce Momjian wrote: > > I have committed the first draft of the PG 13 release notes. You can > > see them here: > > > > https://momjian.us/pgsql_docs/release-13.html > > > > It still needs markup, word wrap, and indenting. The community doc > > build should happen in a few hours. > > Thanks for working on this! :-D > > Could you add "Vinayak Pokale" as a co-author of the following feature since > I sometimes read his old patch to create a patch [1] ? > > === > E.1.3.1.6. System Views > > - Add system view pg_stat_progress_analyze to report analyze progress > (Álvaro Herrera, Tatsuro Yamada) > > + Add system view pg_stat_progress_analyze to report analyze progress > (Álvaro Herrera, Tatsuro Yamada, Vinayak Pokale) > === Done. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 5:14 PM Bruce Momjian wrote: > Agreed. How is this? > > This allows efficient btree indexing of low cardinality columns. > Users upgrading with pg_upgrade will need to use REINDEX to make use > of > this feature. I still think that the release notes should say that the key is only stored once, while TIDs that identify table rows are stored together as an array. Everything that's helpful (or harmful) about the feature happens as a consequence of that. This isn't hard to grasp intuitively, and is totally in line with how things like Oracle bitmap indexes are presented to ordinary users. -- Peter Geoghegan
Re: PG 13 release notes, first draft (ltree dot star)
On Sun, May 10, 2020 at 03:09:47PM -0500, Justin Pryzby wrote: > > In ltree, when using adjacent asterisks with braces, e.g. "*{2}.*{3}", > > properly interpret that as "*{5}" (Nikita Glukhov) > > I think that should say ".*" not "*", as in: > > > In ltree, when using adjacent asterisks with braces, e.g. ".*{2}.*{3}", > > properly interpret that as "*{5}" (Nikita Glukhov) > > The existing text clearly came from the commit message, which (based on its > regression tests) I think was the source of the missing dot. > > commit 9950c8aadf0edd31baec74a729d47d94af636c06 > Author: Tom Lane > Date: Sat Mar 28 18:31:05 2020 -0400 > > Fix lquery's behavior for consecutive '*' items. > > Something like "*{2}.*{3}" should presumably mean the same as > "*{5}", but it didn't. Improve that. > ... OK, fixed to be: In ltree, when using adjacent asterisks with braces, e.g. ".*{2}.*{3}", properly interpret that as ".*{5}" (Nikita Glukhov) I added two dots. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 10:52:41AM +0530, Amit Kapila wrote: > One more observation: > > Allow inserts to trigger autovacuum activity (Laurenz Albe, Darafei > Praliaskouski) > This new behavior allows pages to be set as all-visible, which then > allows index-only scans, ... > > The above sentence sounds to mean that this feature allows index-only > scans in more number of cases after this feature. Is that what you > intend to say? If so, is that correct? Because I think this will Yes. > allow index-only scans to skip "Heap Fetches" in more cases. Uh, by definition an index-only scan only scans the index, not the heap, right? It is true there are fewer heap fetches, but I thought fewer heap fetches meant more index-only scans. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Sat, May 9, 2020 at 11:16:27AM +0530, Amit Kapila wrote: > On Tue, May 5, 2020 at 8:46 AM Bruce Momjian wrote: > > > > I have committed the first draft of the PG 13 release notes. You can > > see them here: > > > > https://momjian.us/pgsql_docs/release-13.html > > > > Thanks for the work. I was today going through the release notes and > was wondering whether we should consider adding information about some > other work done for PG13. > 1. We have allowed an (auto)vacuum to display additional information > about heap or index in case of an error in commit b61d161c14 [1]. > Now, in general, it might not be worth saying much about error > information but I think this one could help users in case they have > some corruption. For example, if one of the indexes on a relation has > some corrupted data (due to bad hardware or some bug), it will let the > user know the index information, and the user can take appropriate > action like either Reindex or maybe drop and recreate the index to > overcome the problem. I mentioned my approach to error message changes in a previous email today: I skipped that commit since people don't normally care about better error messages until they see the error message, and then they are happy it is there, unless this is some chronic error message problem we are fixing. > 2. In the "Source Code" section, we can add information about > infrastructure enhancement for parallelism. Basically, "Allow > relation extension and page lock to conflict among parallel-group > members" [2][3]. This will allow improving the parallelism further in > many cases like (a) we can allow multiple workers to operate on a heap > and index in a parallel vacuum, (b) we can allow parallel Inserts, > etc. Uh, if there is no user-visible behavior change, this seems too low level for the release notes. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Fri, May 8, 2020 at 09:14:01AM +0200, Fabien COELHO wrote: > > It seems (a) pointless > > I disagree, on the very principle of free software values as a social > movement. > > Documentation improvements should be encouraged, and recognizing these in > the release notes contributes to do that for what is a lot of unpaid work > given freely by many people. I do not see this as "pointless", on the > contrary, having something "free" in a mostly mercantile world is odd enough > to deserve some praise. > > How many hours have you spent on the function operator table improvements? > If someone else had contributed that and only that to a release, would it > not justify two lines of implicit thanks somewhere down in the release > notes? > > Moreover adding a documentation section costs next to nothing, so what is > the actual point of not doing it? Also, having some documentation > improvements listed under "source code" does not make sense: writing good, > precise and structured English is not "source code". We have long discussed how much of the release notes is to reward behavior, and we have settled on having the names on the items, and the Acknowledgments section at the bottom. If you want to revisit that decision, you should start a new thread because doing it for just this item doesn't make sense. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 05:05:29PM -0700, Peter Geoghegan wrote: > On Mon, May 11, 2020 at 4:10 PM Bruce Momjian wrote: > > > think that you should point out that deduplication works by storing > > > the duplicates in the obvious way: Only storing the key once per > > > distinct value (or once per distinct combination of values in the case > > > of multi-column indexes), followed by an array of TIDs (i.e. a posting > > > list). Each TID points to a separate row in the table. > > > > These are not details that should be in the release notes since the > > internal representation is not important for its use. > > I am not concerned about describing the specifics of the on-disk > representation, and I don't feel too strongly about the storage > parameter (leave it out). I only ask that the wording convey the fact > that the deduplication feature is not just a quantitative improvement > -- it's a qualitative behavioral change, that will help data > warehousing in particular. This wasn't the case with the v12 work on > B-Tree duplicates (as I said last year, I thought of the v12 stuff as > fixing a problem more than an enhancement). > > With the deduplication feature added to Postgres v13, the B-Tree code > can now gracefully deal with low cardinality data by compressing the > duplicates as needed. This is comparable to bitmap indexes in > proprietary database systems, but without most of their disadvantages > (in particular around heavyweight locking, deadlocks that abort > transactions, etc). It's easy to imagine this making a big difference > with analytics workloads. The v12 work made indexes with lots of > duplicates 15%-30% smaller (compared to v11), but the v13 work can > make them 60% - 80% smaller in many common cases (compared to v12). In > extreme cases indexes might even be ~12x smaller (though that will be > rare). Agreed. How is this? This allows efficient btree indexing of low cardinality columns. 
Users upgrading with pg_upgrade will need to use REINDEX to make use of this feature. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Event trigger code comment duplication
I was skimming through the code in event_trigger.c and noticed that, while most of the stanzas that reference IsUnderPostmaster refer back to the code comment beginning on line 675, the block for table rewrite copies it in verbatim starting at line 842. The currentEventTriggerState comments at lines 730 and 861 seem to be the same too. https://github.com/postgres/postgres/blob/60c90c16c1885bb9aa2047b66f958b865c5d397e/src/backend/commands/event_trigger.c#L675 https://github.com/postgres/postgres/blob/60c90c16c1885bb9aa2047b66f958b865c5d397e/src/backend/commands/event_trigger.c#L730 https://github.com/postgres/postgres/blob/60c90c16c1885bb9aa2047b66f958b865c5d397e/src/backend/commands/event_trigger.c#L842 https://github.com/postgres/postgres/blob/60c90c16c1885bb9aa2047b66f958b865c5d397e/src/backend/commands/event_trigger.c#L861 https://github.com/postgres/postgres/blob/4d1563717fb1860168a40b852e1d61a33ecdd62f/src/backend/commands/event_trigger.c#L785 I did also notice a difference with the test on line 861 compared to line 785, though I am unable to evaluate whether the absence of a "rewriteList" is expected (there is a "dropList" at 785 ...). David J.
Re: PG 13 release notes, first draft
On Thu, May 7, 2020 at 09:22:02PM -0700, Noah Misch wrote: > On Thu, May 07, 2020 at 09:38:34AM -0400, Bruce Momjian wrote: > > > > > - Crash recovery was losing tuples written via COPY TO. This fixes > > > > > the bug. > > > > > > > > This was not backpatched? > > > > > > Right. > > > > Oh. So you are saying we could lose COPY data on a crash, even after a > > commit. That seems bad. Can you show me the commit info? I can't find > > it. > > commit c6b9204 > Author: Noah Misch > AuthorDate: Sat Apr 4 12:25:34 2020 -0700 > Commit: Noah Misch > CommitDate: Sat Apr 4 12:25:34 2020 -0700 > > Skip WAL for new relfilenodes, under wal_level=minimal. > > Until now, only selected bulk operations (e.g. COPY) did this. If a > given relfilenode received both a WAL-skipping COPY and a WAL-logged > operation (e.g. INSERT), recovery could lose tuples from the COPY. See > src/backend/access/transam/README section "Skipping WAL for New > RelFileNode" for the new coding rules. Maintainers of table access > methods should examine that section. OK, so how do we want to document this? Do I mention in the text below the WAL skipping item that this fixes a bug where a mix of simultaneous COPY and INSERT into a table could lose rows during crash recovery, or create a new item? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 4:10 PM Bruce Momjian wrote: > > think that you should point out that deduplication works by storing > > the duplicates in the obvious way: Only storing the key once per > > distinct value (or once per distinct combination of values in the case > > of multi-column indexes), followed by an array of TIDs (i.e. a posting > > list). Each TID points to a separate row in the table. > > These are not details that should be in the release notes since the > internal representation is not important for its use. I am not concerned about describing the specifics of the on-disk representation, and I don't feel too strongly about the storage parameter (leave it out). I only ask that the wording convey the fact that the deduplication feature is not just a quantitative improvement -- it's a qualitative behavioral change, that will help data warehousing in particular. This wasn't the case with the v12 work on B-Tree duplicates (as I said last year, I thought of the v12 stuff as fixing a problem more than an enhancement). With the deduplication feature added to Postgres v13, the B-Tree code can now gracefully deal with low cardinality data by compressing the duplicates as needed. This is comparable to bitmap indexes in proprietary database systems, but without most of their disadvantages (in particular around heavyweight locking, deadlocks that abort transactions, etc). It's easy to imagine this making a big difference with analytics workloads. The v12 work made indexes with lots of duplicates 15%-30% smaller (compared to v11), but the v13 work can make them 60% - 80% smaller in many common cases (compared to v12). In extreme cases indexes might even be ~12x smaller (though that will be rare). -- Peter Geoghegan
Re: PG 13 release notes, first draft
On Fri, May 8, 2020 at 12:07:09PM +0900, Amit Langote wrote: > > OK, I used this wording: > > > > Allow logical replication into partitioned tables on subscribers > > (Amit > > Langote) > > > > Previously, subscribers could only receive rows into non-partitioned > > tables. > > This is fine, thanks. > > I have attached a patch with my suggestions above. OK, I slightly modified the wording of your first change, patch attached. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml index 239586c04b..51ebb9b2ba 100644 --- a/doc/src/sgml/release-13.sgml +++ b/doc/src/sgml/release-13.sgml @@ -273,7 +273,7 @@ Allow partitionwise joins to happen in more cases (Ashutosh Bapat, Etsuro Fujita For example, partitionwise joins can now happen between partitioned -tables where the ancestors do not exactly match. +tables even when their partition bounds do not match exactly. @@ -307,7 +307,7 @@ Allow partitioned tables to be logically replicated via publications (Amit Lango Previously, partitions had to be replicated individually. Now partitioned tables can be published explicitly causing all partitions to be automatically published. Addition/removal of partitions from -partitioned tables are automatically added/removed from publications. The CREATE PUBLICATION option publish_via_partition_root controls whether partitioned tables are published as their own or their ancestors. +partitioned tables are automatically added/removed from publications. The CREATE PUBLICATION option publish_via_partition_root controls whether changes to partitions are published as their own or their ancestor's.
Re: PG 13 release notes, first draft
On Fri, May 8, 2020 at 11:55:33AM +0900, Michael Paquier wrote: > On Mon, May 04, 2020 at 11:16:00PM -0400, Bruce Momjian wrote: > > I have committed the first draft of the PG 13 release notes. You can > > see them here: > > > > https://momjian.us/pgsql_docs/release-13.html > > > > It still needs markup, word wrap, and indenting. The community doc > > build should happen in a few hours. > > Thanks Bruce for compiling the release notes. Here are some comments > from me, after looking at the state of the notes as of f2ff203. > > Should e2e02191 be added to the notes? This commit means that we > actually dropped support for Windows 2000 (finally) at run-time. Oh, yes. This is much more important than the removal of support for non-ELF BSD systems, which I already listed. The new text is: Remove support for Windows 2000 (Michael Paquier) > At the same time I see no mention of 79dfa8af, which added better > error handling when backends the SSL context with incorrect bounds. I skipped that commit since people don't normally care about better error messages until they see the error message, and then they are happy it is there, unless this is some chronic error message problem we are fixing. > What about fc8cb94, which basically means that vacuumlo and oid2name > are able to now support coloring output for their logging? I thought this fell into the previous category about error messages, but coloring is different. Can we say these utilities now honor the color environment variables? Are these the only new ones? > > Document color support (Peter Eisentraut) > > [...] > > THIS WAS NOT DOCUMENTED BEFORE? > > Not sure that there is a point to add that to the release notes. OK, removed. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: PG 13 release notes, first draft
On Thu, May 7, 2020 at 11:54:12AM -0700, Peter Geoghegan wrote: > Hi Bruce, > > On Mon, May 4, 2020 at 8:16 PM Bruce Momjian wrote: > > I have committed the first draft of the PG 13 release notes. You can > > see them here: > > > > https://momjian.us/pgsql_docs/release-13.html > > I see that you have an entry for the deduplication feature: > > "More efficiently store duplicates in btree indexes (Anastasia > Lubennikova, Peter Geoghegan)" > > I would like to provide some input on this. Fortunately it's much > easier to explain than the B-Tree work that went into Postgres 12. I - Well, that's good! :-) > think that you should point out that deduplication works by storing > the duplicates in the obvious way: Only storing the key once per > distinct value (or once per distinct combination of values in the case > of multi-column indexes), followed by an array of TIDs (i.e. a posting > list). Each TID points to a separate row in the table. These are not details that should be in the release notes since the internal representation is not important for its use. > It won't be uncommon for this to make indexes as much as 3x smaller > (it depends on a number of different factors that you can probably > guess). I wrote a summary of how it works for power users in the > B-Tree documentation chapter, which you might want to link to in the > release notes: > > https://www.postgresql.org/docs/devel/btree-implementation.html#BTREE-DEDUPLICATION > > Users that pg_upgrade will have to REINDEX to actually use the > feature, regardless of which version they've upgraded from. There are > also some limited caveats about the data types that can use > deduplication, and stuff like that -- see the documentation section I > linked to. I have added text to this about pg_upgrade: Users upgrading with pg_upgrade will need to use REINDEX to make use of this feature. 
> Finally, you might want to note that the feature is enabled by > default, and can be disabled by setting the "deduplicate_items" index > storage option to "off". (We have yet to make a final decision on > whether the feature should be enabled before the first stable release > of Postgres 13, though -- I have an open item for that.) Well, again, I don't think the average user needs to know this can be disabled. They can look at the docs of this feature to see that. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
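As a sketch of the user-visible surface discussed above (index and table names are illustrative):

```sql
-- Deduplication is on by default for new btree indexes in v13; it can be
-- disabled per index through the storage option mentioned above.
CREATE INDEX t_val_idx ON t (val) WITH (deduplicate_items = off);

-- Indexes carried over by pg_upgrade must be rebuilt before they can use
-- the new representation:
REINDEX INDEX t_val_idx;
```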
[PATCH] hs_standby_disallowed test fix
Hello. There is a recent commit changing the way read-only commands are prevented from being executed [1]. It seems the hs_standby_disallowed test is broken now. So, a simple patch to fix the test is attached. Thanks, Michail. [1] https://www.postgresql.org/message-id/flat/154701965766.11631.2240747476287499810.pgcf%40coridan.postgresql.org#168075c6e89267e11b862aa0c55b1910 diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out index dff0953e9a..8d3cafa5ce 100644 --- a/src/test/regress/expected/hs_standby_disallowed.out +++ b/src/test/regress/expected/hs_standby_disallowed.out @@ -64,7 +64,7 @@ SELECT count(*) FROM hs1; (1 row) COMMIT PREPARED 'foobar'; -ERROR: COMMIT PREPARED cannot run inside a transaction block +ERROR: cannot execute COMMIT PREPARED during recovery ROLLBACK; BEGIN; SELECT count(*) FROM hs1; @@ -86,7 +86,7 @@ SELECT count(*) FROM hs1; (1 row) ROLLBACK PREPARED 'foobar'; -ERROR: ROLLBACK PREPARED cannot run inside a transaction block +ERROR: cannot execute ROLLBACK PREPARED during recovery ROLLBACK; -- Locks BEGIN;
Re: PG 13 release notes, first draft
On Thu, May 7, 2020 at 02:08:58PM -0400, Chapman Flack wrote: > On 05/07/20 09:46, Bruce Momjian wrote: > > Ah, very good point. New text is: > > > > Allow Unicode escapes, e.g., E'\u', in databases that do not > > use UTF-8 encoding (Tom Lane) > > > > The Unicode characters must be available in the database encoding. > > ... > > > > I am only using E'\u' as an example. > > Hmm, how about: > > Allow Unicode escapes, e.g., E'\u' or U&'\', to represent > any character available in the database encoding, even when that > encoding is not UTF-8. > > which I suggest as I recall more clearly that the former condition > was not that such escapes were always rejected in other encodings; it was > that they were rejected if they represented characters outside of ASCII. > (Yossarian let out a respectful whistle.) I like your wording, but the "that encoding" wasn't clear enough for me, so I reworded it to: Allow Unicode escapes, e.g., E'\u', U&'\', to represent any character available in the database encoding, even when the database encoding is not UTF-8 (Tom Lane) > My inclination is to give at least one example each of the E and U& > form, if only so the casual reader of the notes may think "say! I hadn't > heard of that other form!" and be inspired to find out about it. But > perhaps it seems too much. Sure, works for me. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
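A minimal illustration of the two escape forms under discussion (the codepoint U+00E9 is chosen here for illustration because it also exists in LATIN1):

```sql
-- Previously these failed for non-ASCII characters when the database
-- encoding was not UTF-8; in v13 they work whenever the character can be
-- represented in the database encoding.
SELECT E'caf\u00e9';   -- string-literal form, yields 'café'
SELECT U&'caf\00e9';   -- SQL-standard Unicode string form, same result
```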
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 08:41:00PM +0300, Alexander Korotkov wrote: > On Mon, May 11, 2020 at 4:20 PM Bruce Momjian wrote: > > Sorry for the delay in replying. Yes, I agree we should list all of > > those operator class cases where we now take arguments. I looked at the > > patch but got confused and missed the doc changes that clearly need to > > be in the release notes. I see these operator classes now taking > > parameters, as you helpfully listed in your commit message: > > > > tsvector_ops > > gist_ltree_ops > > gist__ltree_ops > > gist_trgm_ops > > gist_hstore_ops > > gist__int_ops > > gist__intbig_ops > > > > I assume the double-underscore is because the first underscore is to > > separate words, and the second one is for the array designation, right? > > Yes, this is true. OK. > > So my big question is whether people will understand when they are using > > these operator classes, since many of them are defaults. Can you use an > > operator class parameter when you are just using the default operator > > class and not specifying its name? > > Actually no. Initial version of patch allowed to explicitly specify > DEFAULT keyword instead of opclass name. But I didn't like idea to > allow keyword instead of name there. > > I've tried to implement syntax allowing specifying parameters without > both new keyword and opclass name, but that causes a lot of grammar > problems. > > Finally, I've decided to provide parameters functionality only when > specifying opclass name. My motivation is that opclass parameters is > functionality for advanced users, who are deeply concerned into what > opclass do. For this category of users it's natural to know the > opclass name. Yes, that is good analysis, and your final decision was probably correct. I now see that the complexity is not a big problem. > > What I thinking that just saying > > the operator class take arguments might not be helpful. 
I think I see > > enough detail in the documentation to write release note items for > > these, but I will have to point out they need to specify the operator > > class, even if it is the default, right? > > My point was that we should specify where to look to find new > functionality. We can don't write opclass names, because those names > might be confusing for users who are not aware of them. We may > briefly say that new parameters are introduced for GiST for tsvector, > contrib/intarray, contrib/pg_trgm, contrib/ltree, contrib/hstore. > What do you think? OK, I have applied the attached patch, which I now think is the right level of detail, given your information above. Thanks. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml new file mode 100644 index 239586c..c0840c2 *** a/doc/src/sgml/release-13.sgml --- b/doc/src/sgml/release-13.sgml *** Author: Alexander Korotkov + + Indexes created on four and eight-byte integer array, tsvector, trigram, + ltree, and hstore columns can now control these GiST index parameters, + rather than using the defaults.
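For concreteness, a sketch of the syntax the release note points at — note that, per the discussion above, the opclass name must be spelled out even when it is the default (index, table, and parameter values here are illustrative):

```sql
-- Requires the pg_trgm extension; siglen is one of the new GiST opclass
-- parameters added in v13.
CREATE INDEX t_txt_trgm_idx ON t USING gist (txt gist_trgm_ops (siglen = 32));

-- The core tsvector GiST opclass takes a signature-length parameter as well.
CREATE INDEX t_tsv_idx ON t USING gist (tsv tsvector_ops (siglen = 256));
```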
Remaining routine pre-beta tasks
There are a couple of things left to do yet: * Collapse temporary OID assignments down to the permanent range. AFAICS there's no reason to delay this step any longer, since there aren't any open issues that seem likely to touch the catalog data. * In recent years we've usually done a preliminary pgindent run before beta1 (and then another right before making the stable branch). I've been holding off on this to avoid breaking WIP patches for open issues, but we really ought to be close to done on those. Barring objections, I'll do the OID renumbering tomorrow (Tuesday) and a pgindent run on Thursday. regards, tom lane
Re: PG 13 release notes, first draft
On 2020-May-11, Alvaro Herrera wrote: > Hello > > > + > > + > > + > > +Add psql commands to report operator classes and operator families (Sergey > > Cherkashin, Nikita Glukhov, Alexander Korotkov) > > + > > I think this item should list the commands in question: > \dA, \dAc, \dAf, \dAo, \dAp > (All the other psql entries in the relnotes do that). Sorry, it's the last four only -- \dA is an older command. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Show opclass and opfamily related information in psql
On 2020-Mar-08, Alexander Korotkov wrote: > Show opclass and opfamily related information in psql > > This commit provides psql commands for listing operator classes, operator > families and its contents in psql. New commands will be useful for exploring > capabilities of both builtin opclasses/opfamilies as well as > opclasses/opfamilies defined in extensions. I had chance to use these new commands this morning. I noticed the ORDER BY clause of \dAo is not very useful; for example: =# \dAo+ brin datetime_minmax_ops List of operators of operator families AM │Opfamily Name│ Operator │ Strategy │ Purpose │ Sort opfamily ──┼─┼───┼──┼─┼─── brin │ datetime_minmax_ops │ < (date, date) │1 │ search │ brin │ datetime_minmax_ops │ < (date, timestamp with time zone) │1 │ search │ brin │ datetime_minmax_ops │ < (date, timestamp without time zone) │1 │ search │ brin │ datetime_minmax_ops │ < (timestamp with time zone, date) │1 │ search │ brin │ datetime_minmax_ops │ < (timestamp with time zone, timestamp with time zone)│1 │ search │ brin │ datetime_minmax_ops │ < (timestamp with time zone, timestamp without time zone) │1 │ search │ brin │ datetime_minmax_ops │ < (timestamp without time zone, date) │1 │ search │ brin │ datetime_minmax_ops │ < (timestamp without time zone, timestamp with time zone) │1 │ search │ brin │ datetime_minmax_ops │ < (timestamp without time zone, timestamp without time zone) │1 │ search │ brin │ datetime_minmax_ops │ <= (date, date) │2 │ search │ brin │ datetime_minmax_ops │ <= (date, timestamp with time zone) │2 │ search │ brin │ datetime_minmax_ops │ <= (date, timestamp without time zone) │2 │ search │ brin │ datetime_minmax_ops │ <= (timestamp with time zone, date) │2 │ search │ brin │ datetime_minmax_ops │ <= (timestamp with time zone, timestamp with time zone) │2 │ search │ brin │ datetime_minmax_ops │ <= (timestamp with time zone, timestamp without time zone)│2 │ search │ Note how operator for strategy 1 are all together, then strategy 2, and so 
on. But I think we'd prefer the operators to be grouped together for the same types (just like \dAp already works); so I would change the clause from: ORDER BY 1, 2, o.amopstrategy, 3; to: ORDER BY 1, 2, pg_catalog.format_type(o.amoplefttype, NULL), pg_catalog.format_type(o.amoprighttype, NULL), o.amopstrategy; which gives this table: AM │Opfamily Name│ Operator │ Strategy │ Purpose │ Sort opfamily ──┼─┼───┼──┼─┼─── brin │ datetime_minmax_ops │ < (date, date) │1 │ search │ brin │ datetime_minmax_ops │ <= (date, date) │2 │ search │ brin │ datetime_minmax_ops │ = (date, date) │3 │ search │ brin │ datetime_minmax_ops │ >= (date, date) │4 │ search │ brin │ datetime_minmax_ops │ > (date, date) │5 │ search │ brin │ datetime_minmax_ops │ < (date, timestamp with time zone) │1 │ search │ brin │ datetime_minmax_ops │ <= (date, timestamp with time zone) │2 │ search │ brin │ datetime_minmax_ops │ = (date, timestamp with time zone) │3 │ search │ brin │ datetime_minmax_ops │ >= (date, timestamp with time zone) │4 │ search │ brin │ datetime_minmax_ops │ > (date, timestamp with time zone) │5 │ search │ Also, while I'm going about this, ISTM it'd make sense to list same-class operators first, followed by cross-class operators. That requires to add "o.amoplefttype = o.amoprighttype DESC," after "ORDER BY 1, 2,". For brin's integer_minmax_ops, the resulting list would have first (bigint,bigint) then (integer,integer) then (smallint,smallint), then all the rest: brin │ integer_minmax_ops│ < (bigint, bigint) │1 │ search │ brin │ integer_minmax_ops
Re: PG 13 release notes, first draft
Hello > + > + > + > +Add psql commands to report operator classes and operator families (Sergey > Cherkashin, Nikita Glukhov, Alexander Korotkov) > + I think this item should list the commands in question: \dA, \dAc, \dAf, \dAo, \dAp (All the other psql entries in the relnotes do that). Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JSON output from psql
On Mon, May 11, 2020 at 1:24 PM Robert Haas wrote: > On Fri, May 8, 2020 at 7:32 PM Gurjeet Singh wrote: > > There's no standard format that comes to mind, but perhaps an output > format similar to that of (array of row_to_json()) would be desirable. For > example, `select relname, relnamespace from pg_class;` would emit the > following: > > > > [ > > {"relname": "pgclass", "relnamespace": 11}, > > {"relname": "pg_statistic", "relnamespace": 11}, > > ] > > I don't see why psql needs any special support. You can already > generate this using the existing server side functions, if you want > it. > That's a good point! It might still be desirable, perhaps to shift the performance cost of JSON conversion from the server to the client. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/
pg13: xlogreader API adjust
Hello Per discussion in thread [1], I propose the following patch to give another adjustment to the xlogreader API. This results in a small but not insignificant net reduction of lines of code. What this patch does is adjust the signature of these new xlogreader callbacks, making the API simpler. The changes are: * the segment_open callback installs the FD in xlogreader state itself, instead of passing the FD back. This was suggested by Kyotaro Horiguchi in that thread[2]. * We no longer pass segcxt to segment_open; it's in XLogReaderState, which is already an argument. * We no longer pass seg/segcxt to WALRead; instead, that function takes them from XLogReaderState, which is already an argument. (This means XLogSendPhysical has to drink more of the fake_xlogreader kool-aid.) I claim the reason to do it now instead of pg14 is to make it simpler for third-party xlogreader callers to adjust. (Some might be thinking that I do this to avoid an API change later, but my guts tell me that we'll adjust xlogreader again in pg14 for the encryption stuff and other reasons, so.) [1] https://postgr.es/m/20200406025651.fpzdb5yyb7qyh...@alap3.anarazel.de [2] https://postgr.es/m/20200508.114228.963995144765118400.horikyota@gmail.com -- Álvaro Herrera Developer, https://www.PostgreSQL.org/ diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 7cee8b92c9..4b6f4ada6e 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -1050,8 +1050,6 @@ err: * Read 'count' bytes into 'buf', starting at location 'startptr', from WAL * fetched from timeline 'tli'. * - * 'seg/segcxt' identify the last segment used. - * * Returns true if succeeded, false if an error occurs, in which case * 'errinfo' receives error details.
* @@ -1061,7 +1059,6 @@ err: bool WALRead(XLogReaderState *state, char *buf, XLogRecPtr startptr, Size count, TimeLineID tli, - WALOpenSegment *seg, WALSegmentContext *segcxt, WALReadError *errinfo) { char *p; @@ -1078,34 +1075,34 @@ WALRead(XLogReaderState *state, int segbytes; int readbytes; - startoff = XLogSegmentOffset(recptr, segcxt->ws_segsize); + startoff = XLogSegmentOffset(recptr, state->segcxt.ws_segsize); /* * If the data we want is not in a segment we have open, close what we * have (if anything) and open the next one, using the caller's * provided openSegment callback. */ - if (seg->ws_file < 0 || - !XLByteInSeg(recptr, seg->ws_segno, segcxt->ws_segsize) || - tli != seg->ws_tli) + if (state->seg.ws_file < 0 || + !XLByteInSeg(recptr, state->seg.ws_segno, state->segcxt.ws_segsize) || + tli != state->seg.ws_tli) { XLogSegNo nextSegNo; - if (seg->ws_file >= 0) + if (state->seg.ws_file >= 0) state->routine.segment_close(state); - XLByteToSeg(recptr, nextSegNo, segcxt->ws_segsize); - seg->ws_file = state->routine.segment_open(state, nextSegNo, - segcxt, ); + XLByteToSeg(recptr, nextSegNo, state->segcxt.ws_segsize); + state->routine.segment_open(state, nextSegNo, ); + Assert(state->seg.ws_file >= 0); /* shouldn't happen */ /* Update the current segment info. */ - seg->ws_tli = tli; - seg->ws_segno = nextSegNo; + state->seg.ws_tli = tli; + state->seg.ws_segno = nextSegNo; } /* How many bytes are within this segment? 
*/ - if (nbytes > (segcxt->ws_segsize - startoff)) - segbytes = segcxt->ws_segsize - startoff; + if (nbytes > (state->segcxt.ws_segsize - startoff)) + segbytes = state->segcxt.ws_segsize - startoff; else segbytes = nbytes; @@ -1115,7 +1112,7 @@ WALRead(XLogReaderState *state, /* Reset errno first; eases reporting non-errno-affecting errors */ errno = 0; - readbytes = pg_pread(seg->ws_file, p, segbytes, (off_t) startoff); + readbytes = pg_pread(state->seg.ws_file, p, segbytes, (off_t) startoff); #ifndef FRONTEND pgstat_report_wait_end(); @@ -1127,7 +1124,7 @@ WALRead(XLogReaderState *state, errinfo->wre_req = segbytes; errinfo->wre_read = readbytes; errinfo->wre_off = startoff; - errinfo->wre_seg = *seg; + errinfo->wre_seg = state->seg; return false; } diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index bbd801513a..0fc27215df 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -784,31 +784,28 @@ XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, uint32 wa } /* XLogReaderRoutine->segment_open callback for local pg_wal files */ -int +void wal_segment_open(XLogReaderState *state, XLogSegNo nextSegNo, - WALSegmentContext *segcxt, TimeLineID *tli_p) + TimeLineID *tli_p) { TimeLineID tli = *tli_p; char path[MAXPGPATH]; - int fd; - XLogFilePath(path, tli, nextSegNo, segcxt->ws_segsize); - fd = BasicOpenFile(path, O_RDONLY | PG_BINARY); - if
Re: making update/delete of inheritance trees scale better
On Mon, May 11, 2020 at 4:22 PM Tom Lane wrote: > Robert Haas writes: > > If the parent is RTI 1, and the children are RTIs 2..6, what > > varno/varattno will we use in RTI 1's tlist to represent a column that > > exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6? > > Fair question. We don't have any problem representing the column > as it exists in any one of those children, but we lack a notation > for the "union" or whatever you want to call it, except in the case > where the parent relation has a corresponding column. Still, this > doesn't seem that hard to fix. My inclination would be to invent > dummy parent-rel columns (possibly with negative attnums? not sure if > that'd be easier or harder than adding them in the positive direction) > to represent such "union" columns. Ah, that makes sense. If we can invent dummy columns on the parent rel, then most of what I was worrying about no longer seems very worrying. I'm not sure what's involved in inventing such dummy columns, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: No core file generated after PostgresNode->start
Robert Haas wrote: > On Sun, May 10, 2020 at 11:21 PM Andy Fan wrote: > > Looks this doesn't mean a crash. If the test > > case(subscription/t/013_partition.pl) > > failed, test framework kill some process, which leads the above message. > > So you can > > ignore this issue now. Thanks > > I think there might be a real issue here someplace, though, because I > couldn't get a core dump last week when I did have a crash happening > locally. I didn't poke into it very hard though so I never figured out > exactly why not, but ulimit -c unlimited didn't help. Could "sysctl kernel.core_pattern" be the problem? I discovered this setting some time ago, when I also couldn't find a core dump on Linux. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: JSON output from psql
On Fri, May 8, 2020 at 7:32 PM Gurjeet Singh wrote: > There's no standard format that comes to mind, but perhaps an output format > similar to that of (array of row_to_json()) would be desirable. For example, > `select relname, relnamespace from pg_class;` would emit the following: > > [ > {"relname": "pgclass", "relnamespace": 11}, > {"relname": "pg_statistic", "relnamespace": 11}, > ] I don't see why psql needs any special support. You can already generate this using the existing server side functions, if you want it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
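A sketch of what "the existing server side functions" can already produce, matching the format proposed above:

```sql
-- json_agg over a subquery row yields one JSON array with an object per
-- row, using the column names as keys.
SELECT json_agg(t)
FROM (SELECT relname, relnamespace FROM pg_class) AS t;
```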
Re: making update/delete of inheritance trees scale better
Robert Haas writes: > If the parent is RTI 1, and the children are RTIs 2..6, what > varno/varattno will we use in RTI 1's tlist to represent a column that > exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6? Fair question. We don't have any problem representing the column as it exists in any one of those children, but we lack a notation for the "union" or whatever you want to call it, except in the case where the parent relation has a corresponding column. Still, this doesn't seem that hard to fix. My inclination would be to invent dummy parent-rel columns (possibly with negative attnums? not sure if that'd be easier or harder than adding them in the positive direction) to represent such "union" columns. This concept would only need to exist within the planner I think, since after setrefs.c there'd be no trace of those dummy columns. > I think we have quite a bit of code that expects to be able to translate > between parent-rel expressions and child-rel expressions, and that's > going to be pretty problematic here. ... shrug. Sure, we'll need to be able to do that mapping. Why will it be any harder than any other parent <-> child mapping? The planner would know darn well what the mapping is while it's inventing the dummy columns, so it just has to keep that info around for use later. > Maybe your answer is - let's just fix all that stuff. That could well > be right, but my first reaction is to think that it sounds hard. I have to think that it'll net out as less code, and certainly less complicated code, than trying to extend inheritance_planner in its current form to do what we wish it'd do. regards, tom lane
Re: gcov coverage data not full with immediate stop
On Mon, May 11, 2020 at 4:04 PM Tom Lane wrote: > This is the wrong thread to be debating that in, though. True. > Also I wonder if this is really RMT turf? I think it is, but the RMT is permitted -- even encouraged -- to consider the views of non-RMT members when making its decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: gcov coverage data not full with immediate stop
Robert Haas writes: > I agree, but also, we should start thinking about when to branch. I, > too, have patches that aren't critical enough to justify pushing them > post-freeze, but which are still good improvements that I'd like to > get into the tree. I'm queueing them right now to avoid the risk of > destabilizing things, but that generates more work, for me and for > other people, if their patches force me to rebase or the other way > around. I know there's always a concern with removing the focus on > release N too soon, but the open issues list is 3 items long right > now, and 2 of those look like preexisting issues, not new problems in > v13. Meanwhile, we have 20+ active committers. Yeah. Traditionally we've waited till the start of the next commitfest (which I'm assuming is July 1, for lack of an Ottawa dev meeting to decide differently). But it seems like things are slow enough that perhaps we could branch earlier, like June 1, and give the committers a chance to deal with some of their own stuff before starting the CF. This is the wrong thread to be debating that in, though. Also I wonder if this is really RMT turf? regards, tom lane
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
On Mon, May 11, 2020 at 4:40 PM Tom Lane wrote: > > Julien Rouhaud writes: > > On Mon, May 11, 2020 at 3:41 PM Tom Lane wrote: > >> Why? It uses "fallthrough" which is a legal spelling per level 4. > > > GCC documentation mentions [ \t]*FALLTHR(OUGH|U)[ \t]* for level 4 > > (out of the view other alternatives), which AFAICT is case sensitive > > (level 3 has fall(s | |-)?thr(ough|u)[ \t.!]*(-[^\n\r]*)?). > > Oh, I'd missed that that was case sensitive. Ugh --- that seems > unreasonable. Maybe we'd better settle for level 3 after all; > I don't think there's much room to doubt the intentions of a > comment spelled that way. Agreed.
Re: COPY, lock release and MVCC
On Fri, May 8, 2020 at 4:58 AM Laurenz Albe wrote: > I happened to notice that COPY TO releases the ACCESS SHARE lock > on the table right when the command ends rather than holding it > until the end of the transaction: That seems inconsistent with what an INSERT statement would do, and thus bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
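A hypothetical way to observe the reported behavior from a single session (the table name is illustrative; per the report, the ACCESS SHARE lock taken by COPY TO would already be gone when pg_locks is queried, whereas the lock taken by a SELECT would persist until COMMIT):

```sql
BEGIN;
COPY some_table TO STDOUT;

-- If COPY TO released its lock at command end, this returns no row:
SELECT relation::regclass, mode, granted
FROM pg_catalog.pg_locks
WHERE relation = 'some_table'::regclass;
COMMIT;
```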
Re: No core file generated after PostgresNode->start
On Sun, May 10, 2020 at 11:21 PM Andy Fan wrote: > Looks this doesn't mean a crash. If the test > case(subscription/t/013_partition.pl) > failed, test framework kill some process, which leads the above message. So > you can > ignore this issue now. Thanks I think there might be a real issue here someplace, though, because I couldn't get a core dump last week when I did have a crash happening locally. I didn't poke into it very hard though so I never figured out exactly why not, but ulimit -c unlimited didn't help. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: gcov coverage data not full with immediate stop
On Mon, May 11, 2020 at 12:56 AM Tom Lane wrote: > > I think we should definitely get this fixed for pg13 ... > > -1 for shoving in such a thing so late in the cycle. We've survived > without it for years, we can do so for a few months more. I agree, but also, we should start thinking about when to branch. I, too, have patches that aren't critical enough to justify pushing them post-freeze, but which are still good improvements that I'd like to get into the tree. I'm queueing them right now to avoid the risk of destabilizing things, but that generates more work, for me and for other people, if their patches force me to rebase or the other way around. I know there's always a concern with removing the focus on release N too soon, but the open issues list is 3 items long right now, and 2 of those look like preexisting issues, not new problems in v13. Meanwhile, we have 20+ active committers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: making update/delete of inheritance trees scale better
On Mon, May 11, 2020 at 2:48 PM Tom Lane wrote: > > There doesn't seem to be any execution-time > > problem with such a representation, but there might be a planning-time > > problem with building it, > > Possibly. We manage to cope with not-all-alike children now, of course, > but I think it might be true that no one plan node has Vars from > dissimilar children. Even so, the Vars are self-identifying, so it > seems like this ought to be soluble. If the parent is RTI 1, and the children are RTIs 2..6, what varno/varattno will we use in RTI 1's tlist to represent a column that exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6? I suppose the answer is 2 - or 3, but I guess we'd pick the first child as the representative of the class. We surely can't use varno 1, because then there's no varattno that makes any sense. But if we use 2, now we have the tlist for RTI 1 containing expressions with a child's RTI as the varno. I could be wrong, but I think that's going to make setrefs.c throw up and die, and I wouldn't be very surprised if there were a bunch of other things that crashed and burned, too. I think we have quite a bit of code that expects to be able to translate between parent-rel expressions and child-rel expressions, and that's going to be pretty problematic here. Maybe your answer is - let's just fix all that stuff. That could well be right, but my first reaction is to think that it sounds hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: refactoring basebackup.c
On Fri, May 8, 2020 at 5:27 PM Andres Freund wrote: > I wonder if there's cases where recursively forwarding like this will > cause noticable performance effects. The only operation that seems > frequent enough to potentially be noticable would be "chunks" of the > file. So perhaps it'd be good to make sure we read in large enough > chunks? Yeah, that needs to be tested. Right now the chunk size is 32kB but it might be a good idea to go larger. Another thing is that right now the chunk size is tied to the protocol message size, and I'm not sure whether the size that's optimal for disk reads is also optimal for protocol messages. > This, to me, indicates that we should measure the progress solely based > on how much of the "source" data was processed. The overhead of tar, the > reduction due to compression, shouldn't be included. I don't think it's a particularly bad thing that we include a small amount of progress for sending an empty file, a directory, or a symlink. That could make the results more meaningful if you have a database with lots of empty relations in it. However, I agree that the effect of compression shouldn't be included. To get there, I think we need to redesign the wire protocol. Right now, the server has no way of letting the client know how many uncompressed bytes it's sent, and the client has no way of figuring it out without uncompressing, which seems like something we want to avoid. There are some other problems with the current wire protocol, too: 1. The syntax for the BASE_BACKUP command is large and unwieldy. We really ought to adopt an extensible options syntax, like COPY, VACUUM, EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each with bespoke lexer and parser support. 2. The client is sent a list of tablespaces and is supposed to use that to expect an equal number of archives, computing the name for each one on the client side from the tablespace info. 
However, I think we should be able to support modes like "put all the tablespaces in a single archive" or "send a separate archive for every 256GB" or "ship it all to the cloud and don't send me any archives". To get there, I think we should have the server send the archive name to the clients, and the client should just keep receiving the next archive until it's told that there are no more. Then if there's one archive or ten archives or no archives, the client doesn't have to care. It just receives what the server sends until it hears that there are no more. It also doesn't know how the server is naming the archives; the server can, for example, adjust the archive names based on which compression format is being chosen, without knowledge of the server's naming conventions needing to exist on the client side. I think we should keep support for the current BASE_BACKUP command but either add a new variant using an extensible options, or else invent a whole new command with a different name (BACKUP, SEND_BACKUP, whatever) that takes extensible options. This command should send back all the archives and the backup manifest using a single COPY stream rather than multiple COPY streams. Within the COPY stream, we'll invent a sub-protocol, e.g. based on the first letter of the message, e.g.: t = Tablespace boundary. No further message payload. Indicates, for progress reporting purposes, that we are advancing to the next tablespace. f = Filename. The remainder of the message payload is the name of the next file that will be transferred. d = Data. The next four bytes contain the number of uncompressed bytes covered by this message, for progress reporting purposes. The rest of the message is payload, possibly compressed. Could be empty, if the data is being shipped elsewhere, and these messages are only being sent to update the client's notion of progress. > I've not though enough about the specifics, but I think it looks like > it's going roughly in a better direction. 
Good to hear. > One thing I wonder about is how stateful the interface is. Archivers > will pretty much always track which file is currently open etc. Somehow > such a repeating state machine seems a bit ugly - but I don't really > have a better answer. I thought about that a bit, too. There might be some way to unify that by having some common context object that's defined by basebackup.c and all archivers get it, so that they have some commonly-desired details without needing bespoke code, but I'm not sure at this point whether that will actually produce a nicer result. Even if we don't have it initially, it seems like it wouldn't be very hard to add it later, so I'm not too stressed about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: making update/delete of inheritance trees scale better
Robert Haas writes: > I believe that you'd want to have happen here is for each child to > emit the row identity columns that it knows about, and emit NULL for > the others. Then when you do the Append you end up with a row format > that includes all the individual identity columns, but for any > particular tuple, only one set of such columns is populated and the > others are all NULL. Yeah, that was what I'd imagined in my earlier thinking about this. > There doesn't seem to be any execution-time > problem with such a representation, but there might be a planning-time > problem with building it, Possibly. We manage to cope with not-all-alike children now, of course, but I think it might be true that no one plan node has Vars from dissimilar children. Even so, the Vars are self-identifying, so it seems like this ought to be soluble. regards, tom lane
Re: making update/delete of inheritance trees scale better
On Mon, May 11, 2020 at 8:58 AM Ashutosh Bapat wrote: > What happens if there's a mixture of foreign and local partitions or > mixture of FDWs? Injecting junk columns from all FDWs in the top level > target list will cause error because those attributes won't be > available everywhere. I think that we're talking about a plan like this:

Update
-> Append
   -> a bunch of children

I believe that what you'd want to have happen here is for each child to emit the row identity columns that it knows about, and emit NULL for the others. Then when you do the Append you end up with a row format that includes all the individual identity columns, but for any particular tuple, only one set of such columns is populated and the others are all NULL. There doesn't seem to be any execution-time problem with such a representation, but there might be a planning-time problem with building it, because when you're writing a tlist for the Append node, what varattno are you going to use for the columns that exist only in one particular child and not the others? The fact that setrefs processing happens so late seems like an annoyance in this case. Maybe it would be easier to have one Update node per kind of row identity, i.e. if there's more than one such notion then...

Placeholder
-> Update
   -> Append
      -> all children with one notion of row identity
-> Update
   -> Append
      -> all children with another notion of row identity

...and so forth. But I'm not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
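The NULL-padded row-identity idea can be modeled in a few lines. The struct layout and names below are purely illustrative, not PostgreSQL's executor representation: the Append output row carries one identity slot per child, and only the emitting child's slot is non-NULL.

```c
/*
 * Toy model of per-child row identity columns: each child fills in
 * only its own slot; the consumer finds the single non-NULL slot.
 */
#include <stdbool.h>

#define NCHILDREN 3

typedef struct IdentityRow
{
    bool isnull[NCHILDREN];     /* NULL bitmap for the identity slots */
    long ident[NCHILDREN];      /* e.g. ctid for local, rowid for an FDW */
} IdentityRow;

/* Child 'childno' emits its identity value; all other slots are NULL. */
IdentityRow emit_identity(int childno, long value)
{
    IdentityRow row;

    for (int i = 0; i < NCHILDREN; i++)
    {
        row.isnull[i] = (i != childno);
        row.ident[i] = 0;
    }
    row.ident[childno] = value;
    return row;
}

/* ModifyTable side: find which child produced this tuple. */
int identity_source(const IdentityRow *row)
{
    for (int i = 0; i < NCHILDREN; i++)
        if (!row->isnull[i])
            return i;
    return -1;                  /* no identity: shouldn't happen */
}
```

The planning-time question raised above (what varattno to assign for child-only columns) is exactly the part this runtime model glosses over.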
Re: Parallel copy
I wonder why you're still looking at this instead of just speeding up the current code, especially the line splitting, per previous discussion, and then coming back to study this issue more after that's done. On Mon, May 11, 2020 at 8:12 AM Amit Kapila wrote: > Apart from this, we have analyzed the other cases as mentioned below > where we need to decide whether we can allow parallelism for the copy > command. > Case-1: > Do we want to enable parallelism for a copy when transition tables are > involved? I think it would be OK not to support this. > Case-2: > a. When there are BEFORE/INSTEAD OF triggers on the table. > b. For partitioned tables, we can't support multi-inserts when there > are any statement-level insert triggers. > c. For inserts into foreign tables. > d. If there are volatile default expressions or the where clause > contains a volatile expression. Here, we can check if the expression > is parallel-safe, then we can allow parallelism. This all sounds fine. > Case-3: > In copy command, for performing foreign key checks, we take KEY SHARE > lock on primary key table rows which in turn will increment the command > counter and update the snapshot. Now, as we share the snapshots at > the beginning of the command, we can't allow it to be changed later. > So, unless we do something special for it, I think we can't allow > parallelism in such cases. This sounds like much more of a problem to me; it'd be a significant restriction that would kick in for routine cases where the user isn't doing anything particularly exciting. The command counter presumably only needs to be updated once per command, so maybe we could do that before we start parallelism. However, I think we would need to have some kind of dynamic memory structure to which new combo CIDs can be added by any member of the group, and then discovered by other members of the group later. 
At the end of the parallel operation, the leader must discover any combo CIDs added by others to that table before destroying it, even if it has no immediate use for the information. We can't allow a situation where the group members have inconsistent notions of which combo CIDs exist or what their mappings are, and if KEY SHARE locks are being taken, new combo CIDs could be created. > Case-4: > For Deferred Triggers, it seems we record CTIDs of tuples (via > ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute deferred > triggers at transaction end using AfterTriggerFireDeferred or at end > of the statement. I think this could be left for the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
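A toy version of the shared combo CID mapping discussed above might look like the following. Real code would live in dynamic shared memory with proper locking so that any group member can add entries and every member sees a consistent mapping; all of that, and the names, are hypothetical here.

```c
/*
 * Sketch of an append-only combo CID table that every member of a
 * parallel group could consult.  A combo CID maps a small integer id
 * to a (cmin, cmax) pair; the mapping must be identical across all
 * group members.  Shared-memory allocation and locking are omitted.
 */
typedef unsigned int CommandId;

typedef struct ComboCidEntry
{
    CommandId cmin;
    CommandId cmax;
} ComboCidEntry;

#define MAX_COMBOCIDS 64

static ComboCidEntry combocids[MAX_COMBOCIDS];
static int ncombocids = 0;

/* Return the combo id for (cmin, cmax), adding a new entry if needed. */
int combo_cid_lookup_or_add(CommandId cmin, CommandId cmax)
{
    for (int i = 0; i < ncombocids; i++)
        if (combocids[i].cmin == cmin && combocids[i].cmax == cmax)
            return i;
    combocids[ncombocids].cmin = cmin;
    combocids[ncombocids].cmax = cmax;
    return ncombocids++;
}
```

The leader's job at the end of the parallel operation, in this model, is simply to copy any entries it has not yet seen out of the shared table into its local one before the shared table goes away.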
Re: 2020-05-14 Press Release Draft
On Sun, May 10, 2020 at 10:08:46PM -0400, Jonathan S. Katz wrote: > * Ensure that a detatched partition has triggers that come from its former > parent removed. I would have said: "fix for issue which prevented/precluded detaching partitions which have inherited ROW triggers" > * Several fixes for `REINDEX CONCURRENTLY`, particular with dealing with issue > when a `REINDEX CONCURRENTLY` operation fails. ".. in particular relating to an issue ..." > * Avoid scanning irrelevant timelines during archive recovery, which can > eliminate attempts to fetch nonexistent WAL files from archive storage. I feel like this is phrased backwards. The goal is to avoid (attempting to) fetch nonextant WALs, and the mechanism is by skipping timelines. Maybe: * Avoid attempting to fetch nonexistent WAL files from archive storage during * recovery by skipping irrelevant timelines. -- Justin
Re: PG 13 release notes, first draft
On Mon, May 11, 2020 at 4:20 PM Bruce Momjian wrote: > Sorry for the delay in replying. Yes, I agree we should list all of > those operator class cases where we now take arguments. I looked at the > patch but got confused and missed the doc changes that clearly need to > be in the release notes. I see these operator classes now taking > parameters, as you helpfully listed in your commit message: >
> tsvector_ops
> gist_ltree_ops
> gist__ltree_ops
> gist_trgm_ops
> gist_hstore_ops
> gist__int_ops
> gist__intbig_ops
>
> I assume the double-underscore is because the first underscore is to > separate words, and the second one is for the array designation, right? Yes, this is true. > So my big question is whether people will understand when they are using > these operator classes, since many of them are defaults. Can you use an > operator class parameter when you are just using the default operator > class and not specifying its name? Actually, no. The initial version of the patch allowed explicitly specifying the DEFAULT keyword instead of an opclass name, but I didn't like the idea of allowing a keyword in place of a name there. I tried to implement syntax that lets parameters be specified without either a new keyword or the opclass name, but that caused a lot of grammar problems. Finally, I decided to provide the parameters functionality only when the opclass name is specified. My motivation is that opclass parameters are functionality for advanced users, who care deeply about what the opclass does. For this category of users it's natural to know the opclass name. > What I am thinking is that just saying > the operator classes take arguments might not be helpful. I think I see > enough detail in the documentation to write release note items for > these, but I will have to point out they need to specify the operator > class, even if it is the default, right? My point was that we should specify where to look to find the new functionality. 
We don't have to write the opclass names, because those names might be confusing for users who are not aware of them. We could briefly say that new parameters are introduced for the GiST opclasses for tsvector, contrib/intarray, contrib/pg_trgm, contrib/ltree, and contrib/hstore. What do you think? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: 2pc leaks fds
Hi Antonin, thanks for the review. On 2020-May-11, Antonin Houska wrote: > While looking at the changes, I've noticed a small comment issue in the > XLogReaderRoutine structure definition: > > * "tli_p" is an input/output argument. XLogRead() uses it to pass the > > The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This > incorrect comment was actually introduced by that commit.) Ah. I'll fix this, thanks for pointing it out. (It might be that the TLI situation can be improved with some callback, too.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: making update/delete of inheritance trees scale better
Hi Ashutosh, Thanks for chiming in. On Mon, May 11, 2020 at 9:58 PM Ashutosh Bapat wrote: > On Fri, May 8, 2020 at 7:03 PM Amit Langote wrote: > > I do think that doing this would be worthwhile even if we may be > > increasing ModifyTable's per-row overhead slightly, because planning > > overhead of the current approach is very significant, especially for > > partition trees with beyond a couple of thousand partitions. As to > > how bad the problem is, trying to create a generic plan for `update > > foo set ... where key = $1`, where foo has over 2000 partitions, > > causes OOM even on a machine with 6GB of memory. > > Per row overhead would be incurred for every row whereas the plan time > overhead is one-time or in case of a prepared statement almost free. > So we need to compare it esp. when there are 2000 partitions and all > of them are being updated. I assume that such UPDATEs would be uncommon. > But generally I agree that this would be a > better approach. It might help using PWJ when the result relation > joins with other partitioned table. 
It does, because the plan below ModifyTable is the same as if the query were SELECT instead of UPDATE; with my PoC:

explain (costs off) update foo set a = foo2.a + 1 from foo foo2 where foo.a = foo2.a;
                       QUERY PLAN
--------------------------------------------------------
 Update on foo
   Update on foo_1
   Update on foo_2
   ->  Append
         ->  Merge Join
               Merge Cond: (foo_1.a = foo2_1.a)
               ->  Sort
                     Sort Key: foo_1.a
                     ->  Seq Scan on foo_1
               ->  Sort
                     Sort Key: foo2_1.a
                     ->  Seq Scan on foo_1 foo2_1
         ->  Merge Join
               Merge Cond: (foo_2.a = foo2_2.a)
               ->  Sort
                     Sort Key: foo_2.a
                     ->  Seq Scan on foo_2
               ->  Sort
                     Sort Key: foo2_2.a
                     ->  Seq Scan on foo_2 foo2_2
(20 rows)

as opposed to what you get today:

explain (costs off) update foo set a = foo2.a + 1 from foo foo2 where foo.a = foo2.a;
                    QUERY PLAN
--------------------------------------------------
 Update on foo
   Update on foo_1
   Update on foo_2
   ->  Merge Join
         Merge Cond: (foo_1.a = foo2.a)
         ->  Sort
               Sort Key: foo_1.a
               ->  Seq Scan on foo_1
         ->  Sort
               Sort Key: foo2.a
               ->  Append
                     ->  Seq Scan on foo_1 foo2
                     ->  Seq Scan on foo_2 foo2_1
   ->  Merge Join
         Merge Cond: (foo_2.a = foo2.a)
         ->  Sort
               Sort Key: foo_2.a
               ->  Seq Scan on foo_2
         ->  Sort
               Sort Key: foo2.a
               ->  Append
                     ->  Seq Scan on foo_1 foo2
                     ->  Seq Scan on foo_2 foo2_1
(23 rows)

> > For child result relations that are foreign tables, their FDW adds > junk attribute(s) to the query’s targetlist by updating it in-place > (AddForeignUpdateTargets). However, as the child tables will no > longer get their own parsetree, we must use some hack around this > interface to obtain the foreign table specific junk attributes and add > them to the original/parent query’s targetlist. Assuming that all or > most of the children will belong to the same FDW, we will end up with > only a handful such junk columns in the final targetlist. I am not > sure if it's worthwhile to change the API of AddForeignUpdateTargets > to require FDWs to not scribble on the passed-in parsetree as part of > this patch. > > What happens if there's a mixture of foreign and local partitions or > mixture of FDWs? 
> Injecting junk columns from all FDWs in the top level > target list will cause error because those attributes won't be > available everywhere. That is a good question and something I struggled with ever since I started thinking about implementing this. For the problem that FDWs may inject junk columns that could neither be present in local tables (root parent and other local children) nor in other FDWs, I couldn't think of any solution other than to restrict what those junk columns can be -- to require them to be either "ctid", "wholerow", or a set of only *inherited* user columns. I think that's what Tom was getting at when he said the following in the email I cited in my first email: "...It gets a bit harder if the tree contains some foreign tables, because they might have different concepts of row identity, but I'd think in most cases you could still combine those into a small number of output columns." Maybe I misunderstood what Tom said, but I can't imagine how to let these junk columns be anything that *all* tables contained in an inheritance tree, especially the root parent, cannot emit, if they are to be emitted out of a single plan. > > As for how ModifyTable will create the new tuple for updates, I
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
Julien Rouhaud writes: > On Mon, May 11, 2020 at 3:41 PM Tom Lane wrote: >> Why? It uses "fallthrough" which is a legal spelling per level 4. > GCC documentation mentions [ \t]*FALLTHR(OUGH|U)[ \t]* for level 4 > (out of the view other alternatives), which AFAICT is case sensitive > (level 3 has fall(s | |-)?thr(ough|u)[ \t.!]*(-[^\n\r]*)?). Oh, I'd missed that that was case sensitive. Ugh --- that seems unreasonable. Maybe we'd better settle for level 3 after all; I don't think there's much room to doubt the intentions of a comment spelled that way. regards, tom lane
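For reference, here is a toy switch statement whose comment satisfies the stricter level-4 pattern, which only accepts the uppercase spellings FALLTHROUGH or FALLTHRU (optionally surrounded by tabs and spaces); lowercase "fallthrough", as in zic.c, only passes at level 3.

```c
/*
 * Demonstrates a fall-through comment accepted by GCC at
 * -Wimplicit-fallthrough=4: the comment must match the case-sensitive
 * pattern [ \t]*FALLTHR(OUGH|U)[ \t]*.
 */
int classify(int n)
{
    int score = 0;

    switch (n)
    {
        case 2:
            score += 10;
            /* FALLTHROUGH */
        case 1:
            score += 1;
            break;
        default:
            score = -1;
            break;
    }
    return score;
}
```

Compiled with `gcc -Wimplicit-fallthrough=4`, this produces no warning; replace the comment with `/* fallthrough */` and the warning reappears.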
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
On Mon, May 11, 2020 at 3:41 PM Tom Lane wrote: > > Julien Rouhaud writes: > > Note that timezone/zic.c would also have to be changed. > > Why? It uses "fallthrough" which is a legal spelling per level 4. GCC documentation mentions [ \t]*FALLTHR(OUGH|U)[ \t]* for level 4 (out of the view other alternatives), which AFAICT is case sensitive (level 3 has fall(s | |-)?thr(ough|u)[ \t.!]*(-[^\n\r]*)?).
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
Julien Rouhaud writes: > Note that timezone/zic.c would also have to be changed. Why? It uses "fallthrough" which is a legal spelling per level 4. regards, tom lane
Re: PG 13 release notes, first draft
On Thu, May 7, 2020 at 08:12:30PM +0300, Alexander Korotkov wrote: > On Thu, May 7, 2020 at 2:46 AM Bruce Momjian wrote: > > > > Allow CREATE INDEX to specify the GiST signature length and maximum > > number of integer ranges (Nikita Glukhov) > > > > Should we specify which particular opclasses are affected? Or at > least mention it affects core and particular contribs? Sorry for the delay in replying. Yes, I agree we should list all of those operator class cases where we now take arguments. I looked at the patch but got confused and missed the doc changes that clearly need to be in the release notes. I see these operator classes now taking parameters, as you helpfully listed in your commit message:

tsvector_ops
gist_ltree_ops
gist__ltree_ops
gist_trgm_ops
gist_hstore_ops
gist__int_ops
gist__intbig_ops

I assume the double-underscore is because the first underscore is to separate words, and the second one is for the array designation, right? So my big question is whether people will understand when they are using these operator classes, since many of them are defaults. Can you use an operator class parameter when you are just using the default operator class and not specifying its name? What I am thinking is that just saying the operator classes take arguments might not be helpful. I think I see enough detail in the documentation to write release note items for these, but I will have to point out they need to specify the operator class, even if it is the default, right? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: making update/delete of inheritance trees scale better
On Fri, May 8, 2020 at 7:03 PM Amit Langote wrote: > > Here is a sketch for implementing the design that Tom described here: > https://www.postgresql.org/message-id/flat/357.1550612935%40sss.pgh.pa.us > > In short, we would like to have only one plan for ModifyTable to get > tuples out of to update/delete, not N for N child result relations as > is done currently. > > I suppose things are the way they are because creating a separate plan > for each result relation makes the job of ModifyTable node very > simple, which is currently this: > > 1. Take the plan's output tuple, extract the tupleid of the tuple to > update/delete in the currently active result relation, > 2. If delete, go to 3, else if update, filter out the junk columns > from the above tuple > 3. Call ExecUpdate()/ExecDelete() on the result relation with the new > tuple, if any > > If we make ModifyTable do a little more work for the inheritance case, > we can create only one plan but without "expanding" the targetlist. > That is, it will contain entries only for attributes that are assigned > values in the SET clause. This makes the plan reusable across result > relations, because all child relations must have those attributes, > even though the attribute numbers might be different. Anyway, the > work that ModifyTable will now have to do is this: > > 1. Take the plan's output tuple, extract tupleid of the tuple to > update/delete and "tableoid" > 2. Select the result relation to operate on using the tableoid > 3. If delete, go to 4, else if update, fetch the tuple identified by > tupleid from the result relation and fill in the unassigned columns > using that "old" tuple, also filtering out the junk columns > 4. 
Call ExecUpdate()/ExecDelete() on the result relation with the new > tuple, if any > > I do think that doing this would be worthwhile even if we may be > increasing ModifyTable's per-row overhead slightly, because planning > overhead of the current approach is very significant, especially for > partition trees with beyond a couple of thousand partitions. As to > how bad the problem is, trying to create a generic plan for `update > foo set ... where key = $1`, where foo has over 2000 partitions, > causes OOM even on a machine with 6GB of memory. Per row overhead would be incurred for every row whereas the plan time overhead is one-time or in case of a prepared statement almost free. So we need to compare it esp. when there are 2000 partitions and all of them are being updated. But generally I agree that this would be a better approach. It might help using PWJ when the result relation joins with other partitioned table. I am not sure whether that effectively happens today by partition pruning. More on this later. > > The one plan shared by all result relations will be same as the one we > would get if the query were SELECT, except it will contain junk > attributes such as ctid needed to identify tuples and a new "tableoid" > junk attribute if multiple result relations will be present due to > inheritance. One major way in which this targetlist differs from the > current per-result-relation plans is that it won't be passed through > expand_targetlist(), because the set of unassigned attributes may not > be unique among children. As mentioned above, those missing > attributes will be filled by ModifyTable doing some extra work, > whereas previously they would have come with the plan's output tuple. > > For child result relations that are foreign tables, their FDW adds > junk attribute(s) to the query’s targetlist by updating it in-place > (AddForeignUpdateTargets). 
However, as the child tables will no > longer get their own parsetree, we must use some hack around this > interface to obtain the foreign table specific junk attributes and add > them to the original/parent query’s targetlist. Assuming that all or > most of the children will belong to the same FDW, we will end up with > only a handful such junk columns in the final targetlist. I am not > sure if it's worthwhile to change the API of AddForeignUpdateTargets > to require FDWs to not scribble on the passed-in parsetree as part of > this patch. What happens if there's a mixture of foreign and local partitions or mixture of FDWs? Injecting junk columns from all FDWs in the top level target list will cause error because those attributes won't be available everywhere. > > As for how ModifyTable will create the new tuple for updates, I have > decided to use a ProjectionInfo for each result relation, which > projects a full, *clean* tuple ready to be put back into the relation. > When projecting, plan’s output tuple serves as OUTER tuple and the old > tuple fetched to fill unassigned attributes serves as SCAN tuple. By > having this ProjectionInfo also serve as the “junk filter”, we don't > need JunkFilters. The targetlist that this projection computes is > same as that of the result-relation-specific plan. Initially, I > thought to generate this "expanded" targetlist
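The per-row flow described in this thread — the single shared plan emits only SET-clause values plus "tableoid"/tupleid junk columns, and ModifyTable routes by tableoid and fills unassigned columns from the fetched old tuple — can be sketched as a toy model. All structures and names here are illustrative, not PostgreSQL's executor data structures.

```c
/*
 * Toy model of the proposed ModifyTable steps: route the tuple by its
 * tableoid junk column, then build the full new tuple by merging the
 * plan's output (assigned columns) with the old tuple (the rest).
 */
#include <stdbool.h>
#include <stddef.h>

#define NCOLS 3

typedef struct Tuple
{
    int values[NCOLS];
} Tuple;

/* One "result relation" per child, addressed by tableoid. */
typedef struct ResultRel
{
    unsigned int tableoid;
    Tuple stored;               /* stands in for the fetched old tuple */
} ResultRel;

/* Step 2: route the tableoid junk column to the matching result rel. */
ResultRel *route_by_tableoid(ResultRel *rels, int nrels, unsigned int oid)
{
    for (int i = 0; i < nrels; i++)
        if (rels[i].tableoid == oid)
            return &rels[i];
    return NULL;
}

/*
 * Step 3: assigned columns come from the plan's output; unassigned
 * columns are filled in from the old tuple.
 */
Tuple fill_new_tuple(const ResultRel *rel,
                     const bool assigned[NCOLS],
                     const int newvals[NCOLS])
{
    Tuple result;

    for (int i = 0; i < NCOLS; i++)
        result.values[i] = assigned[i] ? newvals[i]
                                       : rel->stored.values[i];
    return result;
}
```

This is the trade-off discussed above in miniature: the shared plan stays small (only assigned columns), at the cost of an extra per-row fetch-and-merge inside ModifyTable.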
Re: Concurrency bug in amcheck
On Tue, Apr 28, 2020 at 4:05 AM Peter Geoghegan wrote: > On Mon, Apr 27, 2020 at 4:17 AM Alexander Korotkov > wrote: > > > Assuming it doesn't seem we actually need any items on deleted pages, > > > I can propose to delete them on primary as well. That would make > > > contents of primary and standby more consistent. What do you think? > > > > So, my proposal is following. Backpatch my fix upthread to 11. In > > master additionally make _bt_unlink_halfdead_page() remove page items > > on primary as well. Do you have any objections? > > What I meant was that we might as well review the behavior of > _bt_unlink_halfdead_page() here, since we have to change it anyway. > But I think you're right: No matter what happens or doesn't happen to > _bt_unlink_halfdead_page(), the fact is that deleted pages may or may > not have a single remaining item (which was originally the "top > parent" item from the page at offset number P_HIKEY), now and forever. > We have to conservatively assume that it could be either state, now > and forever. That means that we definitely have to give up on the > check, per your patch. So, +1 for backpatching that back to 11. Thank you. I've worked a bit on the comments and commit message. I would appreciate your review. > (BTW, I don't think that this is a concurrency issue, except in the > sense that a test case that recreates the false positive is sensitive > to concurrency -- I imagine you agree with this.) Yes, I agree it's related to concurrency, but not exactly a concurrency issue. > I like your idea of making the primary consistent with the REDO > routine on the master branch only. I wonder if that will make it > possible to change btree_mask() so that wal_consistency_checking can > check deleted pages as well. The contents of a deleted page's special > area do matter, and yet we don't currently verify that it matches (we > use mask_page_content() within btree_mask() for deleted pages, which > seems inappropriately broad). 
In particular, the left and right > sibling links should be consistent with the primary on a deleted page. Thank you. 2nd patch is proposed for master and makes btree page unlink remove all the items from the page being deleted. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-amcheck-for-page-checks-concurrent-to-replay-of-.patch Description: Binary data 0002-Remove-btree-page-items-after-page-unlink.patch Description: Binary data
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
On Mon, May 11, 2020 at 2:07 PM Andrew Dunstan wrote: > > > On 5/11/20 7:29 AM, Julien Rouhaud wrote: > > On Mon, May 11, 2020 at 6:47 AM Tom Lane wrote: > >> Alvaro Herrera writes: > >>> On 2020-May-06, Alvaro Herrera wrote: > This doesn't allow whitespace between "fall" and "through", which means > we generate 217 such warnings currently. Or we can just use > -Wimplicit-fallthrough=3, which does allow whitespace (among other > detritus). > >>> If we're OK with patching all those places, I volunteer to do so. Any > >>> objections? Or I can keep it at level 3, which can be done with minimal > >>> patching. > >> If we're gonna do it at all, I think we should go for the level 4 > >> warnings. Level 3's idea of a fallthrough comment is too liberal > >> for my tastes... > > Here's a patch that also takes care of cleaning all warning due to the > > level 4 setting (at least the one I got with my other config options). > > I went with "FALLTHRU" as this seemed more used. > > > > Note that timezone/zic.c would also have to be changed. > > > > Since this is external code maybe we should leave that at level 3? I > think that should be doable via a Makefile override. Yes, that was my concern. The best way I found to avoid changing zic.c is something like this in src/timezone/Makefile, before the zic target:

ifneq (,$(filter -Wimplicit-fallthrough=4,$(CFLAGS)))
CFLAGS := $(filter-out -Wimplicit-fallthrough=4,$(CFLAGS))
CFLAGS += '-Wimplicit-fallthrough=3'
endif
Re: gcov coverage data not full with immediate stop
On Mon, May 11, 2020 at 2:30 PM Alexander Lakhin wrote: > > But I can confirm that calling __gcov_flush() in SignalHandlerForCrashExit() > really improves a code coverage report. > > Finally I've managed to get an expected coverage when I performed > $node_standby->stop() (but __gcov_flush() in SignalHandlerForCrashExit() > helps too). What happens if a coverage tool other than gcov is used? From that perspective, it's better to perform a clean shutdown in the TAP tests instead of an immediate one, if that's possible. -- Best Wishes, Ashutosh Bapat
Re: Parallel copy
On Wed, Apr 15, 2020 at 11:49 PM Andres Freund wrote: > > To be clear, I was only thinking of using a ringbuffer to indicate split > boundaries. And that workers would just pop entries from it before they > actually process the data (stored outside of the ringbuffer). Since the > split boundaries will always be read in order by workers, and the > entries will be tiny, there's no need to avoid copying out entries. > I think the binary mode processing will be slightly different because, unlike the text and csv formats, the data is stored in Length, Value format for each column and there are no line markers. I don't think there will be a big difference but still, we need to record somewhere what the format of the data in the ring buffers is. Basically, we can copy the data in Length, Value format and once the writers know about the format, they will parse the data in the appropriate format. We currently also have a different way of parsing the binary format, see NextCopyFrom. I think we need to be careful about avoiding duplicate work as much as possible. Apart from this, we have analyzed the other cases as mentioned below where we need to decide whether we can allow parallelism for the copy command. Case-1: Do we want to enable parallelism for a copy when transition tables are involved? Basically, during the copy, we capture tuples in transition tables for certain cases, like when an after-statement trigger accesses the same relation on which we have a trigger. See the example below [1]. We decide this in the function MakeTransitionCaptureState. For such cases, we collect minimal tuples in the tuple store after processing them so that later after-statement triggers can access them. Now, if we want to enable parallelism for such cases, we instead need to store and access tuples from a shared tuple store (sharedtuplestore.c/sharedtuplestore.h). 
However, it doesn't have the facility to store tuples in-memory, so we always need to store and access from a file which could be costly unless we also have an additional way to store minimal tuples in shared memory till work_memory and then in shared tuple store. It is possible to do all this or part of this work to enable parallel copy for such cases but I am not sure if it is worth it. We can decide to not enable parallelism for such cases and later allow if we see demand for the same and it will also help us to not introduce additional work/complexity in the first version of the patch. Case-2: The Single Insertion mode (CIM_SINGLE) is performed in various scenarios and whether we can allow parallelism for those depends on case to case basis which is discussed below: a. When there are BEFORE/INSTEAD OF triggers on the table. We don't allow multi-inserts in such cases because such triggers might query the table we're inserting into and act differently if the tuples that have already been processed and prepared for insertion are not there. Now, if we allow parallelism with such triggers the behavior would depend on if the parallel worker has already inserted or not that particular row. I guess such functions should ideally be marked as parallel-unsafe. So, in short in this case whether to allow parallelism or not depends upon the parallel-safety marking of this function. b. For partitioned tables, we can't support multi-inserts when there are any statement-level insert triggers. This is because as of now, we expect that any before row insert and statement-level insert triggers are on the same relation. Now, there is no harm in allowing parallelism for such cases but it depends upon if we have the infrastructure (basically allow tuples to be collected in shared tuple store) to support statement-level insert triggers. c. For inserts into foreign tables. 
We can't allow parallelism in this case because each worker needs to establish an FDW connection and operate in a separate transaction. Now, unless we have the capability to provide a two-phase commit protocol for "Transactions involving multiple postgres foreign servers" (which is being discussed in a separate thread [2]), we can't allow this. d. If there are volatile default expressions or the where clause contains a volatile expression. Here, we can check if the expression is parallel-safe, then we can allow parallelism. Case-3: In the copy command, for performing foreign key checks, we take a KEY SHARE lock on primary key table rows which in turn will increment the command counter and update the snapshot. Now, as we share the snapshots at the beginning of the command, we can't allow it to be changed later. So, unless we do something special for it, I think we can't allow parallelism in such cases. I couldn't think of many problems if we allow parallelism in such cases. One inconsistency, if we allow FK checks via workers, would be that at the end of COPY the value of command_counter will not be what we expect as we wouldn't have accounted for that from workers. Now, if COPY is being done in a
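The Length, Value column layout mentioned earlier in this thread for binary COPY can be illustrated with a small standalone parser. This is a simplified model, not PostgreSQL's actual NextCopyFrom logic (the real binary format also has a file header and a per-tuple field count); the point is only that binary data has explicit length prefixes instead of line markers.

```c
/*
 * Toy parser for Length,Value encoded columns: each field is a 4-byte
 * big-endian signed length (-1 means NULL) followed by that many bytes
 * of data.  Returns bytes consumed, or -1 on a malformed/short buffer.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct Field
{
    int32_t len;                /* -1 means NULL */
    const unsigned char *data;  /* points into the input buffer */
} Field;

static int32_t read_be32s(const unsigned char *p)
{
    return (int32_t) (((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                      ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}

long parse_lv_row(const unsigned char *buf, size_t buflen,
                  Field *fields, int nfields)
{
    size_t off = 0;

    for (int i = 0; i < nfields; i++)
    {
        if (off + 4 > buflen)
            return -1;          /* truncated length prefix */
        int32_t len = read_be32s(buf + off);
        off += 4;

        if (len == -1)          /* NULL column: no value bytes follow */
        {
            fields[i].len = -1;
            fields[i].data = NULL;
            continue;
        }
        if (len < 0 || off + (size_t) len > buflen)
            return -1;          /* bad length or truncated value */
        fields[i].len = len;
        fields[i].data = buf + off;
        off += (size_t) len;
    }
    return (long) off;
}
```

Because every field is self-describing, a ring-buffer entry for binary mode only needs to record where a tuple starts; the consumer can then walk the fields without any line-splitting pass.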
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
On 5/11/20 7:29 AM, Julien Rouhaud wrote:
> On Mon, May 11, 2020 at 6:47 AM Tom Lane wrote:
>> Alvaro Herrera writes:
>>> On 2020-May-06, Alvaro Herrera wrote:
>>>> This doesn't allow whitespace between "fall" and "through", which means
>>>> we generate 217 such warnings currently. Or we can just use
>>>> -Wimplicit-fallthrough=3, which does allow whitespace (among other
>>>> detritus).
>>> If we're OK with patching all those places, I volunteer to do so. Any
>>> objections? Or I can keep it at level 3, which can be done with minimal
>>> patching.
>> If we're gonna do it at all, I think we should go for the level 4
>> warnings. Level 3's idea of a fallthrough comment is too liberal
>> for my tastes...
> Here's a patch that also takes care of cleaning all warnings due to the
> level 4 setting (at least the ones I got with my other config options).
> I went with "FALLTHRU" as this seemed more used.
>
> Note that timezone/zic.c would also have to be changed.

Since this is external code, maybe we should leave that at level 3? I think that should be doable via a Makefile override.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)
On Mon, May 11, 2020 at 6:47 AM Tom Lane wrote: > > Alvaro Herrera writes: > > On 2020-May-06, Alvaro Herrera wrote: > >> This doesn't allow whitespace between "fall" and "through", which means > >> we generate 217 such warnings currently. Or we can just use > >> -Wimplicit-fallthrough=3, which does allow whitespace (among other > >> detritus). > > > If we're OK with patching all those places, I volunteer to do so. Any > > objections? Or I can keep it at level 3, which can be done with minimal > > patching. > > If we're gonna do it at all, I think we should go for the level 4 > warnings. Level 3's idea of a fallthrough comment is too liberal > for my tastes... Here's a patch that also takes care of cleaning all warning due to the level 4 setting (at least the one I got with my other config options). I went with "FALLTHRU" as this seemed more used. Note that timezone/zic.c would also have to be changed. diff --git a/configure b/configure index 83abe872aa..4dfe4fcfe1 100755 --- a/configure +++ b/configure @@ -5552,6 +5552,97 @@ if test x"$pgac_cv_prog_CXX_cxxflags__Wmissing_format_attribute" = x"yes"; then fi + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wimplicit-fallthrough=4, for CFLAGS" >&5 +$as_echo_n "checking whether ${CC} supports -Wimplicit-fallthrough=4, for CFLAGS... " >&6; } +if ${pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +pgac_save_CC=$CC +CC=${CC} +CFLAGS="${CFLAGS} -Wimplicit-fallthrough=4" +ac_save_c_werror_flag=$ac_c_werror_flag +ac_c_werror_flag=yes +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4=yes +else + pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +ac_c_werror_flag=$ac_save_c_werror_flag +CFLAGS="$pgac_save_CFLAGS" +CC="$pgac_save_CC" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4" >&5 +$as_echo "$pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4" >&6; } +if test x"$pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_4" = x"yes"; then + CFLAGS="${CFLAGS} -Wimplicit-fallthrough=4" +fi + + + { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -Wimplicit-fallthrough=4, for CXXFLAGS" >&5 +$as_echo_n "checking whether ${CXX} supports -Wimplicit-fallthrough=4, for CXXFLAGS... " >&6; } +if ${pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CXXFLAGS=$CXXFLAGS +pgac_save_CXX=$CXX +CXX=${CXX} +CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough=4" +ac_save_cxx_werror_flag=$ac_cxx_werror_flag +ac_cxx_werror_flag=yes +ac_ext=cpp +ac_cpp='$CXXCPP $CPPFLAGS' +ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_cxx_compiler_gnu + +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_cxx_try_compile "$LINENO"; then : + pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4=yes +else + pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + +ac_cxx_werror_flag=$ac_save_cxx_werror_flag +CXXFLAGS="$pgac_save_CXXFLAGS" +CXX="$pgac_save_CXX" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4" >&5 +$as_echo "$pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4" >&6; } +if test x"$pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_4" = x"yes"; then + CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough=4" +fi + + # This was included in -Wall/-Wformat in older GCC versions { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wformat-security, for CFLAGS" >&5 diff --git a/configure.in b/configure.in index ecdf172396..94fba67261 100644 --- a/configure.in +++ b/configure.in @@ -496,6 +496,8 @@ if test "$GCC" = yes -a "$ICC" = no; then PGAC_PROG_CXX_CFLAGS_OPT([-Wendif-labels]) PGAC_PROG_CC_CFLAGS_OPT([-Wmissing-format-attribute]) PGAC_PROG_CXX_CFLAGS_OPT([-Wmissing-format-attribute]) + PGAC_PROG_CC_CFLAGS_OPT([-Wimplicit-fallthrough=4]) + PGAC_PROG_CXX_CFLAGS_OPT([-Wimplicit-fallthrough=4]) # This was included in -Wall/-Wformat in older GCC versions PGAC_PROG_CC_CFLAGS_OPT([-Wformat-security]) PGAC_PROG_CXX_CFLAGS_OPT([-Wformat-security]) diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index
Re: Index Skip Scan
> On Mon, May 11, 2020 at 04:04:00PM +0530, Dilip Kumar wrote:
>
> > > +static inline bool
> > > +_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
> > > + Buffer buf, ScanDirection dir)
> > > +{
> > > + OffsetNumber low, high;
> > > + Page page = BufferGetPage(buf);
> > > + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> > > +
> > > + low = P_FIRSTDATAKEY(opaque);
> > > + high = PageGetMaxOffsetNumber(page);
> > > +
> > > + if (unlikely(high < low))
> > > + return false;
> > > +
> > > + return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
> > > + _bt_compare(scan->indexRelation, key, page, high) < 1);
> > > +}
> > >
> > > I think the high key condition should be changed to
> > > _bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if
> > > prefix qual is equal to the high key then also there is no point in
> > > searching on the current page so we can directly skip.
> >
> > From nbtree/README and comments to functions like _bt_split I've got an
> > impression that the high key could be equal to the last item on the leaf
> > page, so there is a point in searching. Is that incorrect?
>
> But IIUC, here we want to decide whether we will get the next key in
> the current page or not?

In general this function does what it says: it checks whether or not the provided scankey could be found within the page. All the logic about finding a proper next key to fetch is implemented at the call site, and within this function we want to test whatever was passed in. Does that answer the question?
Re: MultiXact\SLRU buffers configuration
> On 8 May 2020, at 21:36, Andrey M. Borodin wrote:
>
> *** The problem ***
> I'm investigating some cases of reduced database performance due to
> MultiXactOffsetLock contention (80% MultiXactOffsetLock, 20% IO DataFileRead).
> The problem manifested itself during index repack and constraint validation,
> both being effectively full table scans.
> The database workload contains a lot of select for share\select for update
> queries. I've tried to construct a synthetic workload generator and could not
> achieve a similar lock configuration: I see a lot of different locks in wait
> events, particularly a lot more MultiXactMemberLocks. But from my experiments
> with synthetic workload, contention on MultiXactOffsetLock can be reduced by
> increasing NUM_MXACTOFFSET_BUFFERS=8 to bigger numbers.
>
> *** Question 1 ***
> Is it safe to increase the number of buffers of MultiXact\all SLRUs, recompile
> and run the database as usual?
> I cannot experiment much with production, but I'm mostly sure that bigger
> buffers will solve the problem.
>
> *** Question 2 ***
> Probably, we could add GUCs for the SLRU sizes? Are there any reasons not to
> make them configurable? I think multis, clog, subtransactions and others will
> benefit from bigger buffers. But, probably, too many knobs can be confusing.
>
> *** Question 3 ***
> The MultiXact offset lock is always taken as an exclusive lock, which makes the
> MultiXact offset subsystem single-threaded. If someone has a good idea how to
> make it more concurrency-friendly, I'm willing to put some effort into this.
> Probably, I could just add LWLocks for each offset buffer page. Is it
> something worth doing? Or are there any hidden caveats and difficulties?

I've created a benchmark[0] imitating MultiXact pressure on my laptop: 7 clients are concurrently running the query "select * from table where primary_key = ANY ($1) for share", where $1 is an array of identifiers, so that each tuple in the table is locked by a different set of XIDs.
During this benchmark I observe contention on MultiXactOffsetControlLock in pg_stat_activity:

Friday, 8 May 2020 15:08:37 (every 1s)

  pid  |         wait_event         | wait_event_type | state  |                     query
-------+----------------------------+-----------------+--------+------------------------------------------------
 41344 | ClientRead                 | Client          | idle   | insert into t1 select generate_series(1,100,1)
 41375 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41377 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41378 |                            |                 | active | select * from t1 where i = ANY ($1) for share
 41379 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41381 |                            |                 | active | select * from t1 where i = ANY ($1) for share
 41383 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41385 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
(8 rows)

Finally, the benchmark measures the time to execute the select for update 42 times.

I went ahead and created 3 patches:
1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
2. Reduce locking level to shared on read of MultiXactId members
3. Configurable cache size

I've found out that:
1. When the MultiXact working set does not fit into the buffers, benchmark times grow very high. Yet, very big buffers slow the benchmark down too. For this benchmark the optimal SLRU size is 32 pages for offsets and 64 pages for members (defaults are 8 and 16 respectively).
2. The lock optimisation increases performance by 5% on default SLRU sizes. Actually, the benchmark does not explicitly read MultiXactId members, but when it replaces one set with another it has to read the previous set. I understand that we can construct a benchmark to demonstrate dominance of any algorithm, and 5% on a synthetic workload is not a very big number. But it just makes sense to try to take a shared lock for reading.
3. Manipulations with the cache size do not affect the benchmark at all.
It's somewhat expected: the benchmark is designed to defeat the cache, so either way OffsetControlLock would not be stressed. For our workload, I think we will just increase the SLRU sizes. But the patch set may be useful for tuning and as a performance optimisation of MultiXact.

Also, MultiXacts seem not to be a very good fit for the SLRU design. I think it would be better to use a B-tree as a container, or at least make MultiXact members extendable in place (reserve some size when a multixact is created). When we want to extend the number of locks for a tuple, currently we will: 1. Iterate through all SLRU buffers for offsets to read
Re: Index Skip Scan
On Sun, May 10, 2020 at 11:17 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Sat, Apr 11, 2020 at 03:17:25PM +0530, Dilip Kumar wrote: > > > > Some more comments... > > Thanks for reviewing. Since this patch took much longer than I expected, > it's useful to have someone to look at it with a "fresh eyes". > > > + so->skipScanKey->nextkey = ScanDirectionIsForward(dir); > > + _bt_update_skip_scankeys(scan, indexRel); > > + > > ... > > + /* > > + * We haven't found scan key within the current page, so let's scan from > > + * the root. Use _bt_search and _bt_binsrch to get the buffer and offset > > + * number > > + */ > > + so->skipScanKey->nextkey = ScanDirectionIsForward(dir); > > + stack = _bt_search(scan->indexRelation, so->skipScanKey, > > +, BT_READ, scan->xs_snapshot); > > > > Why do we need to set so->skipScanKey->nextkey = > > ScanDirectionIsForward(dir); multiple times? I think it is fine to > > just set it once? > > I believe it was necessary for previous implementations, but in the > current version we can avoid this, you're right. > > > +static inline bool > > +_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key, > > + Buffer buf, ScanDirection dir) > > +{ > > + OffsetNumber low, high; > > + Page page = BufferGetPage(buf); > > + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); > > + > > + low = P_FIRSTDATAKEY(opaque); > > + high = PageGetMaxOffsetNumber(page); > > + > > + if (unlikely(high < low)) > > + return false; > > + > > + return (_bt_compare(scan->indexRelation, key, page, low) > 0 && > > + _bt_compare(scan->indexRelation, key, page, high) < 1); > > +} > > > > I think the high key condition should be changed to > > _bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if > > prefix qual is equal to the high key then also there is no point in > > searching on the current page so we can directly skip. 
> > From nbtree/README and comments to functions like _bt_split I've got an
> > impression that the high key could be equal to the last item on the leaf
> > page, so there is a point in searching. Is that incorrect?

But IIUC, here we want to decide whether we will get the next key on the current page or not. Is my understanding correct? If our key (the last tuple's key) is equal to the high key, then the maximum key on this page is the same as what we already got from the last tuple, so why would we want to look at this page at all? It will not give us a new key. So ideally, we should only be looking into this page if our last tuple's key is smaller than the high key. Am I missing something?

> >
> > + /* Check if an index skip scan is possible. */
> > + can_skip = enable_indexskipscan & index->amcanskip;
> > +
> > + /*
> > + * Skip scan is not supported when there are qual conditions, which are not
> > + * covered by index. The reason for that is that those conditions are
> > + * evaluated later, already after skipping was applied.
> > + *
> > + * TODO: This implementation is too restrictive, and doesn't allow e.g.
> > + * index expressions. For that we need to examine index_clauses too.
> > + */
> > + if (root->parse->jointree != NULL)
> > + {
> > + ListCell *lc;
> > +
> > + foreach(lc, (List *)root->parse->jointree->quals)
> > + {
> > + Node *expr, *qual = (Node *) lfirst(lc);
> > + Var *var;
> >
> > I think we can avoid checking for expression if can_skip is already false.
>
> Yes, that makes sense. I'll include your suggestions into the next
> rebased version I'm preparing.

Ok.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: 2020-05-14 Press Release Draft
On Mon, 11 May 2020 at 14:09, Jonathan S. Katz wrote:
> Attached is a draft of the press release for the 2020-05-14 cumulative
> update. Please let me know your feedback by 2020-05-13 :)

Hi,

Thanks for drafting those up. For:

* Several fixes for GENERATED columns, including an issue where it was possible to crash or corrupt data in a table when the output of the generated column was the exact copy of a physical column on the table.

I think it's important to include the "or if the expression called a function which could, in certain cases, return its own input" part. The reason I think that's important is that there's likely no legitimate case for having the expression be an exact copy of the column.

David
Re: gcov coverage data not full with immediate stop
Hello Alvaro,

11.05.2020 06:42, Alvaro Herrera wrote:
> (Strangely, I was just thinking about these branches of mine as I
> closed my week last Friday...)
>
> On 2020-May-10, Alexander Lakhin wrote:
>
>> So if we want to make the coverage reports more precise, I see the three
>> ways:
>> 1. Change the stop mode in teardown_node to fast (probably only when
>> configured with --enable-coverage);
>> 2. Explicitly stop nodes in TAP tests (where it's important) -- seems
>> too tedious and troublesome;
>> 3. Explicitly call __gcov_flush in SIGQUIT handler (quickdie)?
> I tried your idea 3 a long time ago and my experiments didn't show an
> increase in coverage [1]. But I like this idea the best, and maybe I
> did something wrong. Attached is the patch I had (on top of
> fc115d0f9fc6), but I don't know if it still applies.

Thanks for the reference to that discussion and your patch. As I see it, the issue with that patch is that quickdie() is not the only SIGQUIT handler. When a backend is interrupted with SIGQUIT, it exits in SignalHandlerForCrashExit(). In fact, if I only add __gcov_flush() in SignalHandlerForCrashExit(), it raises test coverage for `make check -C src/test/recovery/` from 106198 lines/6319 functions to 106420 lines/6328 functions.

It's not yet clear to me what happens when __gcov_flush() is called inside quickdie(). The test coverage then changes to 108432 lines/5417 functions (the number of function calls decreased). And, for example, in coverage/src/backend/utils/cache/catcache.c.gcov.html I see:

147    8 : int2eqfast(Datum a, Datum b)
...
153    0 : int2hashfast(Datum datum)

but without __gcov_flush in quickdie() we have:

147    78038 : int2eqfast(Datum a, Datum b)
...
153    255470 : int2hashfast(Datum datum)

So it needs more investigation. But I can confirm that calling __gcov_flush() in SignalHandlerForCrashExit() really improves the code coverage report.
I tried to develop a test to elevate a coverage for gist: https://coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html (Please look at the attached test if it could be interesting.) and came to this issue with a coverage. I tried to play with GCOV_PREFIX, but without luck. Yesterday I found the more recent discussion: https://www.postgresql.org/message-id/flat/44ecae53-9861-71b7-1d43-4658acc52519%402ndquadrant.com#d02e2e61212831fbceadf290637913a0 (where probably the same problem came out). Finally I've managed to get an expected coverage when I performed $node_standby->stop() (but __gcov_flush() in SignalHandlerForCrashExit() helps too). Best regards, Alexander diff --git a/src/test/recovery/t/021_indexes.pl b/src/test/recovery/t/021_indexes.pl new file mode 100644 index 00..902133d55a --- /dev/null +++ b/src/test/recovery/t/021_indexes.pl @@ -0,0 +1,102 @@ +# Test testing indexes replication +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; + +# Initialize master node +my $node_master = get_new_node('master'); +$node_master->init(allows_streaming => 1); +$node_master->start; + +# Add a table for gist index check +$node_master->safe_psql('postgres', +qq{ +create table gist_point_tbl(id int, p point); +insert into gist_point_tbl(id, p) +select g, point(g*10, g*10) from generate_series(1, 11000) g; +create index gist_pointidx on gist_point_tbl using gist(p);}); + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); + +# Create streaming standby from backup +my $node_standby = get_new_node('standby'); +$node_standby->init_from_backup($node_master, $backup_name, +has_streaming => 1); +$node_standby->start; + +$node_master->safe_psql('postgres', qq{ +create temp table gist_point_tbl_t(id int, p point); +create index gist_pointidx_t on gist_point_tbl_t using gist(p); +insert into gist_point_tbl_t(id, p) +select g, point(g*10, g*10) from generate_series(1, 1000) g; +set 
enable_seqscan=off; +set enable_bitmapscan=off; +explain (costs off) +select p from gist_point_tbl_t where p <@ box(point(0,0), point(100, 100)); +}); + +$node_master->safe_psql('postgres', qq{ +create unlogged table gist_point_tbl_u(id int, p point); +create index gist_pointidx_u on gist_point_tbl_u using gist(p); +insert into gist_point_tbl_u(id, p) +select g, point(g*10, g*10) from generate_series(1, 1000) g; +set enable_seqscan=off; +set enable_bitmapscan=off; +explain (costs off) +select p from gist_point_tbl_u where p <@ box(point(0,0), point(100, 100)); +}); + +$node_master->safe_psql('postgres', qq{ +insert into gist_point_tbl (id, p) +select g, point(g*10, g*10) from generate_series(1, 1000) g;}); + +$node_master->safe_psql('postgres', "delete from gist_point_tbl where id < 500"); + +$node_master->safe_psql('postgres', qq{ +create table test as (select x, box(point(x, x),point(x, x)) + from generate_series(1,400) as x); + +create index test_idx on test using gist (box); + +set enable_seqscan TO false; +set
A patch for get origin from commit_ts.
Hello hackers,

I am researching the 'origin' field in PostgreSQL; mainly it is used in logical decoding to filter out transactions from non-local sources. I notice that the origin is stored in commit_ts, so I think it should be possible to get the origin of a transaction from commit_ts. But I cannot find any code that reads the origin back from commit_ts; it is as if we are producing data which nobody cares about. Can I know what the purpose of the origin in commit_ts is? Do you think we should add some support for this otherwise-unused data? For example, I added a function to get the origin from commit_ts:

===
postgres=# select pg_xact_commit_origin('490');
 pg_xact_commit_origin
-----------------------
 test_origin
(1 row)

postgres=# select pg_xact_commit_origin('491');
 pg_xact_commit_origin
-----------------------
 test_origin1
(1 row)

postgres=#
===

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

get_origin_from_commit_ts.patch
Description: Binary data
Re: 2pc leaks fds
Alvaro Herrera wrote:
> On 2020-May-08, Kyotaro Horiguchi wrote:
>
> > I agree to the direction of this patch. Thanks for the explanation.
> > The patch looks good to me except the two points below.
>
> Thanks! I pushed the patch.

I tried to follow the discussion but haven't had a better idea. This looks better than the previous version. Thanks!

While looking at the changes, I've noticed a small comment issue in the XLogReaderRoutine structure definition:

 * "tli_p" is an input/output argument. XLogRead() uses it to pass the

The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This incorrect comment was actually introduced by that commit.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
Re: pg_regress cleans up tablespace twice.
At Fri, 21 Feb 2020 17:05:07 +0900 (JST), Kyotaro Horiguchi wrote in
> At Thu, 20 Feb 2020 14:23:22 +0900, Michael Paquier wrote in
> > Removing this code from pg_regress.c makes also sense to me. Now, the
> > patch breaks "vcregress installcheck" as this is missing to patch
> > installcheck_internal() for the tablespace path creation. I would
> > also recommend using a full path for the directory location to avoid
> > any potential issues if this code is refactored or moved around, the
> > patch now relying on the current path used.
>
> Hmm. Right. I confused the database directory and the tablespace
> directory. The tablespace directory should be provided by the test script,
> even though the database directory is preexisting in installcheck. To
> avoid useless future failures, I would do that for all
> subcommands, as regress/GNUmakefile does.

Tablespace directory cleanup is not done for all testing targets. Actually, it is not done for the tools under bin/ except pg_upgrade. On the other hand, it was done by every pg_regress run for the Windows build. So I made vcregress.pl do the same: specifically, it always sets up the tablespace before pg_regress is executed.

There is one place where --outputdir is specified for pg_regress, pg_upgrade/test.sh. It is explained as follows:

# Send installcheck outputs to a private directory. This avoids conflict when
# check-world runs pg_upgrade check concurrently with src/test/regress check.
# To retrieve interesting files after a run, use pattern tmp_check/*/*.diffs.
outputdir="$temp_root/regress"
EXTRA_REGRESS_OPTS="$EXTRA_REGRESS_OPTS --outputdir=$outputdir"

where $temp_root is $(TOP)/src/bin/pg_upgrade/tmp_check/regress. Thus the current regress/GNUmakefile breaks this consideration, and the current vcregress (of the Windows build) does the right thing in light of the comment. Don't we need to avoid cleaning up "$(TOP)/src/test/regress/tablespace" in that case? (See the second patch attached.)

regards.
-- Kyotaro Horiguchi NTT Open Source Software Center >From fb351e0b075fbb2e6398aa0991b5a824e3c95e5a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 30 Apr 2020 14:06:51 +0900 Subject: [PATCH 1/2] Move tablespace cleanup out of pg_regress. Tablespace directory is cleaned-up in regress/GNUmakefile for all platforms other than Windows. For Windoiws pg_regress does that on behalf of make, which is ugly. In addition to that, pg_regress does the clean up twice at once. This patch moves the cleanup code out of pg_regress into vcregress.pl, which is more sutable place. --- src/test/regress/pg_regress.c | 22 -- src/tools/msvc/vcregress.pl | 14 ++ 2 files changed, 14 insertions(+), 22 deletions(-) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 38b2b1e8e1..c56bfaf7f5 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -491,28 +491,6 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir); -#ifdef WIN32 - - /* - * On Windows only, clean out the test tablespace dir, or create it if it - * doesn't exist. On other platforms we expect the Makefile to take care - * of that. (We don't migrate that functionality in here because it'd be - * harder to cope with platform-specific issues such as SELinux.) - * - * XXX it would be better if pg_regress.c had nothing at all to do with - * testtablespace, and this were handled by a .BAT file or similar on - * Windows. See pgsql-hackers discussion of 2008-01-18. 
- */ - if (directory_exists(testtablespace)) - if (!rmtree(testtablespace, true)) - { - fprintf(stderr, _("\n%s: could not remove test tablespace \"%s\"\n"), - progname, testtablespace); - exit(2); - } - make_directory(testtablespace); -#endif - /* finally loop on each file and do the replacement */ for (name = names; *name; name++) { diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl index f95f7a5c7a..74d37a31de 100644 --- a/src/tools/msvc/vcregress.pl +++ b/src/tools/msvc/vcregress.pl @@ -104,6 +104,7 @@ exit 0; sub installcheck_internal { my ($schedule, @EXTRA_REGRESS_OPTS) = @_; + my @args = ( "../../../$Config/pg_regress/pg_regress", "--dlpath=.", @@ -114,6 +115,7 @@ sub installcheck_internal "--no-locale"); push(@args, $maxconn) if $maxconn; push(@args, @EXTRA_REGRESS_OPTS); + CleanupTablespaceDirectory(); system(@args); my $status = $? >> 8; exit $status if $status; @@ -143,6 +145,7 @@ sub check "--temp-instance=./tmp_check"); push(@args, $maxconn) if $maxconn; push(@args, $temp_config) if $temp_config; + CleanupTablespaceDirectory(); system(@args); my $status = $? >> 8; exit $status if $status; @@ -178,6 +181,7 @@ sub ecpgcheck sub isolationcheck { chdir "../isolation"; + CleanupTablespaceDirectory();
Re: Problem with logical replication
On Sun, May 10, 2020 at 07:08:03PM -0300, Euler Taveira wrote:
> I attached a patch with the described solution. I also included a test that
> covers this scenario.

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel));
+ Assert(GetRelationIdentityOrPK(rel) == RelationGetRelid(idxrel));

Not much of a fan of adding a routine to relcache.c to do the work of two routines already present, so I think that we had better add an extra condition based on RelationGetPrimaryKeyIndex, and give up on GetRelationIdentityOrPK() in execReplication.c. Wouldn't it also be better to cross-check the replica identity here depending on whether RelationGetReplicaIndex() returns an invalid OID or not?
-- 
Michael

signature.asc
Description: PGP signature