Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-02-28 Thread Andres Freund
Hi,

On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> Turns out that I am to blame for that. All the way back in 9.4. For
> logical decoding I needed to make ScanPgRelation() use a specific type
> of snapshot during one corner case of logical decoding. For reasons lost
> to time, I didn't continue to pass NULL to systable_beginscan() in the
> plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> Note the missing RegisterSnapshot()...

Oh, I pierced through the veil: It's likely because the removal of
SnapshotNow happened concurrently with the development of logical
decoding. Before we used proper snapshots for catalog scans, passing in
SnapshotNow was precisely the right thing to do...

I think that's somewhat unlikely to actively cause problems in practice,
as ScanPgRelation() requires that we already have a lock on the
relation, we only look for a single row, and I don't think we rely on
the result's tid to be correct. I don't immediately see a case where
this would trigger in a problematic way.


After fixing the ScanPgRelation case I found another occurrence of the
problem:
The catalog snapshot copied at (please ignore the slightly-off line numbers):

#0  GetCatalogSnapshot (relid=1249) at 
/home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:454
#1  0x560d725b198d in systable_beginscan (heapRelation=0x7f13429f2a08, 
indexId=2659, indexOK=true, snapshot=0x0, nkeys=2, key=0x7fff26e04db0)
at /home/andres/src/postgresql/src/backend/access/index/genam.c:378
#2  0x560d72b4117f in SearchCatCacheMiss (cache=0x560d74590800, nkeys=2, 
hashValue=1029784422, hashIndex=102, v1=697088, v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1359
#3  0x560d72b41045 in SearchCatCacheInternal (cache=0x560d74590800, 
nkeys=2, v1=697088, v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1299
#4  0x560d72b40d09 in SearchCatCache (cache=0x560d74590800, v1=697088, 
v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1153
#5  0x560d72b5b65f in SearchSysCache (cacheId=7, key1=697088, key2=3, 
key3=0, key4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1112
#6  0x560d72b5b9dd in SearchSysCacheCopy (cacheId=7, key1=697088, key2=3, 
key3=0, key4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1187
#7  0x560d72645501 in RemoveAttrDefaultById (attrdefId=697096) at 
/home/andres/src/postgresql/src/backend/catalog/heap.c:1821
#8  0x560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at 
/home/andres/src/postgresql/src/backend/catalog/dependency.c:1397
#9  0x560d7263fa17 in deleteOneObject (object=0x560d745d37a0, 
depRel=0x7fff26e052d0, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1261
#10 0x560d7263e4d6 in deleteObjectsInList (targetObjects=0x560d745d1ec0, 
depRel=0x7fff26e052d0, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:271
#11 0x560d7263e58a in performDeletion (object=0x7fff26e05304, 
behavior=DROP_CASCADE, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:356
#12 0x560d72655f3f in RemoveTempRelations (tempNamespaceId=686167) at 
/home/andres/src/postgresql/src/backend/catalog/namespace.c:4155
#13 0x560d72655f72 in RemoveTempRelationsCallback (code=0, arg=0) at 
/home/andres/src/postgresql/src/backend/catalog/namespace.c:4174
#14 0x560d729a58e2 in shmem_exit (code=0) at 
/home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:239
#15 0x560d729a57b5 in proc_exit_prepare (code=0) at 
/home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:194

is released at the end of systable_endscan. And then xmin is reset at:

#0  SnapshotResetXmin () at 
/home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:1038
#1  0x560d72bb9bfc in InvalidateCatalogSnapshot () at 
/home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:521
#2  0x560d72b43d62 in LocalExecuteInvalidationMessage (msg=0x7fff26e04e70) 
at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:562
#3  0x560d729b277b in ReceiveSharedInvalidMessages 
(invalFunction=0x560d72b43d26 <LocalExecuteInvalidationMessage>, 
resetFunction=0x560d72b43f92 <InvalidateSystemCaches>) at 
/home/andres/src/postgresql/src/backend/storage/ipc/sinval.c:120
#4  0x560d72b44070 in AcceptInvalidationMessages () at 
/home/andres/src/postgresql/src/backend/utils/cache/inval.c:683
#5  0x560d729b8f4f in LockRelationOid (relid=2658, lockmode=3) at 
/home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:136
#6  0x560d725341e3 in relation_open (relationId=2658, lockmode=3) at 
/home/andres/src/postgresql/src/backend/access/common/relation.c:56
#7  0x560d725b22c3 in index_open (relationId=2658, lockmode=3) at 
/home/andres/src/postgresql/src/backend/access/index/indexam.c:130
#8 0x560d727b6991 in ExecOpenIndices (resultRelInfo=0x560d7468ffa0, 
speculative=false)

proposal \gcsv

2020-02-28 Thread Pavel Stehule
Hi

I would like to enhance the \g command with a variant \gcsv.

The proposed command behaves the same as \g, except that the result is
always in CSV format.

It can help with writing psql macros that wrap the \g command.
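For illustration, a sketch of the intended usage (hypothetical syntax,
mirroring \g):

    -- print the result in CSV format to stdout
    SELECT 1 AS a, 'two' AS b \gcsv
    -- or send it to a file, as \g does
    SELECT 1 AS a, 'two' AS b \gcsv out.csv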

Options, notes?

Regards

Pavel


Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-02-28 Thread Andres Freund
Hi,

While self-reviewing a patch I'm about to send, I changed the assertion
in index_getnext_tid from
   Assert(TransactionIdIsValid(RecentGlobalXmin))
to instead test (via an indirection)
   Assert(TransactionIdIsValid(MyProc->xmin))

Without ->xmin being set, it's not safe to do scans. And
checking RecentGlobalXmin doesn't really do much.

But, uh. That doesn't get very far through the regression tests.

Turns out that I am to blame for that. All the way back in 9.4. For
logical decoding I needed to make ScanPgRelation() use a specific type
of snapshot during one corner case of logical decoding. For reasons lost
to time, I didn't continue to pass NULL to systable_beginscan() in the
plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
Note the missing RegisterSnapshot()...

That's bad because:

a) If invalidation processing triggers a InvalidateCatalogSnapshot(),
   the contained SnapshotResetXmin() may find no other snapshot, and
   reset ->xmin. Which then may cause relevant row versions to be
   removed.
b) If there's a subsequent GetCatalogSnapshot() during invalidation
   processing, that will GetSnapshotData() into the snapshot currently
   being used.

The fix itself is trivial, just pass NULL for the normal case, rather
than doing GetCatalogSnapshot().
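
For illustration, the shape of that fix in ScanPgRelation() (a sketch,
not the verbatim commit; systable_beginscan() obtains and registers a
catalog snapshot itself when given NULL):

    /* before (buggy): hand systable_beginscan() an unregistered snapshot */
    snapshot = GetCatalogSnapshot(RelationRelationId);

    /*
     * after: pass NULL in the plain case, so systable_beginscan()
     * acquires and registers the catalog snapshot itself
     */
    snapshot = NULL;

    pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
                                       indexOK, snapshot, 1, key);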


But I think this points to some severe holes in relevant assertions /
infrastructure:

1) Asserting that RecentGlobalXmin is set - like many routines do -
   isn't meaningful, because it stays set even if SnapshotResetXmin()
   releases the transaction's snapshot. These are fairly old assertions
   (d53a56687f3d). As far as I can tell these routines really should
   verify that a snapshot is set.

2) I think we need to reset TransactionXmin, RecentXmin whenever
   SnapshotResetXmin() clears xmin. While we'll set them again the next
   time a snapshot is acquired, the fact that they stay set seems likely
   to hide bugs.

   We also could remove TransactionXmin and instead use the
   pgproc/pgxact's ->xmin. I don't really see the point of having it?

3) Similarly, I think we ought to reset RecentGlobal[Data]Xmin at
   the end of the transaction or such.

   But I'm not clear what protects those values from being affected by
   wraparound in a long-running transaction? Initially they are obviously
   protected against that due to the procarray - but once the oldest
   procarray entry releases its snapshot, the global xmin horizon can
   advance. That allows transactions up to ~2 billion XIDs into the
   future of the current backend, whereas RecentGlobalXmin might be
   nearly ~2 billion transactions in the past relative to ->xmin.

   That might not have been a likely problem many years ago, but seems
   far from impossible today?


I propose to commit a fix, but then also add an assertion for
TransactionIdIsValid(MyPgXact->xmin) instead of (or in addition to) the
TransactionIdIsValid(RecentGlobalXmin) tests we have right now. And in
master, clear the various *Xmin variables whenever we reset xmin.

I think in master we should also start to make RecentGlobalXmin etc
FullTransactionIds. We can then convert the 32bit xids we compare with
RecentGlobal* to 64bit xids (which is safe, because live xids have to be
within [oldestXid, nextXid)).  I have that as part of another patch
anyway...
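
For illustration, widening a 32-bit xid using the existing
FullTransactionId helpers from access/transam.h (the function name here
is mine, not from the patch):

    static inline FullTransactionId
    WidenXidToFull(TransactionId xid, FullTransactionId next_fxid)
    {
        uint32      epoch = EpochFromFullTransactionId(next_fxid);

        /*
         * A live xid lies in [oldestXid, nextXid); if its 32-bit value is
         * numerically above nextXid's low half, it belongs to the
         * previous epoch.
         */
        if (xid > XidFromFullTransactionId(next_fxid))
            epoch--;
        return FullTransactionIdFromEpochAndXid(epoch, xid);
    }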


Greetings,

Andres Freund




Re: Making psql error out on output failures

2020-02-28 Thread David Zhang

Hi Alvaro,

Thanks for your review. A new patch with the error message in PG style
is attached.


On 2020-02-28 8:03 a.m., Alvaro Herrera wrote:

On 2020-Feb-18, David Zhang wrote:


3. I think a better way to resolve this issue will still be the solution
with an extra %m, which can make the error message much more informative to
the end users.

Yes, agreed.  However, we use a style like this:

pg_log_error("could not print tuples: %m");


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 4b2679360f..9ffb1ef10f 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -855,6 +855,7 @@ static bool
 PrintQueryTuples(const PGresult *results)
 {
printQueryOpt my_popt = pset.popt;
+   bool result = true;
 
/* one-shot expanded output requested via \gx */
if (pset.g_expanded)
@@ -872,6 +873,11 @@ PrintQueryTuples(const PGresult *results)
disable_sigpipe_trap();
 
printQuery(results, &my_popt, fout, false, pset.logfile);
+   if (ferror(fout))
+   {
+   pg_log_error("could not print tuples: %m");
+   result = false;
+   }
 
if (is_pipe)
{
@@ -882,9 +888,16 @@ PrintQueryTuples(const PGresult *results)
fclose(fout);
}
else
+   {
printQuery(results, &my_popt, pset.queryFout, false, 
pset.logfile);
+   if (ferror(pset.queryFout))
+   {
+   pg_log_error("could not print tuples: %m");
+   result = false;
+   }
+   }
 
-   return true;
+   return result;
 }
 
 


Re: Some problems of recovery conflict wait events

2020-02-28 Thread Masahiko Sawada
On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
 wrote:
>
> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
>  wrote:
> >
> > Hi all,
> >
> > When recovery conflicts happen on the streaming replication standby,
> > the wait event of startup process is null when
> > max_standby_streaming_delay = 0 (to be exact, when the limit time
> > calculated from max_standby_streaming_delay and the last WAL data
> > receipt time has already passed). Moreover the process title of the
> > waiting startup process looks odd in the case of lock conflicts.
> >
> > 1. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup  | Lock| relation
> > (1 row)
> >
> > * ps
> > 42513   ??  Ss 0:00.05 postgres: startup   recovering
> > 00010003 waiting
> >
> > Looks good.
> >
> > 2. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a snapshot,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup  | |
> > (1 row)
> >
> > * ps
> > 44299   ??  Ss 0:00.05 postgres: startup   recovering
> > 00010003 waiting
> >
> > wait_event_type and wait_event are null in spite of waiting for
> > conflict resolution.
> >
> > 3. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup  | |
> > (1 row)
> >
> > * ps
> > 46510   ??  Ss 0:00.05 postgres: startup   recovering
> > 00010003 waiting waiting
> >
> > wait_event_type and wait_event are null and the process title is
> > wrong; "waiting" appears twice.
> >
> > The cause of the first problem, wait_event_type and wait_event are not
> > set, is that WaitExceedsMaxStandbyDelay which is called by
> > ResolveRecoveryConflictWithVirtualXIDs waits for other transactions
> > using pg_usleep rather than WaitLatch. I think we can change it so
> > that it uses WaitLatch and have the callers pass wait event information.
> >
> > For the second problem, the wrong process title, the cause is also
> > related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> > conflicts we add "waiting" to the process title in WaitOnLock, but we
> > add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> > have WaitOnLock not set the process title in the recovery case.
> >
> > This problem exists on 12, 11 and 10. I'll submit the patch.
> >
>
> I've attached patches that fix the above two issues.
>
> 0001 patch fixes the first problem. Currently there are 5 types of
> recovery conflict resolution: snapshot, tablespace, lock, database and
> buffer pin, and we set wait events to only 2 events out of 5: lock
> (only when doing ProcWaitForSignal) and buffer pin. Therefore, users
> cannot tell whether the startup process is waiting or not, nor what it
> is waiting for. This patch sets wait events for 3 more events: snapshot,
> tablespace and lock. For wait events of those 3 events, I thought that
> we can create a new more appropriate wait event type, say
> RecoveryConflict, and set it for them. However, considering
> back-patching to existing versions, adding a new wait event type would
> not be acceptable. So this patch sets existing wait events such as
> PG_WAIT_LOCK in those 3 places, and does not set a wait event for
> conflict resolution on dropping a database because there is no
> appropriate existing one. I'll start a separate thread about
> improvement on wait events of recovery conflict resolution for PG13 if
> necessary.

Attached is a patch that improves the wait events of recovery conflict
resolution. It's for PG13. I added a new RecoveryConflict wait_event_type
and some wait_events. This patch can be applied on top of the two patches
I already proposed [1].
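
As a reference for the first fix, the pg_usleep-to-WaitLatch conversion
would look roughly like this (a sketch only; standbyWait_us is the
existing wait interval in standby.c, and the wait event info is passed
in by the caller):

    /* before: the sleep is invisible to pg_stat_activity */
    pg_usleep(standbyWait_us);

    /* after: the sleep reports a wait event and reacts to the latch */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     standbyWait_us / 1000L,    /* WaitLatch takes msec */
                     PG_WAIT_LOCK);             /* caller-supplied info */
    ResetLatch(MyLatch);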

Regards,

[1] 
https://www.postgresql.org/message-id/CA%2Bfd4k63ukOtdNx2f-fUZ2vuB3RgE%3DPo%2BxSnpmcPJbKqsJMtiA%40mail.gmail.com

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0003-Improve-wait-events-of-recovery-conflict-resolution.patch
Description: Binary data


Re: ALTER tbl rewrite loses CLUSTER ON index

2020-02-28 Thread Justin Pryzby
On Fri, Feb 28, 2020 at 06:26:04PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > I think the attached is 80% complete (I didn't touch pg_dump).
> > One objection to this change would be that all relations (including indices)
> > end up with relclustered fields, and pg_index already has a number of 
> > bools, so
> > it's not like this one bool is wasting a byte.
> > I think relisclustered was a clever way of avoiding that overhead
> > (c0ad5953).
> > So I would be -0.5 on moving it to pg_class..
> > But I think 0001 and 0002 are worthy.  Maybe the test in 0002 should live
> > somewhere else.
> 
> 0001 has been superseded by events (faade5d4c), so the cfbot is choking
> on that one's failure to apply, and not testing any further.  Please
> repost without 0001 so that we can get this testing again.

I've just noticed while working on (1) that this separately affects REINDEX
CONCURRENTLY, which would be a new bug in v12.  Without CONCURRENTLY there's no
issue.  I guess we need a separate patch for that case.

(1) https://commitfest.postgresql.org/27/2269/

The ALTER bug goes back further, and its fix should be kept separate.

postgres=# DROP TABLE tt; CREATE TABLE tt(i int unique); CLUSTER tt USING 
tt_i_key; CLUSTER tt; REINDEX INDEX tt_i_key; CLUSTER tt;
DROP TABLE
CREATE TABLE
CLUSTER
CLUSTER
REINDEX
CLUSTER

postgres=# DROP TABLE tt; CREATE TABLE tt(i int unique); CLUSTER tt USING 
tt_i_key; CLUSTER tt; REINDEX INDEX CONCURRENTLY tt_i_key; CLUSTER tt;
DROP TABLE
CREATE TABLE
CLUSTER
CLUSTER
REINDEX
ERROR:  there is no previously clustered index for table "tt"

-- 
Justin




Re: [PATCH] Opclass parameters

2020-02-28 Thread Nikita Glukhov

Attached new version of the patches.


On 12.09.2019 3:16, Tomas Vondra wrote:


On Wed, Sep 11, 2019 at 01:44:28AM +0300, Nikita Glukhov wrote:

On 11.09.2019 1:03, Tomas Vondra wrote:


On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote:


2. New AM method amattoptions().

  amattoptions() is used to specify per-column AM-specific options.
  The example is signature length for bloom indexes (patch #3).


I'm somewhat confused how I am supposed to use this, considering the
patch set only defines this for the contrib/bloom index AM. So let's say
I want to create a custom BRIN opclass with per-attribute options (like
the two BRIN opclasses I work on in the other thread). Clearly, I can't
tweak the IndexAmRoutine from the extension. ISTM the patch series
should modify all existing index AMs to have a valid amattoptions()
implementation, calling the new amproc if defined.

Or what is the correct way to define a custom opclass for an existing
index AM (e.g. BRIN) with attribute options?

Per-attribute opclass options are implemented independently from
per-attribute AM options.  amattoptions() is optional and needs to be
defined only if the AM has per-attribute options.


OK, thanks for the explanation - so the per-attribute opclass options
will work even when the AM does not have amattoptions() defined. Is
there any practical reason why not to just define everything as opclass
options and get rid of the amattoptions() entirely?

The reason is that there would then be no need to duplicate AM-specific
per-attribute options in each opclass. For example, the contrib/bloom AM
has per-column options (signature length), but its opclasses have no
options now.


amproc #0 is called regardless of whether amattoptions
is defined or not.  That was the main reason why uniform procnum 0 was
picked.



I still think using procnum 0 and passing the data through fn_expr are
not the right solution. Firstly, traditionally the amprocs are either
required or optional, with required procs having low procnums and
optional starting at 11 or so. The 0 breaks this, because it's optional
but it contradicts the procnum rule. Also, what happens if we need to
add another optional amproc defined for all AMs? Surely we won't use -1.

IMHO we should keep AM-specific procnum and pass it somehow to the AM
machinery.


As you suggested, I introduced an AM-specific procnum,
IndexAmRoutine.attoptsprocnum, which has different values for each AM.
This was not a problem, because there are only a few AMs now.



FWIW there seems to be a bug in identify_opfamily_groups() which does
this:
   /* Ignore strategy numbers outside supported range */
   if (oprform->amopstrategy > 0 && oprform->amopstrategy < 64)
       thisgroup->operatorset |= ((uint64) 1) << oprform->amopstrategy;

but then identify_opfamily_groups() computes allfuncs without any such
restriction, i.e. it includes procnum 0. And then it fails on this check

   if (thisgroup->functionset != allfuncs) {...}

None of the built-in brin opclasses defines the new amproc, so the code
does not hit this issue. I only noticed this with the opclasses added in
the other thread.


This really seems to be a bug in the previous patches, but now the procnum can't be 0.



As for the fn_expr, I still think this seems like a misuse of a field
which was intended for something else. I wonder if it might be breaking
some existing code - either in core or in some extension. It seems quite
possible.

It just seems like we're inventing a novel way to pass data into a
function while we already have parameters for that purpose. Adding
parameters may require care so as not to break existing opclasses, but
it seems like the right approach.


fn_expr is used to pass metainformation about user function calls. It is
not used in AM method calls, so it seems safe to reuse it.  Opclass
parameters can be considered here as some kind of metainfo about AM
calls.

Of course, we could add a special parameter to each support function of
each AM.  But that looks too complicated, and every external AM would
need to do the same to enable opclass parameters. So I think it is
desirable to implement a more general solution.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Introduce-opclass-parameters-20200229.patch.gz
Description: application/gzip


0002-Introduce-amattoptions-20200229.patch.gz
Description: application/gzip


0003-Use-amattoptions-in-contrib_bloom-20200229.patch.gz
Description: application/gzip


0004-Use-opclass-parameters-in-GiST-tsvector_ops-20200229.patch.gz
Description: application/gzip


0005-Use-opclass-parameters-in-contrib_intarray-20200229.patch.gz
Description: application/gzip


0006-Use-opclass-parameters-in-contrib_ltree-20200229.patch.gz
Description: application/gzip


0007-Use-opclass-parameters-in-contrib_pg_trgm-20200229.patch.gz
Description: application/gzip


0008-Use-opclass-parameters-in-contrib_hstore-20200229.patch.gz
Description: application/gzip

Re: Portal->commandTag as an enum

2020-02-28 Thread Tom Lane
Mark Dilger  writes:
>> On Feb 28, 2020, at 3:05 PM, Tom Lane  wrote:
>> Is there a way to drop that logic altogether by making the tagname string
>> be "INSERT 0" for the INSERT case?  Or would the zero bleed into other
>> places where we don't want it?

> In general, I don't think we want to increase the number of distinct
> tags.  Which command you finished running and whether you want a
> rowcount and/or lastoid are orthogonal issues.

Well, my thought is that last_oid is gone and it isn't ever coming back.
So the less code we use supporting a dead feature, the better.

If we can't remove the special case in EndCommand() altogether, I'd be
inclined to hard-code it as "if (tag == CMDTAG_INSERT ..." rather than
expend infrastructure on treating last_oid as a live option for commands
to have.
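
For illustration, that hard-coded special case could look roughly like
this (a sketch against the patched EndCommand(); the buffer and counter
names here are illustrative, not from the patch):

    if (tag == CMDTAG_INSERT)
        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                 "INSERT 0 " UINT64_FORMAT, nprocessed);
    else
        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                 "%s " UINT64_FORMAT, GetCommandTagName(tag), nprocessed);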

regards, tom lane




Re: Allowing ALTER TYPE to change storage strategy

2020-02-28 Thread Tom Lane
Tomas Vondra  writes:
> I think we might check if there are any attributes with the given data
> type, and allow the change if there are none. That would still allow the
> change when the type is used only for things like function parameters
> etc. But we'd also have to check for domains (recursively).

Still has race conditions.

> One thing I haven't mentioned in the original message is CASCADE. It
> seems useful to optionally change storage for all attributes with the
> given data type. But I'm not sure it's actually a good idea, and the
> amount of code seems non-trivial (it'd have to copy quite a bit of code
> from ALTER TABLE).

You'd need a moderately strong lock on each such table, which means
there'd be serious deadlock hazards.  I'm dubious that it's worth
troubling with.

regards, tom lane




Re: Allowing ALTER TYPE to change storage strategy

2020-02-28 Thread Tomas Vondra

On Fri, Feb 28, 2020 at 01:59:49PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

My understanding is that pg_type.typstorage essentially specifies two
things: (a) default storage strategy for the attributes with this type,
and (b) whether the type implementation is prepared to handle TOAST-ed
values or not. And pg_attribute.attstorage has to respect this - when
the type says PLAIN then the attribute can't simply use strategy that
would enable TOAST.


Check.


Luckily, this is only a problem when switching typstorage to PLAIN (i.e.
when disabling TOAST for the type). The attached v1 patch checks if
there are attributes with non-PLAIN storage for this type, and errors
out if it finds one. But unfortunately that's not entirely correct,
because ALTER TABLE only sets storage for new data. A table may be
created with e.g. EXTENDED storage for an attribute, a bunch of rows may
be loaded and then the storage for the attribute may be changed to
PLAIN. That would pass the check as it's currently in the patch, yet
there may be TOAST-ed values for the type with PLAIN storage :-(



I'm not entirely sure what to do about this, but I think it's OK to just
reject changes in this direction (from non-PLAIN to PLAIN storage).


Yeah, I think you should just reject that: once toast-capable, always
toast-capable.  Scanning pg_attribute is entirely insufficient because
of race conditions --- and while we accept such races in some other
places, this seems like a place where the risk is too high and the
value too low.

Anybody who really needs to go in that direction still has the alternative
of manually frobbing the catalogs (and taking the responsibility for
races and un-toasting whatever's stored already).



Yeah. Attached is v2 of the patch, simply rejecting such changes.

I think we might check if there are any attributes with the given data
type, and allow the change if there are none. That would still allow the
change when the type is used only for things like function parameters
etc. But we'd also have to check for domains (recursively).

One thing I haven't mentioned in the original message is CASCADE. It
seems useful to optionally change storage for all attributes with the
given data type. But I'm not sure it's actually a good idea, and the
amount of code seems non-trivial (it'd have to copy quite a bit of code
from ALTER TABLE). I'm also not sure what to do about domains and
attributes using those. It's more time/code than I'm willing to spend
now, so I'll leave this as a possible future improvement.
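
For illustration, what v2 accepts and rejects (mytype standing in for
any toast-capable type):

    ALTER TYPE mytype SET STORAGE MAIN;     -- ok, stays toast-capable
    ALTER TYPE mytype SET STORAGE PLAIN;    -- rejected: would disable TOAST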

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From f30268b1981fc53474786085842f1c883a1d7491 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 29 Feb 2020 01:04:58 +0100
Subject: [PATCH] ALTER TYPE ... SET STORAGE

---
 doc/src/sgml/ref/alter_type.sgml |  33 ++
 src/backend/commands/alter.c |  45 
 src/backend/commands/typecmds.c  | 148 +++
 src/backend/parser/gram.y|  28 -
 src/backend/tcop/utility.c   |  26 +
 src/bin/psql/tab-complete.c  |   2 +-
 src/include/commands/alter.h |   1 +
 src/include/commands/typecmds.h  |   2 +
 src/include/nodes/nodes.h|   1 +
 src/include/nodes/parsenodes.h   |  12 +++
 src/test/regress/expected/domain.out |  17 +++
 src/test/regress/sql/domain.sql  |  19 
 12 files changed, 332 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 67be1dd568..03a6a59468 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -30,6 +30,7 @@ ALTER TYPE name RENAME TO name
 ALTER TYPE name SET SCHEMA new_schema
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ]
 ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value
+ALTER TYPE name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
 
 where action is one of:
 
@@ -97,6 +98,38 @@ ALTER TYPE name 
RENAME VALUE 

 
+   
+
+ SET STORAGE
+ 
+  TOAST
+  per-type storage settings
+ 
+
+
+ 
+  This form sets the storage mode for a data type, controlling whether the
+  values are held inline or in a secondary TOAST table,
+  and whether the data should be compressed or not.
+  PLAIN must be used for fixed-length values such as
+  integer and is inline, uncompressed. MAIN
+  is for inline, compressible data. EXTERNAL is for
+  external, uncompressed data, and EXTENDED is for
+  external, compressed data.  EXTENDED is the default
+  for most data types that support non-PLAIN storage.
+  Use of EXTERNAL will make substring operations on
+  very large text and bytea values run faster,
+  at the penalty of increased storage space.  Note that
+  SET STORAGE doesn't itself change anything in the
+  tab

Re: Portal->commandTag as an enum

2020-02-28 Thread Mark Dilger



> On Feb 28, 2020, at 3:05 PM, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>> I just realized that we could rename command_tag_display_last_oid() to
>> something like command_tag_print_a_useless_zero_for_historical_reasons() 
>> and nothing would be lost.
> 
> Is there a way to drop that logic altogether by making the tagname string
> be "INSERT 0" for the INSERT case?  Or would the zero bleed into other
> places where we don't want it?

In general, I don't think we want to increase the number of distinct tags.  
Which command you finished running and whether you want a rowcount and/or 
lastoid are orthogonal issues.  We already have problems with there being 
different commandtags for different versions of morally the same commands.  
Take for example "SELECT FOR KEY SHARE" vs.  "SELECT FOR NO KEY UPDATE" vs.  
"SELECT FOR SHARE" vs.  "SELECT FOR UPDATE".

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-28 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Feb 28, 2020 at 01:45:29PM -0500, Tom Lane wrote:
>> After poking around, I see there aren't any other callers.  But I think
>> that the cause of this bug is clearly failure to think carefully about
>> the different cases that isTempNamespaceInUse is recognizing, so that
>> the right way to fix it is more like the attached.

> Good idea, thanks.  Your suggestion looks good to me.

Will push that, thanks for looking.

>> In the back branches, we could leave isTempNamespaceInUse() in place
>> but unused, just in case somebody is calling it.  I kind of doubt that
>> anyone is, given the small usage in core, but maybe.

> I doubt that there are any external callers, but I'd rather leave the
> past API in place on back-branches.

Agreed.

regards, tom lane




Re: BUG #15858: could not stat file - over 4GB

2020-02-28 Thread Tom Lane
Juan José Santamaría Flecha  writes:
> The latest version of this patch could benefit from an update. Please find
> attached a new version.

The cfbot thinks this doesn't compile on Windows [1].  Looks like perhaps
a missing-#include problem?

regards, tom lane

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81541




Re: Binary support for pgoutput plugin

2020-02-28 Thread Tom Lane
Dave Cramer  writes:
> Rebased against head

The cfbot's failing to apply this [1].  It looks like the reason is only
that you included a catversion bump in what you submitted.  Protocol is to
*not* do that in submitted patches, but rely on the committer to add it at
the last minute --- otherwise you'll waste a lot of time rebasing the
patch, which is what it needs now.

regards, tom lane

[1] http://cfbot.cputube.org/patch_27_2152.log




Re: ALTER tbl rewrite loses CLUSTER ON index (consider moving indisclustered to pg_class)

2020-02-28 Thread Tom Lane
Justin Pryzby  writes:
> I think the attached is 80% complete (I didn't touch pg_dump).
> One objection to this change would be that all relations (including indices)
> end up with relclustered fields, and pg_index already has a number of bools, 
> so
> it's not like this one bool is wasting a byte.
> I think relisclustered was a clever way of avoiding that overhead
> (c0ad5953).
> So I would be -0.5 on moving it to pg_class..
> But I think 0001 and 0002 are worthy.  Maybe the test in 0002 should live
> somewhere else.

0001 has been superseded by events (faade5d4c), so the cfbot is choking
on that one's failure to apply, and not testing any further.  Please
repost without 0001 so that we can get this testing again.

regards, tom lane




Re: Portal->commandTag as an enum

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-28, Tom Lane wrote:

> Alvaro Herrera  writes:
> > I just realized that we could rename command_tag_display_last_oid() to
> > something like command_tag_print_a_useless_zero_for_historical_reasons() 
> > and nothing would be lost.
> 
> Is there a way to drop that logic altogether by making the tagname string
> be "INSERT 0" for the INSERT case?  Or would the zero bleed into other
> places where we don't want it?

Hmm, interesting thought. But yeah, it would show up in ps display:

	commandTag = CreateCommandTag(parsetree->stmt);

	set_ps_display(GetCommandTagName(commandTag), false);

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-28 Thread Michael Paquier
On Fri, Feb 28, 2020 at 01:45:29PM -0500, Tom Lane wrote:
> After poking around, I see there aren't any other callers.  But I think
> that the cause of this bug is clearly failure to think carefully about
> the different cases that isTempNamespaceInUse is recognizing, so that
> the right way to fix it is more like the attached.

Good idea, thanks.  Your suggestion looks good to me.

> In the back branches, we could leave isTempNamespaceInUse() in place
> but unused, just in case somebody is calling it.  I kind of doubt that
> anyone is, given the small usage in core, but maybe.

I doubt that there are any external callers, but I'd rather leave the
past API in place on back-branches.
--
Michael


signature.asc
Description: PGP signature


Re: d25ea01275 and partitionwise join

2020-02-28 Thread Tom Lane
Amit Langote  writes:
> On Wed, Nov 6, 2019 at 2:00 AM Tom Lane  wrote:
>> Just to leave a breadcrumb in this thread --- the planner failure
>> induced by d25ea01275 has been fixed in 529ebb20a.  The difficulty
>> with multiway full joins that Amit started this thread with remains
>> open, but I imagine the posted patches will need rebasing over
>> 529ebb20a.

> Here are the rebased patches.

The cfbot shows these patches as failing regression tests.  I think
it is just cosmetic fallout from 6ef77cf46 having changed EXPLAIN's
choices of table alias names; but please look closer to confirm,
and post updated patches.

regards, tom lane




Re: SLRU statistics

2020-02-28 Thread Alvaro Herrera
On 2020-Jan-21, Tomas Vondra wrote:

> On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:

> > I've not tested the performance impact but perhaps we might want to
> > disable these counters by default, controlled by a GUC. And similar
> > to buffer statistics it might be better to inline the
> > pgstat_count_slru_page_xxx functions for better performance.
> 
> Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
> something like track_slru GUC.

I disagree with adding a GUC.  If a performance impact can be measured
let's turn the functions to static inline, as already proposed.  My
guess is that pgstat_count_slru_page_hit() is the only candidate for
that; all the other paths involve I/O or lock acquisition or even WAL
generation, so the impact won't be measurable anyhow.  We removed
track-enabling GUCs years ago.

BTW, this comment:
	/* update the stats counter of pages found in shared buffers */

is not strictly true, because we don't use what we normally call "shared
buffers"  for SLRUs.

Patch applies cleanly.  I suggest moving the page_miss() call to after
SlruRecentlyUsed(), for consistency with the other case.

I find SlruType pretty odd, and the accompanying "if" list in
pg_stat_get_slru() correspondingly so.  Would it be possible to have
each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
and query that, somehow.  (I don't think we have an array of SlruCtlData
anywhere though, so this might be a useless idea.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Portal->commandTag as an enum

2020-02-28 Thread Tom Lane
Alvaro Herrera  writes:
> I just realized that we could rename command_tag_display_last_oid() to
> something like command_tag_print_a_useless_zero_for_historical_reasons() 
> and nothing would be lost.

Is there a way to drop that logic altogether by making the tagname string
be "INSERT 0" for the INSERT case?  Or would the zero bleed into other
places where we don't want it?

regards, tom lane




Re: Portal->commandTag as an enum

2020-02-28 Thread Alvaro Herrera
I just realized that we could rename command_tag_display_last_oid() to
something like command_tag_print_a_useless_zero_for_historical_reasons() 
and nothing would be lost.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Portal->commandTag as an enum

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-28, Alvaro Herrera wrote:

> On 2020-Feb-21, John Naylor wrote:
> 
> > Thinking about this some more, would it be possible to treat these
> > like we do parser/kwlist.h? Something like this:
> > 
> > commandtag_list.h:
> > PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false,
> > false, false)
> > ...
> 
> I liked this idea, so I'm halfway on it now.

Here.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From ff9a1de6f8d589281841dfde3a6e096f8f992f81 Mon Sep 17 00:00:00 2001
From: Mark Dilger 
Date: Thu, 6 Feb 2020 11:41:44 -0800
Subject: [PATCH v6] Migrating commandTag from string to enum.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The backend was using strings to represent command tags and doing string
comparisons in multiple places.  Fixing that by creating a new
CommandTag enum and using it instead.

Replacing numerous occurrences of char *completionTag with a
QueryCompletionData struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, Álvaro Herrera
---
 .../pg_stat_statements/pg_stat_statements.c   |  18 +-
 contrib/sepgsql/hooks.c   |   6 +-
 doc/src/sgml/event-trigger.sgml   |   2 +-
 src/backend/commands/createas.c   |  14 +-
 src/backend/commands/event_trigger.c  | 160 +
 src/backend/commands/matview.c|   2 +-
 src/backend/commands/portalcmds.c |  16 +-
 src/backend/commands/prepare.c|   4 +-
 src/backend/executor/execMain.c   |   4 +-
 src/backend/executor/functions.c  |   4 +-
 src/backend/executor/spi.c|  22 +-
 src/backend/replication/logical/decode.c  |   2 +-
 src/backend/replication/walsender.c   |  18 +-
 src/backend/tcop/Makefile |   1 +
 src/backend/tcop/cmdtag.c | 105 
 src/backend/tcop/dest.c   |  43 +-
 src/backend/tcop/postgres.c   |  24 +-
 src/backend/tcop/pquery.c | 112 ++--
 src/backend/tcop/utility.c| 560 +-
 src/backend/utils/cache/evtcache.c|  30 +-
 src/backend/utils/cache/plancache.c   |   4 +-
 src/backend/utils/mmgr/portalmem.c|   6 +-
 src/include/commands/createas.h   |   3 +-
 src/include/commands/event_trigger.h  |   3 +-
 src/include/commands/matview.h|   2 +-
 src/include/commands/portalcmds.h |   2 +-
 src/include/commands/prepare.h|   2 +-
 src/include/tcop/cmdtag.h |  57 ++
 src/include/tcop/cmdtaglist.h | 208 +++
 src/include/tcop/dest.h   |   6 +-
 src/include/tcop/pquery.h |   2 +-
 src/include/tcop/utility.h|  15 +-
 src/include/utils/evtcache.h  |   3 +-
 src/include/utils/plancache.h |   7 +-
 src/include/utils/portal.h|   6 +-
 src/pl/plperl/plperl.c|   2 +-
 src/pl/plpgsql/src/pl_exec.c  |  10 +-
 src/pl/tcl/pltcl.c|   3 +-
 .../test_ddl_deparse/test_ddl_deparse.c   |   2 +-
 39 files changed, 882 insertions(+), 608 deletions(-)
 create mode 100644 src/backend/tcop/cmdtag.c
 create mode 100644 src/include/tcop/cmdtag.h
 create mode 100644 src/include/tcop/cmdtaglist.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e4fda4b404..7b8e690c95 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -307,7 +307,7 @@ static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
-DestReceiver *dest, char *completionTag);
+DestReceiver *dest, QueryCompletion *qc);
 static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
@@ -960,7 +960,7 @@ static void
 pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	ProcessUtilityContext context,
 	ParamListInfo params, QueryEnvironment *queryEnv,
-	DestReceiver *dest, char *completionTag)
+	DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
 
@@ -998,11 +998,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *quer

Re: Less-silly selectivity for JSONB matching operators

2020-02-28 Thread Tom Lane
I wrote:
> This patch is not complete, because I didn't look at changing
> the contrib modules, and grep says at least some of them are using
> contsel for non-geometric data types.  But I thought I'd put it up
> for discussion at this stage.

Hearing nothing, I went ahead and hacked on the contrib code.
The attached 0002 updates hstore, ltree, and pg_trgm to get them
out of using contsel/contjoinsel for anything.  (0001 is the same
patch I posted before.)

In ltree, I noted that < <= >= > were using contsel even though
those are part of a btree opclass, meaning they could perfectly
well use scalarltsel and friends.  So now they do.  Everything
else now uses matchsel/matchjoinsel, leaving ltreeparentsel as
an unused backward-compatibility feature.  I didn't think that
the default selectivity in ltreeparentsel was particularly sane,
so having those operators use their own selectivity logic
instead of using matchsel like everything else seemed pointless
(and certainly pairing a custom ltreeparentsel with contjoinsel
isn't something to encourage).

In pg_trgm, the change of default selectivity estimate causes one
plan to change, but I think that's fine; looking at the data hidden
by COSTS OFF shows the new estimate is closer to reality anyway.
(That test is meant to exercise some gist consistent-function logic,
which it still does, so no worries there.)

The cube and seg extensions still make significant use of contsel and
the other geometric estimator stubs.  Although we could in principle
change those operators to use matchsel, I'm hesitant to do so without
closer analysis.  The sort orderings imposed by their default btree
opclasses correlate strongly with cube/seg size, which is related to
overlap/containment outcomes, so I'm not sure that the histogram
entries would provide a plausibly random sample for this purpose.
So those modules are not touched here.

There are a few other random uses of geometric join estimators
paired with non-geometric restriction estimators, including
these in the core core:

 @>(anyrange,anyelement) | range_contains_elem | rangesel | contjoinsel
 @>(anyrange,anyrange)   | range_contains  | rangesel | contjoinsel
 <@(anyelement,anyrange) | elem_contained_by_range | rangesel | contjoinsel
 <@(anyrange,anyrange)   | range_contained_by  | rangesel | contjoinsel
 &&(anyrange,anyrange)   | range_overlaps  | rangesel | areajoinsel

plus the @@ and ~~ operators in intarray.  While this is ugly,
it's probably not worth changing until somebody creates non-stub
join selectivity code that will work for these cases.

regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 070868f..2791037 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -559,8 +559,6 @@ ltree2text(PG_FUNCTION_ARGS)
 }
 
 
-#define DEFAULT_PARENT_SEL 0.001
-
 /*
  *	ltreeparentsel - Selectivity of parent relationship for ltree data types.
  */
@@ -571,101 +569,12 @@ ltreeparentsel(PG_FUNCTION_ARGS)
 	Oid			operator = PG_GETARG_OID(1);
 	List	   *args = (List *) PG_GETARG_POINTER(2);
 	int			varRelid = PG_GETARG_INT32(3);
-	VariableStatData vardata;
-	Node	   *other;
-	bool		varonleft;
 	double		selec;
 
-	/*
-	 * If expression is not variable <@ something or something <@ variable,
-	 * then punt and return a default estimate.
-	 */
-	if (!get_restriction_variable(root, args, varRelid,
-  &vardata, &other, &varonleft))
-		PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
-
-	/*
-	 * If the something is a NULL constant, assume operator is strict and
-	 * return zero, ie, operator will never return TRUE.
-	 */
-	if (IsA(other, Const) &&
-		((Const *) other)->constisnull)
-	{
-		ReleaseVariableStats(vardata);
-		PG_RETURN_FLOAT8(0.0);
-	}
-
-	if (IsA(other, Const))
-	{
-		/* Variable is being compared to a known non-null constant */
-		Datum		constval = ((Const *) other)->constvalue;
-		FmgrInfo	contproc;
-		double		mcvsum;
-		double		mcvsel;
-		double		nullfrac;
-		int			hist_size;
-
-		fmgr_info(get_opcode(operator), &contproc);
-
-		/*
-		 * Is the constant "<@" to any of the column's most common values?
-		 */
-		mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft,
- &mcvsum);
-
-		/*
-		 * If the histogram is large enough, see what fraction of it the
-		 * constant is "<@" to, and assume that's representative of the
-		 * non-MCV population.  Otherwise use the default selectivity for the
-		 * non-MCV population.
-		 */
-		selec = histogram_selectivity(&vardata, &contproc,
-	  constval, varonleft,
-	  10, 1, &hist_size);
-		if (selec < 0)
-		{
-			/* Nope, fall back on default */
-			selec = DEFAULT_PARENT_SEL;
-		}
-		else if (hist_size < 100)
-		{
-			/*
-			 * For histogram sizes from 10 to 100, we combine the histogram
-			 * and default selectivities, putting increasingly more trust in
-			 * the histogram for larger sizes.
-			 */
-			double		hist_weight = hist_size / 100.0;
-
-			

Re: HAVE_WORKING_LINK still needed?

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-28, Tom Lane wrote:

> Also +1 for s/durable_link_or_rename/durable_link/.

Actually, it's not *that* either, because what the function does is link
followed by unlink.  So it's more a variation of durable_rename with
slightly different semantics -- the difference is what happens if a file
with the target name already exists.  Maybe call it durable_rename_no_overwrite.
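
For illustration, the two behaviors being contrasted, stripped of the
fsync and error-handling details:

    /* durable_rename() semantics: silently replaces an existing newfile */
    rename(oldfile, newfile);

    /* link-then-unlink semantics: fails if newfile already exists */
    if (link(oldfile, newfile) == 0)
        unlink(oldfile);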

There's a lot of commonality between the two.  Perhaps it's not entirely
silly to merge both as a single routine, with a flag to select either
behavior.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: HAVE_WORKING_LINK still needed?

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-28, Peter Eisentraut wrote:

> @@ -788,7 +788,6 @@ durable_link_or_rename(const char *oldfile, const char 
> *newfile, int elevel)
>   if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
>   return -1;
>  
> -#ifdef HAVE_WORKING_LINK
>   if (link(oldfile, newfile) < 0)
>   {
>   ereport(elevel,
> @@ -798,17 +797,6 @@ durable_link_or_rename(const char *oldfile, const char 
> *newfile, int elevel)
>   return -1;
>   }
>   unlink(oldfile);
> -#else
> - /* XXX: Add racy file existence check? */
> - if (rename(oldfile, newfile) < 0)

Maybe rename durable_link_or_rename to just durable_link?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improve handling of parameter differences in physical replication

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-27, Peter Eisentraut wrote:

> So this patch relaxes this a bit.  Upon receipt of
> XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> warning and set a global flag if there is a problem.  Then when we
> actually hit the resource issue and the flag was set, we issue another
> warning message with relevant information.  Additionally, at that
> point we pause recovery, so a hot standby remains usable.

Hmm, so what is the actual end-user behavior?  As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error.  Presumably, at
that point the startup process terminates and is relaunched, and replay
continues normally.  Is that it?

I think if the startup process terminates because of the original error,
after it is unpaused, postmaster will get that as a signal to do a
crash-recovery cycle, closing all existing connections.  Is that right?
If so, it would be worth improving that (possibly by adding a
sigsetjmp() block) to avoid the disruption.
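
For reference, the generic shape of such a sigsetjmp() block elsewhere
in the backend (a sketch; what cleanup the startup process would
actually need is the open question):

    sigjmp_buf  local_sigjmp_buf;

    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
    {
        /* error recovery: report the error and keep running */
        EmitErrorReport();
        FlushErrorState();
    }
    /* arm the jump target used by future elog(ERROR) calls */
    PG_exception_stack = &local_sigjmp_buf;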

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-28, ahsan hadi wrote:


> Tested the pg_dump patch for dumping "ALTER .. DEPENDS ON EXTENSION" in case
> of indexes, functions, triggers etc. The "ALTER .. DEPENDS ON EXTENSION" is
> included in the dump. However, in some cases I am not sure why "ALTER
> INDEX .. DEPENDS ON EXTENSION" is repeated several times in the dump?

Hi, thanks for testing.

Are the repeated commands for the same index, same extension?  Did you
apply the same command multiple times before running pg_dump?

There was an off-list complaint that if you repeat the ALTER .. DEPENDS
for the same object on the same extension, then the same dependency is
registered multiple times.  (You can search pg_depend for "deptype = 'x'"
to see that).  I suppose that would lead to the line being output
multiple times by pg_dump, also.  Is that what you did?
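
For instance, duplicated entries would show up with a query like this
(illustrative, not part of the patches):

    -- find extension dependencies recorded more than once
    SELECT classid::regclass AS catalog, objid, refobjid, count(*)
    FROM pg_depend
    WHERE deptype = 'x'
    GROUP BY 1, 2, 3
    HAVING count(*) > 1;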

If so: Patch 0002 is supposed to fix that problem, by raising an error
if the dependency is already registered ... though it occurs to me now
that it would be more in line with custom to make the command a silent
no-op.  In fact, doing that would cause old dumps (generated with
databases containing duplicated entries) to correctly restore a single
entry, without error.  Therefore my inclination now is to change 0002
that way and push and backpatch it ahead of 0001.

I realize just now that I have failed to verify what happens with
partitioned indexes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Making psql error out on output failures

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-18, David Zhang wrote:

> 3. I think a better way to resolve this issue will still be the solution
> with an extra %m, which can make the error message much more informative to
> the end users.

Yes, agreed.  However, we use a style like this:

pg_log_error("could not print tuples: %m");

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

2020-02-28 Thread Alvaro Herrera
I just noticed that this patch has been classified under "bug fixes",
but per Tom's comments, this is not a bug fix -- it seems we would need
a new format code to implement some different week numbering mechanism.
That seems a new feature, not a bug fix.

Therefore I propose to move this in Commitfest from "Bug fixes" to
"Server features".  This has implications such as not automatically
moving to next commitfest if no update appears during this one.


I've never personally had to write calendaring applications, so I don't
have an opinion on whether this is useful.  Why isn't it sufficient to
rely on ISO week/day numbering (IW/ID), which appears to be more
consistent?  I think we should consider adding more codes only if
real-world use cases exist for them.
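
For example, the existing ISO codes already round-trip consistently:

    SELECT to_char(date '2020-02-28', 'IYYY-IW-ID');  -- 2020-09-05 (week 9, day 5 = Friday)
    SELECT to_date('2020-09-05', 'IYYY-IW-ID');       -- 2020-02-28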

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Portal->commandTag as an enum

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-21, John Naylor wrote:

> Thinking about this some more, would it be possible to treat these
> like we do parser/kwlist.h? Something like this:
> 
> commandtag_list.h:
> PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false,
> false, false)
> ...

I liked this idea, so I'm halfway on it now.

> I'm hand-waving a bit, and it doesn't have the flexibility of a .dat
> file, but it's a whole lot simpler.

Yeah, I for one don't want to maintain the proposed DataFile.pm.
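
For reference, a sketch of how such a list header typically gets
consumed, in the style of parser/kwlist.h (the macro arguments here are
illustrative, not the final set):

    /* commandtag_list.h contains lines like:
     *   PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false, false, false)
     */
    #define PG_COMMANDTAG(tag, name, evtrig, rowcount, lastoid, display) CMDTAG_##tag,
    typedef enum CommandTag
    {
    #include "commandtag_list.h"
    } CommandTag;
    #undef PG_COMMANDTAG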

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Libpq support to connect to standby server as priority

2020-02-28 Thread Alvaro Herrera
MauMau, Greg, is any of you submitting a new patch for this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Comments related to "Take fewer snapshots" and "Revert patch for taking fewer snapshots"

2020-02-28 Thread Alvaro Herrera
On 2020-Feb-10, Michail Nikolaev wrote:

> I think it is a good idea to add a few comments to the code related to
> the topic, in order to save time for the next guy.

Applied, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add LogicalTapeSetExtend() to logtape.c

2020-02-28 Thread Jeff Davis
On Fri, 2020-02-28 at 14:16 +0800, Adam Lee wrote:
> I noticed another difference: I was using palloc0(), which could be
> one of the reasons, but I'm not sure.

I changed the palloc0()'s in your code to plain palloc(), and it didn't
make any perceptible difference. Still slower than the version I posted
that keeps the flexible array.

Did you compare all 3? Master, with your patch, and with my patch? I'd
like to see if you're seeing the same thing that I am.

> Tested your hashagg-20200226.patch on my laptop (Apple clang version
> 11.0.0); the average time is 25.9s:

That sounds high -- my runs are about half that time. Is that with a
debug build or an optimized one?

Regards,
Jeff Davis







Re: Allowing ALTER TYPE to change storage strategy

2020-02-28 Thread Tom Lane
Tomas Vondra  writes:
> My understanding is that pg_type.typstorage essentially specifies two
> things: (a) default storage strategy for the attributes with this type,
> and (b) whether the type implementation is prepared to handle TOAST-ed
> values or not. And pg_attribute.attstorage has to respect this - when
> the type says PLAIN then the attribute can't simply use strategy that
> would enable TOAST.

Check.

> Luckily, this is only a problem when switching typstorage to PLAIN (i.e.
> when disabling TOAST for the type). The attached v1 patch checks if
> there are attributes with non-PLAIN storage for this type, and errors
> out if it finds one. But unfortunately that's not entirely correct,
> because ALTER TABLE only sets storage for new data. A table may be
> created with e.g. EXTENDED storage for an attribute, a bunch of rows may
> be loaded and then the storage for the attribute may be changed to
> PLAIN. That would pass the check as it's currently in the patch, yet
> there may be TOAST-ed values for the type with PLAIN storage :-(

> I'm not entirely sure what to do about this, but I think it's OK to just
> reject changes in this direction (from non-PLAIN to PLAIN storage).

Yeah, I think you should just reject that: once toast-capable, always
toast-capable.  Scanning pg_attribute is entirely insufficient because
of race conditions --- and while we accept such races in some other
places, this seems like a place where the risk is too high and the
value too low.

Anybody who really needs to go in that direction still has the alternative
of manually frobbing the catalogs (and taking the responsibility for
races and un-toasting whatever's stored already).

regards, tom lane




Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-28 Thread Tom Lane
I wrote:
> Also, I notice that isTempNamespaceInUse is already detecting the case
> where the namespace doesn't exist or isn't really a temp namespace.
> I wonder whether it'd be better to teach that to return an indicator about
> the namespace not being what you think it is.  That would force us to look
> at its other callers to see if any of them have related bugs, which seems
> like a good thing to check --- and even if they don't, having to think
> about the point in future call sites might forestall new bugs.

After poking around, I see there aren't any other callers.  But I think
that the cause of this bug is clearly failure to think carefully about
the different cases that isTempNamespaceInUse is recognizing, so that
the right way to fix it is more like the attached.

In the back branches, we could leave isTempNamespaceInUse() in place
but unused, just in case somebody is calling it.  I kind of doubt that
anyone is, given the small usage in core, but maybe.

regards, tom lane

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e70243a..5ff7824 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3217,7 +3217,7 @@ isOtherTempNamespace(Oid namespaceId)
 }
 
 /*
- * isTempNamespaceInUse - is the given namespace owned and actively used
+ * checkTempNamespaceStatus - is the given namespace owned and actively used
  * by a backend?
  *
  * Note: this can be used while scanning relations in pg_class to detect
@@ -3225,8 +3225,8 @@ isOtherTempNamespace(Oid namespaceId)
  * given database.  The result may be out of date quickly, so the caller
  * must be careful how to handle this information.
  */
-bool
-isTempNamespaceInUse(Oid namespaceId)
+TempNamespaceStatus
+checkTempNamespaceStatus(Oid namespaceId)
 {
 	PGPROC	   *proc;
 	int			backendId;
@@ -3235,25 +3235,25 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	/* No such temporary namespace? */
+	/* No such namespace, or its name shows it's not temp? */
 	if (backendId == InvalidBackendId)
-		return false;
+		return TEMP_NAMESPACE_NOT_TEMP;
 
 	/* Is the backend alive? */
 	proc = BackendIdGetProc(backendId);
 	if (proc == NULL)
-		return false;
+		return TEMP_NAMESPACE_IDLE;
 
 	/* Is the backend connected to the same database we are looking at? */
 	if (proc->databaseId != MyDatabaseId)
-		return false;
+		return TEMP_NAMESPACE_IDLE;
 
 	/* Does the backend own the temporary namespace? */
 	if (proc->tempNamespaceId != namespaceId)
-		return false;
+		return TEMP_NAMESPACE_IDLE;
 
-	/* all good to go */
-	return true;
+	/* Yup, so namespace is busy */
+	return TEMP_NAMESPACE_IN_USE;
 }
 
 /*
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 6d1f28c..5314228 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2073,7 +2073,7 @@ do_autovacuum(void)
 			 * We just ignore it if the owning backend is still active and
 			 * using the temporary schema.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (checkTempNamespaceStatus(classForm->relnamespace) == TEMP_NAMESPACE_IDLE)
 			{
 /*
  * The table seems to be orphaned -- although it might be that
@@ -2243,7 +2243,7 @@ do_autovacuum(void)
 			continue;
 		}
 
-		if (isTempNamespaceInUse(classForm->relnamespace))
+		if (checkTempNamespaceStatus(classForm->relnamespace) != TEMP_NAMESPACE_IDLE)
 		{
 			UnlockRelationOid(relid, AccessExclusiveLock);
 			continue;
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index d67f93a..3e3a192 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -38,6 +38,16 @@ typedef struct _FuncCandidateList
 }		   *FuncCandidateList;
 
 /*
+ * Result of checkTempNamespaceStatus
+ */
+typedef enum TempNamespaceStatus
+{
+	TEMP_NAMESPACE_NOT_TEMP,	/* nonexistent, or non-temp namespace */
+	TEMP_NAMESPACE_IDLE,		/* exists, belongs to no active session */
+	TEMP_NAMESPACE_IN_USE		/* belongs to some active session */
+} TempNamespaceStatus;
+
+/*
  *	Structure for xxxOverrideSearchPath functions
  */
 typedef struct OverrideSearchPath
@@ -138,7 +148,7 @@ extern bool isTempToastNamespace(Oid namespaceId);
 extern bool isTempOrTempToastNamespace(Oid namespaceId);
 extern bool isAnyTempNamespace(Oid namespaceId);
 extern bool isOtherTempNamespace(Oid namespaceId);
-extern bool isTempNamespaceInUse(Oid namespaceId);
+extern TempNamespaceStatus checkTempNamespaceStatus(Oid namespaceId);
 extern int	GetTempNamespaceBackendId(Oid namespaceId);
 extern Oid	GetTempToastNamespace(void);
 extern void GetTempNamespaceState(Oid *tempNamespaceId,


Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema

2020-02-28 Thread Tom Lane
Michael Paquier  writes:
> And back on that one, I still like better the solution as of the
> attached which skips any relations with their namespace gone missing
> as 246a6c87's intention was only to allow orphaned temp relations to
> be dropped by autovacuum when a backend slot is connected, but not
> using yet its own temp namespace.

Simply skipping the drop looks like basically the right fix to me.

A tiny nit is that using "get_namespace_name(...) != NULL" as a test for
existence of the namespace seems a bit weird/unreadable.  I'd be more
inclined to code that as a SearchSysCacheExists test, at least in the
place where you don't actually need the namespace name.

Also, I notice that isTempNamespaceInUse is already detecting the case
where the namespace doesn't exist or isn't really a temp namespace.
I wonder whether it'd be better to teach that to return an indicator about
the namespace not being what you think it is.  That would force us to look
at its other callers to see if any of them have related bugs, which seems
like a good thing to check --- and even if they don't, having to think
about the point in future call sites might forestall new bugs.

regards, tom lane




Re: HAVE_WORKING_LINK still needed?

2020-02-28 Thread Tom Lane
Peter Eisentraut  writes:
> I came across the HAVE_WORKING_LINK define in pg_config_manual.h. 
> AFAICT, hard links are supported on Windows and Cygwin in the OS 
> versions that we support, and pg_upgrade already contains the required 
> shim.  It seems to me we could normalize and simplify that, as in the 
> attached patches.  (Perhaps rename durable_link_or_rename() then.)  I 
> successfully tested on MSVC, MinGW, and Cygwin.

I don't have any way to test on Windows, but this patchset passes
eyeball review.  +1 for getting rid of the special cases.
Also +1 for s/durable_link_or_rename/durable_link/.

regards, tom lane




Re: HAVE_WORKING_LINK still needed?

2020-02-28 Thread Juan José Santamaría Flecha
On Fri, Feb 28, 2020 at 2:15 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> I came across the HAVE_WORKING_LINK define in pg_config_manual.h.
> AFAICT, hard links are supported on Windows and Cygwin in the OS
> versions that we support, and pg_upgrade already contains the required
> shim.  It seems to me we could normalize and simplify that, as in the
> attached patches.  (Perhaps rename durable_link_or_rename() then.)  I
> successfully tested on MSVC, MinGW, and Cygwin.
>

The link referenced in the comments of win32_pghardlink() [1] is quite old,
and is automatically redirected to the current documentation [2]. Maybe
this patch should use the new path.

[1] http://msdn.microsoft.com/en-us/library/aa363860(VS.85).aspx
[2]
https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createhardlinka

Regards,

Juan José Santamaría Flecha


Re: Minor issues in .pgpass

2020-02-28 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

First of all, this seems like it is fixing a valid issue; although the probability of
somebody hitting it is low, it is still better to fix this problem.

I've not tested the patch in any detail; however, I have a couple of
comments before I proceed with detailed testing.

1. pgindent is showing a few issues with formatting. Please have a look and 
resolve those.
2. I think you can potentially use the "len" variable instead of introducing 
the "buflen" and "tmplen" variables. Also, I would choose a more appropriate name 
for the "tmp" variable.

I believe if you move the following lines before the conditional statement and 
simply change the if statement to "if (len >= sizeof(buf) - 1)", it will 
serve the purpose.

/* strip trailing newline and carriage return */
len = pg_strip_crlf(buf);

if (len == 0)
continue;


So, the patch should look like this in my opinion (ignore the formatting issues 
as this is just to give you an idea of what I mean):

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 408000a..6ca262f 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -6949,6 +6949,7 @@ passwordFromFile(const char *hostname, const char *port, const char *dbname,
 {
 	FILE	   *fp;
 	struct stat stat_buf;
+	int			line_number = 0;
 
 #define LINELEN NAMEDATALEN*5
 	char		buf[LINELEN];
@@ -7018,10 +7019,40 @@ passwordFromFile(const char *hostname, const char *port, const char *dbname,
 		if (fgets(buf, sizeof(buf), fp) == NULL)
 			break;
 
-		/* strip trailing newline and carriage return */
-		len = pg_strip_crlf(buf);
+		line_number++;
 
-		if (len == 0)
+		/* strip trailing newline and carriage return */
+		len = pg_strip_crlf(buf);
+
+		if (len == 0)
+			continue;
+
+		if (len >= sizeof(buf) - 1)
+		{
+			char		tmp[LINELEN];
+
+			/*
+			 * Warn if this password setting line is too long,
+			 * because it's unexpectedly truncated.
+			 */
+			if (buf[0] != '#')
+				fprintf(stderr,
+						libpq_gettext("WARNING: line %d too long in password file \"%s\"\n"),
+						line_number, pgpassfile);
+
+			/* eat rest of the line */
+			while (!feof(fp) && !ferror(fp))
+			{
+				if (fgets(tmp, sizeof(tmp), fp) == NULL)
+					break;
+				len = strlen(tmp);
+				if (len < sizeof(tmp) - 1 || tmp[len - 1] == '\n')
+					break;
+			}
+		}
+
+		/* ignore comments */
+		if (buf[0] == '#')
---
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus

The new status of this patch is: Waiting on Author


Re: Trying to pull up EXPR SubLinks

2020-02-28 Thread Tom Lane
Richard Guo  writes:
> On Fri, Feb 28, 2020 at 3:02 PM Andy Fan  wrote:
>> Glad to see this.  I think the hard part is this transform is not *always*
>> good.  for example foo.a only has 1 rows, but bar has a lot  of rows, if
>> so the original would be the better one.

> Yes exactly. TBH I'm not sure how to achieve that.

Yeah, I was about to make the same objection when I saw Andy already had.
Without some moderately-reliable way of estimating whether the change
is actually a win, I think we're better off leaving it out.  The user
can always rewrite the query for themselves if the grouped implementation
would be better -- but if the planner just does it blindly, there's no
recourse when it's worse.

> Any ideas on this part?

I wonder whether it'd be possible to rewrite the query, but then
consider two implementations, one where the equality clause is
pushed down into the aggregating subquery as though it were LATERAL.
You'd want to be able to figure out that the presence of that clause
made it unnecessary to do the GROUP BY ... but having done so, a
plan treating the aggregating subquery as LATERAL ought to be pretty
nearly performance-equivalent to the current way.  So this could be
mechanized in the current planner structure by treating that as a
parameterized path for the subquery, and comparing it to unparameterized
paths that calculate the full grouped output.

Obviously it'd be a long slog from here to there, but it seems like
maybe that could be made to work.  There's a separate question about
whether it's really worth the trouble, seeing that the optimization
is available today to people who rewrite their queries.

regards, tom lane




Re: proposal: schema variables

2020-02-28 Thread Pavel Stehule
čt 27. 2. 2020 v 15:37 odesílatel Pavel Stehule 
napsal:

>
> Hi
>
>
>> 3) Any way to define CONSTANTs ?
>> We already talked a bit about this subject and also Gilles Darold
>> introduces it in this mailing-list topic but I'd like to insist on it.
>> I think it would be nice to have a way to say that a variable should not
>> be changed once defined.
>> Maybe it's hard to implement and can be implemented later, but I just
>> want to know if this concern is open.
>>
>
> I played a little bit with it and didn't find any nice solution, but maybe
> now I have found one. I had ideas about some variants, but almost every
> time I had a problem with parser shift conflicts because the potential
> keywords are not reserved.
>
> The last variant, but maybe the best, is using the keyword WITH.
>
> So the syntax can look like
>
> CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT
> expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ]
>
> What do you think about this syntax? It doesn't need any new keyword, and
> it is easy to enhance.
>
> CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);
>

After some more thinking, and because in another patch I support the syntax
CREATE TRANSACTION VARIABLE ..., I changed my opinion and implemented support
for the syntax CREATE IMMUTABLE VARIABLE for defining constants.

See attached patch

Regards

Pavel


>
> ?
>
> Regards
>
> Pavel
>
>
>


schema-variables-20200228.patch.gz
Description: application/gzip


Re: Use compiler intrinsics for bit ops in hash

2020-02-28 Thread David Fetter
On Thu, Feb 27, 2020 at 02:41:49PM +0800, John Naylor wrote:
> On Thu, Feb 27, 2020 at 1:56 PM David Fetter  wrote:
> > [v6 set]
> 
> Hi David,
> 
> In 0002, the pg_bitutils functions have a test (input > 0), and the
> new callers ceil_log2_* and next_power_of_2_* have asserts. That seems
> backward to me.

To me, too, now that you mention it.  My thinking was a little fuzzed
by trying to accommodate platforms with intrinsics where clz is
defined for 0 inputs.

> I imagine some callers of bitutils will already know the value > 0,
> and it's probably good to keep that branch out of the lowest level
> functions. What do you think?

I don't know quite how smart compilers and CPUs are these days, so
it's unclear to me how often that branch would actually happen.
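
For illustration, here is roughly the arrangement John is describing --
the names mirror pg_bitutils, but this is only a sketch, assuming a
GCC/Clang-style __builtin_clz(), which is undefined for a zero input:

#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: the low-level routine assumes num > 0 and merely
 * asserts it, so callers that already know their input is positive
 * pay for no branch.
 */
static inline int
pg_leftmost_one_pos32(uint32_t num)
{
	assert(num > 0);
	return 31 - __builtin_clz(num);
}

/* Callers that cannot guarantee num > 0 do their own cheap check. */
static inline uint32_t
next_power_of_2_32(uint32_t num)
{
	if (num <= 1)
		return 1;
	/* note: overflows for num > 2^31; not handled in this sketch */
	return UINT32_C(1) << (pg_leftmost_one_pos32(num - 1) + 1);
}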

Anyhow, I'll get a revised patch set out later today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-02-28 Thread legrand legrand
Hi Julien,

>> But I would have prefered this new feature to work the same way with or
>> without track_planning activated ;o(

> Definitely, but fixing the issue in pgss (ignoring planner calls when
> we don't have a query text) means that pgss won't give an exhaustive
> view of activity anymore, so a fix in IVM would be a better solution.
> Let's wait and see if Nagata-san and other people involved in that
> have an opinion on it.

It seems the IVM team does not consider this point a priority ... 
We should not wait for them if we want to keep a chance of being 
included in PG13.

So we have to make this feature more robust, since an assert failure is 
considered a severe regression (even if it does not originate in pgss).

I like the idea of adding a check for a non-zero queryId in the new 
pgss_planner_hook() (shouldn't a zero queryId be reserved for
utility statements?).
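
To make that concrete, here is a minimal sketch of such a guard, written
against the pre-v13 planner_hook signature; prev_planner is assumed to be
saved at hook install time, and the actual timing/pgss_store() logic is
elided:

#include "postgres.h"

#include "nodes/params.h"
#include "optimizer/planner.h"

static planner_hook_type prev_planner = NULL;

static PlannedStmt *
pgss_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	/*
	 * queryId == 0 means parse analysis never assigned an identifier,
	 * so there is no query text to attribute planning counters to;
	 * just plan the query and get out.
	 */
	if (parse->queryId == UINT64CONST(0))
		return prev_planner
			? prev_planner(parse, cursorOptions, boundParams)
			: standard_planner(parse, cursorOptions, boundParams);

	/* ... otherwise time the planner call and accumulate counters ... */
	return prev_planner
		? prev_planner(parse, cursorOptions, boundParams)
		: standard_planner(parse, cursorOptions, boundParams);
}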

Fixing the corner case where a query (with no SQL text) can be planned 
without being parsed is another subject that should be resolved in 
another thread.

This kind of query was ignored in pgss before, so it should also be ignored 
in pgss with planning counters.

Any thoughts ?
Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Resume vacuum and autovacuum from interruption and cancellation

2020-02-28 Thread Masahiko Sawada
On Tue, 5 Nov 2019 at 15:57, Masahiko Sawada
 wrote:
>
> On Sat, 2 Nov 2019 at 02:10, Robert Haas  wrote:
> >
> > On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih  
> > wrote:
> > > Sounds like an interesting idea, but does it really help? Because if
> > > vacuum was interrupted previously, wouldn't it already know the dead
> > > tuples, etc in the next run quite quickly, as the VM, FSM is already
> > > updated for the page in the previous run.
> >
> > +1. I don't deny that a patch like this could sometimes save
> > something, but it doesn't seem like it would save all that much all
> > that often. If your autovacuum runs are being frequently cancelled,
> > that's going to be a big problem, I think.
>
> I've observed the case where user wants to cancel a very long running
> autovacuum (sometimes for anti-wraparound) for doing DDL or something
> maintenance works. If the table is very large autovacuum could take a
> long time and they might not reclaim garbage enough.
>
> > And as Rafia says, even
> > though you might do a little extra work reclaiming garbage from
> > subsequently-modified pages toward the beginning of the table, it
> > would be unusual if they'd *all* been modified. Plus, if they've
> > recently been modified, they're more likely to be in cache.
> >
> > I think this patch really needs a test scenario or demonstration of
> > some kind to prove that it produces a measurable benefit.
>
> Okay. A simple test could be that we cancel a long running vacuum on a
> large table that is being updated and rerun vacuum. And then we see
> the garbage on that table. I'll test it.
>

Attached the updated version patch.

I've measured the effect by this patch. In the test, I simulate the
case where autovacuum running on the table that is being updated is
canceled in the middle of vacuum, and then rerun (or resume)
autovacuum on the table. Since the vacuum resume block is saved after
heap vacuum, I set maintenance_work_mem so that vacuum on that table
needs heap vacuum twice or more. In other words, maintenance_work_mem
is used up at least once during autovacuum. The detailed steps are:

1.  Make table dirty for 15 min
2.  Run vacuum with vacuum delays
3.  After the first heap vacuum, cancel it
4.  Rerun vacuum (or with the patch resume vacuum)
From step #2 through step #4 the table is being updated in the background.
I used pgbench and the \random command, so the table is updated uniformly.

I've measured the dead tuple percentage of the table. In these tests,
what matters is how long step #4 took and how much garbage it
collected.

1. Canceled vacuum after processing about 20% of table at step #2.
1-1. HEAD
After dirtying the table (after step #1): 6.96%
After cancellation (after step #3): 6.13%

At step #4, vacuum reduced it to 4.01% and took 12m 49s. The vacuum
efficiency is 0.16%/m (2.12% down in 12.8min).

1-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.92%
After cancellation (after step #3): 5.84%

At step #4, vacuum reduced it to 4.32% and took 10m 26s. The vacuum
efficiency is 0.14%/m.

--
2. Canceled vacuum after processing about 40% of table at step #2.
2-1. HEAD
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.56%

At step #4, vacuum reduced it to 1.91% and took 8m 15s. The vacuum
efficiency is 0.32%/m.

2-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.46%

At step #4, vacuum reduced it to 1.94% and took 6m 30s. The vacuum
efficiency is 0.38%/m.

-
3. Canceled vacuum after processing about 70% of table at step #2.
3-1. HEAD
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.73%

At step #4, vacuum reduced it to 2.32% and took 8m 11s. The vacuum
efficiency is 0.29%/m.

3-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.96%
After cancellation (after step #3): 4.73%

At step #4, vacuum reduced it to 3.25% and took 4m 12s. The vacuum
efficiency is 0.35%/m.

According to those results, the closer to the tail of the table we
resume vacuum, the better the efficiency. Since the table was being
updated uniformly even during autovacuum, it was more efficient to
restart autovacuum from the last position rather than from the
beginning of the table. I think those results show some benefit from
this patch, but I'm concerned that it might be difficult for users to
know when to use this option. In practice the efficiency depends
entirely on the dispersion of updated pages, and that test made pages
dirty uniformly, which is not a common situation. So if we want this
feature, we should probably enable resuming automatically when we can
be reasonably sure that resuming is better. For example, we could
remember both the last vacuumed block and how many vacuum-able pages
seem to exist from there, and decide to resume vacuum only if we can
expect to process enough pages.

Regards

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

HAVE_WORKING_LINK still needed?

2020-02-28 Thread Peter Eisentraut
I came across the HAVE_WORKING_LINK define in pg_config_manual.h. 
AFAICT, hard links are supported on Windows and Cygwin in the OS 
versions that we support, and pg_upgrade already contains the required 
shim.  It seems to me we could normalize and simplify that, as in the 
attached patches.  (Perhaps rename durable_link_or_rename() then.)  I 
successfully tested on MSVC, MinGW, and Cygwin.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 2d14ebf68aeaa777c631655ef8159579106d592f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 27 Feb 2020 16:33:05 +0100
Subject: [PATCH 1/3] pg_standby: Don't use HAVE_WORKING_LINK

HAVE_WORKING_LINK is meant for hard links, mainly for Windows.  Here
it's used for soft links on Unix, and the functionality is optional
anyway, so we can just make it error out normally if needed.
---
 contrib/pg_standby/pg_standby.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index 9784d7aef3..3567c86ac8 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -144,10 +144,8 @@ CustomizableInitialize(void)
switch (restoreCommandType)
{
case RESTORE_COMMAND_LINK:
-#ifdef HAVE_WORKING_LINK
SET_RESTORE_COMMAND("ln -s -f", WALFilePath, 
xlogFilePath);
break;
-#endif
case RESTORE_COMMAND_COPY:
default:
SET_RESTORE_COMMAND("cp", WALFilePath, xlogFilePath);
-- 
2.25.0

From 210362e7562ea98c21eedd88fd3ab99eabca7a00 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 27 Feb 2020 16:42:13 +0100
Subject: [PATCH 2/3] Move pg_upgrade's Windows link() implementation to
 AC_REPLACE_FUNCS

This way we can make use of it in other components as well, and it
fits better with the rest of the build system.
---
 configure   | 13 
 configure.in|  1 +
 src/bin/pg_upgrade/file.c   | 27 ++---
 src/bin/pg_upgrade/pg_upgrade.h |  2 --
 src/include/pg_config.h.in  |  3 +++
 src/include/port.h  |  4 
 src/port/link.c | 35 +
 src/tools/msvc/Mkvcbuild.pm |  2 +-
 src/tools/msvc/Solution.pm  |  1 +
 9 files changed, 60 insertions(+), 28 deletions(-)
 create mode 100644 src/port/link.c

diff --git a/configure b/configure
index d2c74e530c..1e8a93707f 100755
--- a/configure
+++ b/configure
@@ -15504,6 +15504,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "link" "ac_cv_func_link"
+if test "x$ac_cv_func_link" = xyes; then :
+  $as_echo "#define HAVE_LINK 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" link.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS link.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "mkdtemp" "ac_cv_func_mkdtemp"
 if test "x$ac_cv_func_mkdtemp" = xyes; then :
   $as_echo "#define HAVE_MKDTEMP 1" >>confdefs.h
diff --git a/configure.in b/configure.in
index 0b0a871298..7c07cc4a3f 100644
--- a/configure.in
+++ b/configure.in
@@ -1697,6 +1697,7 @@ AC_REPLACE_FUNCS(m4_normalize([
getpeereid
getrusage
inet_aton
+   link
mkdtemp
pread
pwrite
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 6db7cce6a2..cc8a675d00 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -26,10 +26,6 @@
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
 
-#ifdef WIN32
-static int win32_pghardlink(const char *src, const char *dst);
-#endif
-
 
 /*
  * cloneFile()
@@ -151,7 +147,7 @@ void
 linkFile(const char *src, const char *dst,
 const char *schemaName, const char *relName)
 {
-   if (pg_link_file(src, dst) < 0)
+   if (link(src, dst) < 0)
pg_fatal("error while creating link for relation \"%s.%s\" 
(\"%s\" to \"%s\"): %s\n",
 schemaName, relName, src, dst, 
strerror(errno));
 }
@@ -369,29 +365,10 @@ check_hard_link(void)
snprintf(new_link_file, sizeof(new_link_file), 
"%s/PG_VERSION.linktest", new_cluster.pgdata);
unlink(new_link_file);  /* might fail */
 
-   if (pg_link_file(existing_file, new_link_file) < 0)
+   if (link(existing_file, new_link_file) < 0)
pg_fatal("could not create hard link between old and new data 
directories: %s\n"
 "In link mode the old and new data directories 
must be on the same file system.\n",
 strerror(errno));
 
unlink(new_link_file);
 }
-
-#ifdef WIN32
-/* implementation of pg_link_file() on Windows */
-static int
-win32_pghardlink(const char *src, const char *dst)
-{
-   /*
-* CreateHardLinkA returns zero for failure
-* http://msdn.microsoft.com/en-us/library/aa363860(VS.85).

Re: Add PostgreSQL home page to --help output

2020-02-28 Thread Peter Eisentraut

On 2020-02-20 12:09, Daniel Gustafsson wrote:
> On 20 Feb 2020, at 10:53, Daniel Gustafsson  wrote:
>> On 20 Feb 2020, at 10:15, Peter Eisentraut  wrote:
>>> On 2020-02-13 14:24, Greg Stark wrote:
>>>> Sounds like a fine idea. But personally I would prefer it without the
>>>> <> around it, just a url on a line by itself. I think it would be
>>>> clearer, look cleaner, and be easier to select to copy/paste elsewhere.
>>>
>>> I'm on the fence about this one, but I like the delimiters because it
>>> would also work consistently if we put a URL into running text where it
>>> might be immediately adjacent to other characters.  So I was actually
>>> going for easier to copy/paste here, but perhaps in other environments
>>> it's not easier?
>>
>> For URLs completely on their own, not using <> makes sense.  Copy pasting
>> into the location bar of Safari makes it load the url, but Firefox and
>> Chrome turn it into a search engine query (no idea about Windows
>> browsers).
>>
>> For URLs in running text it's not uncommon to have <> around the URL for
>> the very reason you mention.  Looking at --help and manpages from random
>> open source tools there seems to be roughly a 50/50 split on using <> or
>> not.
>
> RFC 3986 discusses this, with the content mostly carried over from
> RFC 2396 appendix E.

I think we weren't going to get any more insights here, so I have 
committed it as is.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-28 Thread Alexey Kondratov

On 2020-02-28 09:43, Michael Paquier wrote:
> On Thu, Feb 27, 2020 at 06:29:34PM +0300, Alexey Kondratov wrote:
>> On 2020-02-27 16:41, Alexey Kondratov wrote:
>>> New version of the patch is attached. Thanks again for your review.
>>
>> Last patch (v18) got a conflict with one of today's commits
>> (05d8449e73).  Rebased version is attached.
>
> The shape of the patch is getting better.  I have found some issues
> when reading through the patch, but nothing huge.
>
> +   printf(_("  -c, --restore-target-wal   use restore_command in target config\n"));
> +   printf(_(" to retrieve WAL files from archive\n"));
> [...]
> {"progress", no_argument, NULL, 'P'},
> +   {"restore-target-wal", no_argument, NULL, 'c'},
> It may be better to reorder that alphabetically.

Sure, I put it in order. However, the recent -R option is out of order
too.

> +   if (rc != 0)
> +   /* Sanity check, should never happen. */
> +   elog(ERROR, "failed to build restore_command due to missing parameters");
> No point in having this comment IMO.

I would prefer to keep it, since there are plenty of similar comments
near Asserts and elogs all over Postgres. Otherwise it may look like a
valid error state. It may be obvious now, but for someone who is not
aware of the BuildRestoreCommand refactoring it may not be. So from my
perspective there is nothing wrong with this extra one-line comment.

> +/* logging support */
> +#define pg_fatal(...) do { pg_log_fatal(__VA_ARGS__); exit(1); } while(0)
>
> Actually, I don't think that this is a good idea to name this
> pg_fatal() as we have the same thing in pg_rewind, so it could be
> confusing.

I have added explicit exit(1) calls, since pg_fatal was used only twice
in archive.c. Probably, pg_log_fatal from common/logging should obey
the same logic as the FATAL log level in the backend and exit the
process, but for now including pg_rewind.h inside archive.c or vice
versa does not look like a solution.

> -   while ((c = getopt_long(argc, argv, "D:nNPR", long_options, &option_index)) != -1)
> +   while ((c = getopt_long(argc, argv, "D:nNPRc", long_options, &option_index)) != -1)
> Alphabetical order here.

Done.

> +   rmdir($node_master->archive_dir);
> rmtree() is used in all our other tests.

Done. There was an unobvious bit of logic there, in that rmdir only
deletes empty directories, which is true in the case of archive_dir in
that test, but I have unified it for consistency.

> +   pg_log_error("archive file \"%s\" has wrong size: %lu instead of %lu, %s",
> +xlogfname, (unsigned long) stat_buf.st_size,
> +(unsigned long) expectedSize, strerror(errno));
>
> I think that the error message should be reworded: "unexpected WAL
> file size for \"%s\": %lu instead of %lu".  Please note that there is
> no need for strerror() here at all, as errno should be 0.
>
> +if (xlogfd < 0)
> +pg_log_error("could not open file \"%s\" restored from archive: %s\n",
> + xlogpath, strerror(errno));
> [...]
> +pg_log_error("could not stat file \"%s\" restored from archive: %s",
> +xlogpath, strerror(errno));
> No need for strerror() as you can just use %m.  And no need for the
> extra newline at the end as pg_log_* routines do that by themselves.
>
> +   pg_log_error("could not restore file \"%s\" from archive\n",
> +xlogfname);
> No need for a newline here.

Thanks, I have cleaned up these log statements.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company

From ba20808ffddf3fe2eefe96d3385697fb6583ce9a Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v20] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it cannot find the required WAL files in
the target data directory.  One has to manually figure out which WAL
files are required and copy them back from the archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 ++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  66 -
 src/common/Makefile  |   2 +
 src/common/archive.c | 102 
 src/c

Re: Trying to pull up EXPR SubLinks

2020-02-28 Thread Richard Guo
On Fri, Feb 28, 2020 at 3:02 PM Andy Fan  wrote:

>
>
> On Fri, Feb 28, 2020 at 2:35 PM Richard Guo 
> wrote:
>
>> Hi All,
>>
>> Currently we will not consider EXPR_SUBLINK when pulling up sublinks and
>> this would cause performance issues for some queries with the form of:
>> 'a > (SELECT agg(b) from ...)' as described in [1].
>>
>> So here is a patch as an attempt to pull up EXPR SubLinks. The idea,
>> which is based on Greenplum's implementation, is to perform the
>> following transformation.
>>
>> For query:
>>
>> select * from foo where foo.a >
>> (select avg(bar.a) from bar where foo.b = bar.b);
>>
>> we transform it to:
>>
>> select * from foo inner join
>> (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
>> on foo.b = sub.b and foo.a > sub.avg;
>>
>
> Glad to see this.  I think the hard part is this transform is not *always*
> good.  for example foo.a only has 1 rows, but bar has a lot  of rows, if
> so
> the original would be the better one.
>

Yes exactly. TBH I'm not sure how to achieve that. Currently in the
patch this transformation happens in the stage of preprocessing the
jointree. We do not have enough information at this time to tell which
is better between the transformed one and untransformed one.

If we want to choose the better one by cost comparison, then we need to
plan the query twice, one for the transformed query and one for the
untransformed query. But this seems infeasible in current optimizer's
architecture.

Any ideas on this part?

Thanks
Richard


Re: Implementing Incremental View Maintenance

2020-02-28 Thread legrand legrand
>> Thank you for patch v14, which fixes the problems inherited from
>> temporary tables.
>> It seems that the ASSERT problem with the pgss patch is still present ;o(
>> 
>
> Sorry, but we are busy fixing and improving the IVM patches. I think fixing
> the assertion failure needs non-trivial changes to other parts of
> PostgreSQL.
> So we would like to work on the issue you reported after the pgss patch
> gets committed.

Imagine it will happen tomorrow !
You may say I'm a dreamer
But I'm not the only one
...
...





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: BUG #15858: could not stat file - over 4GB

2020-02-28 Thread Juan José Santamaría Flecha
On Wed, Feb 5, 2020 at 12:47 PM Emil Iggland  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>

The latest version of this patch could benefit from an update. Please find
attached a new version.

Most changes are cosmetic, but they have been more extensive than a simple
rebase so I am changing the status back to 'needs review'.

To summarize those changes:
- Rename the 'win32_stat.c' file to 'win32stat.c', as a better match for
the project's file naming.
- Improve indentation and comments.
- Remove cruft about old Windows versions.

Regards,

Juan José Santamaría Flecha


0001-Support-for-large-files-on-Win32-v5.patch
Description: Binary data


Re: base backup client as auxiliary backend process

2020-02-28 Thread Peter Eisentraut
I have set this patch to "returned with feedback" in the upcoming commit 
fest, because I will not be able to finish it.


Unsurprisingly, the sequencing of startup actions in postmaster.c is 
extremely tricky and needs more thinking.  All the rest worked pretty 
well, I thought.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: truncating timestamps on arbitrary intervals

2020-02-28 Thread John Naylor
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > [ v3-datetrunc_interval.patch ]
>
> A few thoughts:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

Not sure.

A surprise I foresee in general might be: '1 week' is just '7 days',
and not aligned on "WOY". Since the function is passed an interval and
not text, we can't raise a warning. But date_trunc() already covers
that, so probably not a big deal.

> * I'm still not convinced that the code does the right thing for
> 1-based months or days.  Shouldn't you need to subtract 1, then
> do the modulus, then add back 1?

Yes, brain fade on my part. Fixed in the attached v4.

> * Speaking of modulus, would it be clearer to express the
> calculations like
> timestamp -= timestamp % interval;
> (That's just a question, I'm not sure.)

Seems nicer, so done that way.

> * Code doesn't look to have thought carefully about what to do with
> negative intervals, or BC timestamps.

By accident, negative intervals currently behave the same as positive
ones. We could make negative intervals round up rather than truncate
(or vice versa). I don't know the best thing to do here.

As for BC, I changed it so it at least goes in the correct direction, and added a test.
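
To sketch the arithmetic (plain C on int64 values; the explicit origin
argument is the hypothetical addition raised above, not something the
current patch has):

#include <stdint.h>

/* Truncate ts down to the start of its bin of width "stride",
 * counting bins from "origin"; all values in the same units
 * (e.g. microseconds). */
static int64_t
bin_start(int64_t ts, int64_t origin, int64_t stride)
{
	int64_t		delta = ts - origin;
	int64_t		offset = delta % stride;

	/*
	 * C's % truncates toward zero, so inputs before the origin
	 * (e.g. BC timestamps) need a fixup to still round downward.
	 */
	if (offset < 0)
		offset += stride;

	return ts - offset;
}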

> * The comment
>  * Justify all lower timestamp units and throw an error if any
>  * of the lower interval units are non-zero.
> doesn't seem to have a lot to do with what the code after it actually
> does.  Also, you need explicit /* FALLTHRU */-type comments in that
> switch, or pickier buildfarm members will complain.

Stale comment from an earlier version, fixed. Not sure if "justify" is
the right term, but "zero" is a bit misleading. Added fall thru's.

> * Seems like you could jam all the unit-related error checking into
> that switch's default: case, where it will cost nothing if the
> call is valid:
>
> switch (unit)
> {
> ...
> default:
> if (unit == 0)
> // complain about zero interval
> else
> // complain about interval with multiple components
> }

Done.

> * I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
> contents of the interval.

Done.

Also removed the millisecond case, since it's impossible, or at least
not worth it, to distinguish from the microsecond case.

Note: I haven't done any additional docs changes in v4.

TODO: with timezone

possible TODO: origin parameter

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v4-datetrunc_interval.patch
Description: Binary data


Re: Make message at end-of-recovery less scary.

2020-02-28 Thread Kyotaro Horiguchi
Thank you for the comments.

At Fri, 28 Feb 2020 16:33:18 +0900, Michael Paquier  wrote 
in 
> On Fri, Feb 28, 2020 at 04:01:00PM +0900, Kyotaro Horiguchi wrote:
> > Hello, this is a followup thread of [1].
> > 
> > # I didn't noticed that the thread didn't cover -hackers..
> > 
> > When recovery of any type ends, we see several kinds of error messages
> > that says "WAL is broken".
> 
> Have you considered an error context here?  Your patch leads to a bit
> of duplication with the message a bit down of what you are changing
> where the end of local pg_wal is reached.

It is a DEBUG message, and it is for the transition from crash
recovery to archive recovery. I could remove it but decided to leave
it for traceability.

> > +   * reached the end of WAL.  Otherwise something's really wrong and
> > +   * we report just only the errormsg if any. If we don't receive
> 
> This sentence sounds strange to me.  Or you meant "Something is wrong,
> so use errormsg as report if it is set"?

The whole comment there follows.
| recovery. If we get here during recovery, we can assume that we
| reached the end of WAL.  Otherwise something's really wrong and
| we report just only the errormsg if any. If we don't receive
| errormsg here, we already logged something.  We don't emit
| "reached end of WAL" in muted messages.

"Othhersise" means "other than the case of recovery".  "Just only the
errmsg" means "show the message not as a part the message "reached end
of WAL".

> > +* Note: errormsg is alreay translated.
> 
> Typo here.

Thanks. Will fix along with "rached".

> > +   if (StandbyMode)
> > +   ereport(actual_emode,
> > +   (errmsg ("rached end of WAL at %X/%X on timeline %u in 
> > %s during streaming replication",
> 
> StandbyMode happens also with only WAL archiving, depending on if
> primary_conninfo is set or not.

Right. I'll fix it. Maybe to "during standby mode".

> > +   (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash 
> > recovery",
> 
> FWIW, you are introducing three times the same typo, in the same
> word, in three different messages.

They're copy-pastos.  I refrained from constructing an error message
from multiple non-independent parts.  Are you suggesting doing so?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-02-28 Thread Michael Paquier
On Tue, Feb 25, 2020 at 10:44:40PM +0100, Daniel Gustafsson wrote:
> In doing that I realized that there is another hunk in this patch for fixing 
> up
> logical decoding multi-insert support, which is independent of the patch in
> question here so I split it off.  It's another issue which cause no harm at 
> all
> today, but fails as soon as someone tries to perform multi inserts into the
> catalog.

Yep.  I was wondering how we should do that for some time, but I was
not able to come back to it.

+   /*
+* CONTAINS_NEW_TUPLE will always be set unless the multi_insert was
+* performed for a catalog.  If it is a catalog, return immediately as
+* there is nothing to logically decode.
+*/
+   if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
+   return;
Hmm, OK.  Consistency with DecodeInsert() is a good idea here, so
count me in regarding the way your patch handles the problem.  I would
be fine to apply that part but, Andres, perhaps you would prefer
taking care of it yourself?
--
Michael


signature.asc
Description: PGP signature


RE: [Proposal] Add accumulated statistics for wait event

2020-02-28 Thread imai.yoshik...@fujitsu.com
On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote: 
> Hello.  I had a brief look at this and have some comments.

Hi, Horiguchi-san. Thank you for looking at this!

> It uses its own hash implementation. Aside from the appropriateness of
> having another implementation of an existing tool, in the first place a
> hash works well for a wide, sparse and uncertain set of keys. But these
> keys are in a rather dense and narrow set with certain, fixed members. It
> registers more than 200 entries but the bucket size is 461. That would be
> needed to avoid collisions, but it seems a bit too wasteful.

Yes, wait events are grouped, and wait event IDs are defined as sequential
numbers starting from a specified base number for each group (like 0x0100U),
so the keys are in a dense and narrow set.

=
#define PG_WAIT_LWLOCK  0x0100U
#define PG_WAIT_LOCK0x0300U
#define PG_WAIT_BUFFER_PIN  0x0400U
#define PG_WAIT_ACTIVITY0x0500U
...

typedef enum 
{
WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
WAIT_EVENT_AUTOVACUUM_MAIN,
...
WAIT_EVENT_WAL_WRITER_MAIN
} WaitEventActivity;
=

The number 461 is the lowest number which avoids collisions in the hash for
the current wait event set. As you pointed out, that leaves a few too many
unused entries.


If we prepare an array for each wait class, sized to just fit the number of
wait events in that class, we can store wait event statistics into those
arrays with no waste.
Also, in the hash case we compute the bucket as "(wait event ID) % (bucket
size)", while in the array case we compute the index as "(wait event ID) -
(wait event class ID)". The latter is cheaper than the former.

Currently I implemented the wait event statistics store with a hash because
that was easy, but I think it is better to implement it with arrays for the
above reasons (see the sketch below). One concern is that this puts a
restriction on wait events: each one must be numbered sequentially within
its wait class so that there are no unused entries in the array.
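
A minimal sketch of the array variant -- all names and sizes here are
hypothetical; the class base value mirrors the defines quoted above, and
the per-class array size is assumed to be kept in sync with the enum:

#include <stdint.h>

#define PG_WAIT_ACTIVITY	0x0500U	/* class base, as quoted above */
#define NUM_WAIT_ACTIVITY	32		/* assumed size of that class */

typedef struct WaitAccumEntry
{
	uint64_t	calls;			/* number of times the event was hit */
	uint64_t	total_usecs;	/* accumulated wait time, if enabled */
} WaitAccumEntry;

static WaitAccumEntry wa_activity[NUM_WAIT_ACTIVITY];

static inline WaitAccumEntry *
wa_activity_entry(uint32_t wait_event_info)
{
	/* valid only because events are numbered densely from the base */
	return &wa_activity[wait_event_info - PG_WAIT_ACTIVITY];
}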


> It seems to try to avoid needless work when the feature is not
> activated by checking "if (wa_hash == NULL)", but the hash is created
> and filled with all possible entries regardless of whether
> track_wait_timing is on or off.

This might be a bad implementation, but there are cases where wa_hash ==
NULL because pgstat_init() is not called, such as when running "initdb".
I inserted that check to avoid unexpected crashes in those cases.

I also noticed some leftover debug code around there... I will fix it.

> As a result, pgstat_report_waitaccum_end calls pgstat_get_wa_entry
> tremendously frequently. This should be the reason for the recent
> benchmark result. I'm not sure such frequent hash searching is
> acceptable even if the feature is turned on.

The track_wait_timing parameter determines whether we collect wait times.
Regardless of that parameter, we always bump the wait count whenever a wait
event happens. I think calling pgstat_get_wa_entry frequently is not
critical to performance. From Pavel's benchmark results, with
track_wait_timing off there is a +/-1.0% performance difference between
patched and unpatched, which can be considered measurement error.


Thanks
--
Yoshikazu Imai




Some improvements to numeric sqrt() and ln()

2020-02-28 Thread Dean Rasheed
Attached is a WIP patch to improve the performance of numeric sqrt()
and ln(), which also makes a couple of related improvements to
div_var_fast(), all of which have knock-on benefits for other numeric
functions. The actual impact varies greatly depending on the inputs,
but the overall effect is to reduce the run time of the numeric_big
regression test by about 20%.

Additionally it improves the accuracy of sqrt() -- currently sqrt()
sometimes rounds the last digit of the result the wrong way, for
example sqrt(10001) returns
10001, when the correct answer should be 1
to zero decimal places. With this patch, sqrt() guarantees to return
the result correctly rounded to the last digit for all inputs.

The main change is to sqrt_var(), which now uses a different algorithm
[1] for better performance than the Newton-Raphson method. Actually
I've re-cast the algorithm from [1] into an iterative form, rather
than doing it recursively, as it's presented in that paper. This
improves performance further, by avoiding overheads from function
calls and copying numeric variables around. Also, IMO, the iterative
form of the algorithm is much more elegant, since it works by making a
single pass over the input digits, consuming them one at a time from
most significant to least, producing a succession of increasingly more
accurate approximations to the square root, until the desired
precision is reached.

For inputs with a handful of digits, this is typically 3-5 times
faster, and for inputs with more digits the performance improvement is
larger (e.g. sqrt(2e131071) is around 10 times faster). If the input
is a perfect square, with a result having a lot of trailing zeros, the
new algorithm is much faster because it basically has nothing to do in
later iterations (e.g., sqrt(64e13070) is about 600 times faster).
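
The single-pass shape of the algorithm is perhaps easiest to see in the
classical binary digit-by-digit square root, here as a toy uint32 sketch
(sqrt_var() does the analogous thing with NBASE digits at arbitrary
precision, so this only illustrates the idea, not the patch's code):

#include <stdint.h>

static uint32_t
isqrt32(uint32_t n)
{
	uint32_t	rem = n;			/* input not yet accounted for */
	uint32_t	root = 0;			/* approximation built so far */
	uint32_t	bit = UINT32_C(1) << 30;	/* highest power of 4 */

	while (bit > n)
		bit >>= 2;

	/* consume two input bits per iteration, most significant first */
	while (bit != 0)
	{
		if (rem >= root + bit)
		{
			rem -= root + bit;
			root = (root >> 1) + bit;
		}
		else
			root >>= 1;
		bit >>= 2;
	}
	return root;				/* floor(sqrt(n)) */
}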

Another change to sqrt_var() is that it now explicitly supports a
negative rscale, i.e., rounding before the decimal point. This is
exploited by ln_var() in its argument reduction stage -- ln_var()
reduces all inputs to the range (0.9, 1.1) by repeatedly taking the
square root. For very large inputs this can have an enormous impact,
for example log(1e131071) currently takes about 6.5 seconds on my
machine, whereas with this patch I can run it 1000 times in a plpgsql
loop in about 90ms, so its around 70,000 times faster in that case. Of
course, that's an extreme example, and for most inputs it's a much
more modest difference (e.g., ln(2) is about 1.5 times faster).
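
As a toy double-precision illustration of that reduction -- libm's log()
stands in for the series evaluation the real code performs on the reduced
argument, and x is assumed positive:

#include <math.h>

static double
ln_by_sqrt_reduction(double x)	/* requires x > 0 */
{
	int			k = 0;

	while (x < 0.9 || x > 1.1)
	{
		x = sqrt(x);			/* each step halves ln(x) */
		k++;
	}
	/* ln(original) = 2^k * ln(reduced argument) */
	return ldexp(log(x), k);
}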

In passing, I also made a couple of optimisations to div_var_fast(),
discovered while comparing its performance with div_var() for various
inputs.

It's possible that there are further gains to be had in the sqrt()
algorithm on platforms that support 128-bit integers, but I haven't
had a chance to investigate that yet.

Regards,
Dean

[1] https://hal.inria.fr/inria-00072854/document


numeric-sqrt-ln.patch
Description: Binary data


Re: ALTER INDEX fails on partitioned index

2020-02-28 Thread Michael Paquier
On Thu, Feb 27, 2020 at 05:25:13PM -0600, Justin Pryzby wrote:
>  /*
> - * Option parser for partitioned tables
> - */
> -bytea *
> -partitioned_table_reloptions(Datum reloptions, bool validate)
> -{
> - /*
> -  * There are no options for partitioned tables yet, but this is able to 
> do
> -  * some validation.
> -  */
> - return (bytea *) build_reloptions(reloptions, validate,
> -   
> RELOPT_KIND_PARTITIONED,
> -   0, 
> NULL, 0);
> -}

Please don't undo that.  You can look at 1bbd608 for all the details.
--
Michael


signature.asc
Description: PGP signature


Asynchronous Append on postgres_fdw nodes.

2020-02-28 Thread Kyotaro Horiguchi
Hello, this is a follow-on of [1] and [2].

Currently the executor visits execution nodes one by one.  Considering
sharding, Append over multiple postgres_fdw nodes can work
simultaneously, and that can largely shorten the response time of the
whole query.  For example, aggregations that can be pushed down to the
remote side would be accelerated proportionally to the number of remote
servers.  Even outside such an extreme case, collecting tuples from
multiple servers can also be accelerated by tens of percent [2].

I had suspended this work while waiting for an asynchronous or push-up
executor to arrive, but the mood seems to be inclining toward doing
this before that happens [3].

The patchset consists of three parts.

- v2-0001-Allow-wait-event-set-to-be-regsitered-to-resoure.patch
  The async feature uses wait event sets, which need to be released on
  error.  This patch makes it possible to register a WaitEventSet with a
  resource owner to handle that case.

- v2-0002-infrastructure-for-asynchronous-execution.patch
  It provides an abstraction layer for asynchronous behavior
  (execAsync), and adds another version of ExecAppend that handles
  "async-capable" subnodes asynchronously. It also contains the planner
  part that makes the planner aware of "async-capable" and "async-aware"
  path nodes.

- v2-0003-async-postgres_fdw.patch
  The "async-capable" postgres_fdw.  It accelerates multiple
  postgres_fdw nodes on a single connection case as well as
  postgres_fdw nodes on dedicate connections.
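
As a rough illustration of the execution idea only -- this is not the
patch's execAsync API, just plain poll() over hypothetical per-server
sockets -- the point is to consume from whichever remote becomes ready
first instead of draining servers one after another:

#include <poll.h>
#include <unistd.h>

/* Assumes nfds <= 64; error handling mostly elided. */
static void
drain_remotes(int *fds, int nfds)
{
	struct pollfd pfds[64];
	int			remaining = nfds;

	for (int i = 0; i < nfds; i++)
	{
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
	}

	while (remaining > 0)
	{
		char		buf[8192];

		if (poll(pfds, nfds, -1) < 0)
			return;

		for (int i = 0; i < nfds; i++)
		{
			if (pfds[i].revents & (POLLIN | POLLHUP))
			{
				ssize_t		n = read(pfds[i].fd, buf, sizeof(buf));

				if (n <= 0)
				{
					pfds[i].fd = -1;	/* poll() ignores negative fds */
					remaining--;		/* this remote is exhausted */
				}
				/* else: hand the bytes to the tuple-building layer */
			}
		}
	}
}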

regards.

[1] https://www.postgresql.org/message-id/2020012917585385831113%40highgo.ca
[2] 
https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyot...@lab.ntt.co.jp
[3] https://www.postgresql.org/message-id/20191205181217.GA12895%40momjian.us

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 22099ed9a6107b92c8e2b95ff1d199832810629c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 22 May 2017 12:42:58 +0900
Subject: [PATCH v2 1/3] Allow wait event set to be registered to resource
 owner

WaitEventSet needs to be released via a resource owner in a certain
case. This change adds a resource owner to WaitEventSet and allows the
creator of a WaitEventSet to specify one.
---
 src/backend/libpq/pqcomm.c|  2 +-
 src/backend/storage/ipc/latch.c   | 18 -
 src/backend/storage/lmgr/condition_variable.c |  2 +-
 src/backend/utils/resowner/resowner.c | 67 +++
 src/include/storage/latch.h   |  4 +-
 src/include/utils/resowner_private.h  |  8 +++
 6 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7717bb2719..16aefb03ee 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -218,7 +218,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 046ca5c6c7..9c10bd5fcf 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -56,6 +56,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "utils/resowner_private.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -84,6 +85,8 @@ struct WaitEventSet
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
+	ResourceOwner	resowner;	/* Resource owner */
+
 	/*
 	 * Array, of nevents_space length, storing the definition of events this
 	 * set is waiting for.
@@ -393,7 +396,7 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -560,12 +563,15 @@ ResetLatch(Latch *latch)
  * WaitEventSetWait().
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
 	Size		sz = 0;
 
+	if (res)
+		ResourceOwnerEnlargeWESs(res);
+
 	/*
 	 * Use MAXALIGN size/alignment to guarantee that later uses of memory are
 	 * aligned correctly. E.g. epoll_event might need 8 byte alignment on some
@@ -680,6 +686,11 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	StaticAssertStmt(WSA_INVALID_EVENT == NULL, "");
 #endif
 
+	/* Register this wait event set if requested */
+	set->resowner = res;
+	if (res)
+		ResourceOwnerRememberWES(set->resowner, set);
+
 	return set;
 }
 
@@ -725,6 +736,9 @@ Free

Re: Improve handling of parameter differences in physical replication

2020-02-28 Thread Michael Paquier
On Fri, Feb 28, 2020 at 08:49:08AM +0100, Peter Eisentraut wrote:
> Perhaps it might be better to track the combined MaxBackends instead,
> however.

Not sure about that.  I think that we should keep them separated, as
that's more useful for debugging and more verbose for error reporting.

(Worth noting that max_prepared_xacts is separate because of its dummy
PGPROC entries created by PREPARE TRANSACTION, so it cannot be
included in the set).
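
For context, this is roughly how the combined value is derived today
(a sketch patterned on InitializeMaxBackends(); the values below are
placeholders standing in for the GUCs of the same names, and the "+ 1"
is the autovacuum launcher):

#include <stdio.h>

int
main(void)
{
	int			max_connections = 100;
	int			autovacuum_max_workers = 3;
	int			max_worker_processes = 8;
	int			max_wal_senders = 10;

	int			max_backends = max_connections + autovacuum_max_workers
		+ 1 + max_worker_processes + max_wal_senders;

	printf("MaxBackends = %d\n", max_backends);
	return 0;
}
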
--
Michael


signature.asc
Description: PGP signature