Re: Loaded footgun open_datasync on Windows

2018-09-13 Thread Laurenz Albe
Michael Paquier wrote:
> On Thu, Sep 13, 2018 at 03:55:20PM +0200, Laurenz Albe wrote:
> > Attached is the new, improved version of the patch.
> 
> Thanks, finally pushed.  I have made sure that recovery TAP tests,
> upgrade tests and bin/ tests are working properly.

Thanks for being interested and doing the work.

If it turns out not to break anything, would you consider backpatching?
On the one hand it fixes a bug, on the other hand it affects all
frontend executables...

I wonder why nobody noticed the problem in pg_test_fsync earlier.
Is it that people running Windows care less about whether their storage
is reliable?

Yours,
Laurenz Albe




Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-09-13 Thread Kyotaro HORIGUCHI
At Sat, 4 Aug 2018 14:09:18 +1200, Thomas Munro wrote:
> On Mon, Jul 9, 2018 at 6:51 AM, Noah Misch  wrote:
> > On Fri, Jul 06, 2018 at 04:32:56PM +0100, Simon Riggs wrote:
> >> On 6 July 2018 at 03:30, Thomas Munro  
> >> wrote:
> >> > On Thu, Jul 5, 2018 at 8:27 AM, Noah Misch  wrote:
> >> >>> However, 49bff5300d527 also introduced a similar bug where 
> >> >>> subtransaction
> >> >>> commit would fail to release an AccessExclusiveLock, leaving the 
> >> >>> lock to
> >> >>> be removed sometimes early and sometimes late. This commit fixes
> >> >>> that bug also. Backpatch to PG10 needed.
> >> >>
> >> >> Subtransaction commit is too early to release an arbitrary
> >> >> AccessExclusiveLock.  The primary releases every AccessExclusiveLock at
> >> >> top-level transaction commit, top-level transaction abort, or 
> >> >> subtransaction
> >> >> abort.  CommitSubTransaction() doesn't do that; it transfers locks to 
> >> >> the
> >> >> parent sub(xact).  Standby nodes can't safely remove an arbitrary lock 
> >> >> earlier
> >> >> than the primary would.
> >> >
> >> > But we don't release locks acquired by committing subxacts until the
> >> > top level xact commits.  Perhaps that's what the git commit message
> >> > meant by "early", as opposed to "late" meaning when
> >> > StandbyReleaseOldLocks() next runs because an XLOG_RUNNING_XACTS
> >> > record is replayed?
> >>
> >> Locks held by subtransactions were not released at the correct timing
> >> of top-level commit; they are now.
> >
> > I read the above-quoted commit message as saying that the commit expands the
> > set of locks released when replaying subtransaction commit.  The only lock
> > that should ever be released at subxact commit is the subxact XID lock.  If
> > that continues to be the only lock released at subxact commit, good.
> 
> You can still see these "late" lock releases on 10, since the above
> quoted commit (15378c1a) hasn't been back-patched yet:
> 
> CREATE TABLE foo ();
> 
> BEGIN; SAVEPOINT s; LOCK foo; COMMIT;
> 
> An AccessExclusiveLock is held on the standby until the next
> XLOG_RUNNING_XACTS comes along, up to LOG_SNAPSHOT_INTERVAL_MS = 15
> seconds later.
> 
> Does anyone know why StandbyReleaseLocks() releases all locks if
> passed InvalidTransactionId?  When would that happen?

AFAICS, it has been used at shutdown time ever since hot standby was
introduced by efc16ea520, via
ShutdownRecoveryTransactionEnvironment/StandbyReleaseAllLocks.

c172b7b02e (Jan 23, 2012) modified StandbyReleaseAllLocks so that it no
longer calls StandbyReleaseLocks with InvalidTransactionId, which left
that code path unused, and it still is.

So I think the feature has been obsolete for a long time.


As a similar issue, the following commands leave an AEL behind even though
the savepoint is rolled back.

BEGIN; SAVEPOINT s; LOCK foo; CHECKPOINT; ROLLBACK TO SAVEPOINT s;

This is because the checkpoint issues XLOG_STANDBY_LOCK on foo
with the top-transaction XID.

Every checkpoint issues it for all existing locks, so the
RecoveryLockList(s) can become bloated with duplicate lock entries,
inflating the lock counts. Although it seems uncommon to hold AELs long
enough for the list length to hurt, I don't think such duplication is
good. Patch attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f9c12e603b..9de46e0bea 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -632,6 +632,7 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 	xl_standby_lock *newlock;
 	LOCKTAG		locktag;
 	bool		found;
+	ListCell   *lc;
 
 	/* Already processed? */
 	if (!TransactionIdIsValid(xid) ||
@@ -653,6 +654,20 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 		entry->locks = NIL;
 	}
 
+	/*
+	 * Multiple locks for the same object can be given by XLOG_STANDBY_LOCK
+	 * records; we need no more than one of them.
+	 */
+	foreach (lc, entry->locks)
+	{
+		xl_standby_lock *l = (xl_standby_lock *) lfirst(lc);
+
+		Assert(l->xid == xid);
+
+		if (l->dbOid == dbOid && l->relOid == relOid)
+			return;
+	}
+
 	newlock = palloc(sizeof(xl_standby_lock));
 	newlock->xid = xid;
 	newlock->dbOid = dbOid;


Re: Cache lookup errors with functions manipulation object addresses

2018-09-13 Thread Michael Paquier
On Fri, Sep 14, 2018 at 11:22:12AM +0900, Michael Paquier wrote:
> Attached are rebased versions.  This has been around for some time, so I
> am planning to move on with this patch set pretty soon as that's mainly
> cleanup for getObjectIdentity as it triggers elog("cache lookup") or
> such for undefined objects.  Patch 0001 extends FDW and server routines
> so as it is possible to skip missing entries, without breaking
> compatibility.  Patch 0002 adds a missing_ok flag when doing
> subscription and publication lookups.
> 
> Any objections?

And I forgot to attach the patches..
--
Michael
From fba6464f1555ec05029a58e2bf378ef83ce73172 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 1 Jul 2018 23:26:10 +0900
Subject: [PATCH 1/3] Extend lookup routines for FDW and foreign server with
 NULL handling

The cache lookup routines for foreign-data wrappers and foreign servers
are extended with an extra argument that controls whether an error or a
NULL object is returned to the caller in the event of an undefined
object.  This is added as a set of new routines so as not to impact
any FDW plugins unnecessarily.
---
 doc/src/sgml/fdwhandler.sgml  | 30 +
 src/backend/foreign/foreign.c | 36 +--
 src/include/foreign/foreign.h |  4 
 3 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..a68e264261 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1408,6 +1408,21 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
 
 
 ForeignDataWrapper *
+GetForeignDataWrapperExtended(Oid fdwid, bool missing_ok);
+
+
+ This function returns a ForeignDataWrapper
+ object for the foreign-data wrapper with the given OID.  A
+ ForeignDataWrapper object contains properties
+ of the FDW (see foreign/foreign.h for details).
+ If missing_ok is true, a NULL
+ result is returned to the caller instead of an error for an undefined
+ FDW.
+
+
+
+
+ForeignDataWrapper *
 GetForeignDataWrapper(Oid fdwid);
 
 
@@ -1420,6 +1435,21 @@ GetForeignDataWrapper(Oid fdwid);
 
 
 ForeignServer *
+GetForeignServerExtended(Oid serverid, bool missing_ok);
+
+
+ This function returns a ForeignServer object
+ for the foreign server with the given OID.  A
+ ForeignServer object contains properties
+ of the server (see foreign/foreign.h for details).
+ If missing_ok is true, a NULL
+ result is returned to the caller instead of an error for an undefined
+ foreign server.
+
+
+
+
+ForeignServer *
 GetForeignServer(Oid serverid);
 
 
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index eac78a5d31..01b5175e71 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -33,6 +33,18 @@
  */
 ForeignDataWrapper *
 GetForeignDataWrapper(Oid fdwid)
+{
+	return GetForeignDataWrapperExtended(fdwid, false);
+}
+
+
+/*
+ * GetForeignDataWrapperExtended -	look up the foreign-data wrapper
+ * by OID. If missing_ok is true, return NULL if the object cannot be
+ * found instead of raising an error.
+ */
+ForeignDataWrapper *
+GetForeignDataWrapperExtended(Oid fdwid, bool missing_ok)
 {
 	Form_pg_foreign_data_wrapper fdwform;
 	ForeignDataWrapper *fdw;
@@ -43,7 +55,11 @@ GetForeignDataWrapper(Oid fdwid)
 	tp = SearchSysCache1(FOREIGNDATAWRAPPEROID, ObjectIdGetDatum(fdwid));
 
 	if (!HeapTupleIsValid(tp))
-		elog(ERROR, "cache lookup failed for foreign-data wrapper %u", fdwid);
+	{
+		if (!missing_ok)
+			elog(ERROR, "cache lookup failed for foreign-data wrapper %u", fdwid);
+		return NULL;
+	}
 
 	fdwform = (Form_pg_foreign_data_wrapper) GETSTRUCT(tp);
 
@@ -91,6 +107,18 @@ GetForeignDataWrapperByName(const char *fdwname, bool missing_ok)
  */
 ForeignServer *
 GetForeignServer(Oid serverid)
+{
+	return GetForeignServerExtended(serverid, false);
+}
+
+
+/*
+ * GetForeignServerExtended - look up the foreign server definition. If
+ * missing_ok is true, return NULL if the object cannot be found instead
+ * of raising an error.
+ */
+ForeignServer *
+GetForeignServerExtended(Oid serverid, bool missing_ok)
 {
 	Form_pg_foreign_server serverform;
 	ForeignServer *server;
@@ -101,7 +129,11 @@ GetForeignServer(Oid serverid)
 	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
 
 	if (!HeapTupleIsValid(tp))
-		elog(ERROR, "cache lookup failed for foreign server %u", serverid);
+	{
+		if (!missing_ok)
+			elog(ERROR, "cache lookup failed for foreign server %u", serverid);
+		return NULL;
+	}
 
 	serverform = (Form_pg_foreign_server) GETSTRUCT(tp);
 
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e64d2..5cc89e967c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -70,9 +70,13 @@ typedef struct ForeignTable
 
 
 extern ForeignServer *GetForeignServer(Oid serverid);
extern ForeignServer *GetForeignServerExtended(Oid serverid, bool missing_ok);

Re: Consistent segfault in complex query

2018-09-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Your other idea of forcing initPlan parameters to be evaluated
 >> before we enter the EPQ execution environment is probably more
 >> workable.

 Tom> Concretely, the attached seems to be enough to fix it (though I
 Tom> only tried the simplest case you posted).

If it helps, here is a patch that adds isolation tests to
eval-plan-qual.spec for two test cases (one with CTE, one without).
I've verified that these reproduce the crash, and that they run
successfully with your patch. I can't currently see any more specific
code paths to probe in these tests.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 9bbfdc1b5d..49b3fb3446 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -184,6 +184,37 @@ ta_id  ta_value   tb_row
 1  newTableAValue (1,tableBValue)
 step c2: COMMIT;
 
+starting permutation: updateforcip updateforcip2 c1 c2 read_a
+step updateforcip: 
+	UPDATE table_a SET value = NULL WHERE id = 1;
+
+step updateforcip2: 
+	UPDATE table_a SET value = COALESCE(value, (SELECT text 'newValue')) WHERE id = 1;
+ 
+step c1: COMMIT;
+step updateforcip2: <... completed>
+step c2: COMMIT;
+step read_a: SELECT * FROM table_a ORDER BY id;
+id value  
+
+1  newValue   
+
+starting permutation: updateforcip updateforcip3 c1 c2 read_a
+step updateforcip: 
+	UPDATE table_a SET value = NULL WHERE id = 1;
+
+step updateforcip3: 
+	WITH d(val) AS (SELECT text 'newValue' FROM generate_series(1,1))
+	UPDATE table_a SET value = COALESCE(value, (SELECT val FROM d)) WHERE id = 1;
+ 
+step c1: COMMIT;
+step updateforcip3: <... completed>
+step c2: COMMIT;
+step read_a: SELECT * FROM table_a ORDER BY id;
+id value  
+
+1  newValue   
+
 starting permutation: wrtwcte readwcte c1 c2
 step wrtwcte: UPDATE table_a SET value = 'tableAValue2' WHERE id = 1;
 step readwcte: 
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 0b70ad55ba..367922de75 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -92,6 +92,13 @@ step "updateforss"	{
 	UPDATE table_b SET value = 'newTableBValue' WHERE id = 1;
 }
 
+# these tests exercise EvalPlanQual with conditional InitPlans which
+# have not been executed prior to the EPQ
+
+step "updateforcip"	{
+	UPDATE table_a SET value = NULL WHERE id = 1;
+}
+
 # these tests exercise mark/restore during EPQ recheck, cf bug #15032
 
 step "selectjoinforupdate"	{
@@ -129,6 +136,13 @@ step "readforss"	{
 	FROM table_a ta
 	WHERE ta.id = 1 FOR UPDATE OF ta;
 }
+step "updateforcip2"	{
+	UPDATE table_a SET value = COALESCE(value, (SELECT text 'newValue')) WHERE id = 1;
+}
+step "updateforcip3"	{
+	WITH d(val) AS (SELECT text 'newValue' FROM generate_series(1,1))
+	UPDATE table_a SET value = COALESCE(value, (SELECT val FROM d)) WHERE id = 1;
+}
 step "wrtwcte"	{ UPDATE table_a SET value = 'tableAValue2' WHERE id = 1; }
 step "wrjt"	{ UPDATE jointest SET data = 42 WHERE id = 7; }
 step "c2"	{ COMMIT; }
@@ -137,6 +151,7 @@ session "s3"
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
 step "read"	{ SELECT * FROM accounts ORDER BY accountid; }
 step "read_ext"	{ SELECT * FROM accounts_ext ORDER BY accountid; }
+step "read_a"	{ SELECT * FROM table_a ORDER BY id; }
 
 # this test exercises EvalPlanQual with a CTE, cf bug #14328
 step "readwcte"	{
@@ -171,6 +186,8 @@ permutation "wx2" "partiallock" "c2" "c1" "read"
 permutation "wx2" "lockwithvalues" "c2" "c1" "read"
 permutation "wx2_ext" "partiallock_ext" "c2" "c1" "read_ext"
 permutation "updateforss" "readforss" "c1" "c2"
+permutation "updateforcip" "updateforcip2" "c1" "c2" "read_a"
+permutation "updateforcip" "updateforcip3" "c1" "c2" "read_a"
 permutation "wrtwcte" "readwcte" "c1" "c2"
 permutation "wrjt" "selectjoinforupdate" "c2" "c1"
 permutation "wrtwcte" "multireadwcte" "c1" "c2"


Re: Cache lookup errors with functions manipulation object addresses

2018-09-13 Thread Michael Paquier
On Mon, Jul 02, 2018 at 08:54:25PM +0900, Michael Paquier wrote:
> I am fine with any conclusion.  As the patch has rotten a bit, I
> switched it from "Ready for committer" to "Needs Review" by the way,
> which seems more appropriate.

Attached are rebased versions.  This has been around for some time, so I
am planning to move on with this patch set pretty soon as that's mainly
cleanup for getObjectIdentity as it triggers elog("cache lookup") or
such for undefined objects.  Patch 0001 extends FDW and server routines
so as it is possible to skip missing entries, without breaking
compatibility.  Patch 0002 adds a missing_ok flag when doing
subscription and publication lookups.

Any objections?
--
Michael


signature.asc
Description: PGP signature


logical decoding bug when mapped relation with toast contents is rewritten repeatedly

2018-09-13 Thread Andres Freund
Hi,

(Tomas, CCing you because you IIRC mentioned encountered an issue like
this)

I just spent quite a while debugging an issue where running logical
decoding yielded this error:
ERROR:  could not map filenode "base/$X/$Y" to relation OID

After discarding like 30 different theories, I have found the cause:

During rewrites (i.e. VACUUM FULL / CLUSTER) of a mapped relation with a
toast table with actual live toasted tuples (pg_proc in my case and
henceforth) heap inserts with the toast data happen into the new toast
relation, triggered by:

static void
raw_heap_insert(RewriteState state, HeapTuple tup)
...
	/*
	 * If the new tuple is too big for storage or contains already toasted
	 * out-of-line attributes from some other relation, invoke the toaster.
	 *
	 * Note: below this point, heaptup is the data we actually intend to store
	 * into the relation; tup is the caller's original untoasted data.
	 */
	if (state->rs_new_rel->rd_rel->relkind == RELKIND_TOASTVALUE)
	{
		/* toast table entries should never be recursively toasted */
		Assert(!HeapTupleHasExternal(tup));
		heaptup = tup;
	}
	else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
		heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
										 HEAP_INSERT_SKIP_FSM |
										 (state->rs_use_wal ?
										  0 : HEAP_INSERT_SKIP_WAL));
	else
		heaptup = tup;


At that point the new toast relation does *NOT* appear to be a system
catalog; it just appears as an "independent" table.  Therefore we do
not trigger, in heap_insert():

/*
 * RelationIsLogicallyLogged
 *		True if we need to log enough information to extract the data from
 *		the WAL stream.
 *
 * We don't log information for unlogged tables (since they don't WAL log
 * anyway) and for system tables (their content is hard to make sense of, and
 * it would complicate decoding slightly for little gain). Note that we *do*
 * log information for user defined catalog tables since they presumably are
 * interesting to the user...
 */
#define RelationIsLogicallyLogged(relation) \
	(XLogLogicalInfoActive() && \
	 RelationNeedsWAL(relation) && \
	 !IsCatalogRelation(relation))

	/*
	 * For logical decoding, we need the tuple even if we're doing a full
	 * page write, so make sure it's included even if we take a full-page
	 * image. (XXX We could alternatively store a pointer into the FPW.)
	 */
	if (RelationIsLogicallyLogged(relation))
	{
		xlrec.flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
		bufflags |= REGBUF_KEEP_DATA;
	}

i.e. the inserted toast tuple will be marked as
XLH_INSERT_CONTAINS_NEW_TUPLE - which it shouldn't be, because it's a
system table, which we currently do not allow to be logically decoded.

That normally ends up being harmless, because ReorderBufferCommit() has the
following check:
			if (!RelationIsLogicallyLogged(relation))
				goto change_done;

but to reach that check, we first have to map the relfilenode from the
WAL to the corresponding OID:
		reloid = RelidByRelfilenode(change->data.tp.relnode.spcNode,
									change->data.tp.relnode.relNode);

That works correctly if there's only one rewrite - the relmapper
contains the data for the new toast table.  But if there's been *two*
consecutive rewrites, the relmapper *does not* contain the intermediary
relfilenode of pg_proc.  There's no such problem for non-mapped tables,
because historic snapshots allow us to access the relevant data, but the
relmapper isn't MVCC.

Therefore the catalog-rewrite escape hatch of:
				/*
				 * Catalog tuple without data, emitted while catalog was
				 * in the process of being rewritten.
				 */
				if (reloid == InvalidOid &&
					change->data.tp.newtuple == NULL &&
					change->data.tp.oldtuple == NULL)
					goto change_done;
does not trigger and we run into:
				else if (reloid == InvalidOid)
					elog(

Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-13 Thread Michael Paquier
On Thu, Sep 13, 2018 at 02:23:47PM -0400, Robert Haas wrote:
> This, to me, seems way too clever.  Replacing 'struct stat' with
> something else everywhere in the code is more churn, but far less
> likely to have surprising consequences down the road.  Or so I would
> think, anyway.

I don't have the mind set to work on that today (enough Windows-ism for
the day), but I would rather use the clever solution because, as far as
I know, we want a back-patchable solution so things should not be
invasive.
--
Michael




Re: Allowing printf("%m") only where it actually works

2018-09-13 Thread Michael Paquier
On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > I would have liked to look at this patch in details, but it failed to
> > apply.  Could you rebase?
> 
> Ah, yeah, the dlopen patch touched a couple of the same places.
> Rebase attached --- no substantive changes.

-   if (handleDLL == NULL)
-   ereport(FATAL,
-   (errmsg_internal("could not load netmsg.dll: error code %lu",
-GetLastError())));

In 0001, this is replaced by a non-FATAL error for the backend, which
does not seem like a good idea to me because the user loses visibility
into a DLL which cannot be loaded.  I have yet to see this error...

And about 0002.  I am surprised by the amount of cleanup that the
removal of all the special wrappers for %m gives, especially
expand_fmt_string.  So +1 for the concept.  I have been testing both
patches individually on Windows and compilation, as well as tests, do
not show any issues.  The tests have been done only with MSVC.

Could you drop the configure checks for snprintf and vsnprintf in a
separate patch?  The cleanup of %m and the removal of those switches
should be treated separately in my opinion.
--
Michael




Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-09-13 Thread Amit Langote
On 2018/09/13 23:13, Tom Lane wrote:
> Amit Langote  writes:
>> On 2018/09/13 1:14, Tom Lane wrote:
>>> That seems excessively restrictive.  Anything that has storage (e.g.
>>> matviews) ought to be truncatable, no?
> 
>> Not by heap_truncate it seems.  The header comment of heap_truncate says
>> that it concerns itself only with ON COMMIT truncation of temporary tables:
> 
> Ah.  Well, in that case I'm OK with just a simple test for
> RELKIND_RELATION, but I definitely feel that it should be inside
> heap_truncate.  Callers don't need to know about the limited scope
> of what that does.

I guess you meant inside heap_truncate_one_rel.  I updated the patch that
way, but I wonder if we shouldn't also allow other relkinds that have
storage, which RelationTruncate et al. can technically deal with.

Thanks,
Amit
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9176f6280b..57df70f7b9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3195,6 +3195,10 @@ heap_truncate_one_rel(Relation rel)
 {
Oid toastrelid;
 
+   /* Only certain relkinds have storage. */
+   if (rel->rd_rel->relkind != RELKIND_RELATION)
+   return;
+
/* Truncate the actual file (and discard buffers) */
RelationTruncate(rel, 0);
 


Re: Loaded footgun open_datasync on Windows

2018-09-13 Thread Michael Paquier
On Thu, Sep 13, 2018 at 03:55:20PM +0200, Laurenz Albe wrote:
> Attached is the new, improved version of the patch.

Thanks, finally pushed.  I have made sure that recovery TAP tests,
upgrade tests and bin/ tests are working properly.
--
Michael




Re: Indicate anti-wraparound autovacuum in log_autovacuum_min_duration

2018-09-13 Thread Michael Paquier
On Thu, Sep 13, 2018 at 12:00:49PM +0300, Sergei Kornilov wrote:
> Looks better for me. Updated patch attached.

Thanks Sergei for the new version, pushed.
--
Michael




Re: Consistent segfault in complex query

2018-09-13 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Don't think that's going to work; the EPQ environment doesn't have
>  Tom> any way to know that an execPlan link is pointing to something in
>  Tom> a different estate.

> But can't such a way be created? e.g. by pointing execPlan to a special
> proxy node that points back to the original estate?

I don't really think the amount of complexity that would add is something
to consider for a back-patchable fix.

> Well, the case of UPDATE ... SET foo = case when x then (select thing
> from big_cte) else (select thing from other_big_cte) end will be rather
> annoying if we end up forcing both initplans to execute.

Given that this bug has been there since the late bronze age and just now
got detected, I think that optimizing the fix for especially improbable
cases ought not be the first thing on our minds.  Let's just get it to
work reliably.

regards, tom lane



Re: Consistent segfault in complex query

2018-09-13 Thread Tom Lane
I wrote:
> Your other idea of forcing initPlan parameters to be evaluated before we
> enter the EPQ execution environment is probably more workable.

Concretely, the attached seems to be enough to fix it (though I only
tried the simplest case you posted).

I don't find anything to love about ExecEvalParamExecParams: it's badly
named, badly located, full of undocumented assumptions, and probably
causes a memory leak.  Plus it doesn't exist as far back as we need it
for this.  But fixing those problems is a separable task.  In the
meantime, this is an expedient way to test whether this approach can work.

regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c583e02..35c9eb2 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***
*** 46,51 
--- 46,52 
  #include "commands/matview.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
+ #include "executor/execExpr.h"
  #include "foreign/fdwapi.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
*** EvalPlanQualBegin(EPQState *epqstate, ES
*** 3078,3083 
--- 3079,3087 
  		{
  			int			i;
  
+ 			/* First, force evaluation of any initPlans needed by subplan */
+ 			ExecEvalParamExecParams(planstate->plan->extParam, parentestate);
+ 
  			i = list_length(parentestate->es_plannedstmt->paramExecTypes);
  
  			while (--i >= 0)
*** EvalPlanQualStart(EPQState *epqstate, ES
*** 3170,3175 
--- 3174,3182 
  	{
  		int			i;
  
+ 		/* First, force evaluation of any initPlans needed by subplan */
+ 		ExecEvalParamExecParams(planTree->extParam, parentestate);
+ 
  		i = list_length(parentestate->es_plannedstmt->paramExecTypes);
  		estate->es_param_exec_vals = (ParamExecData *)
  			palloc0(i * sizeof(ParamExecData));


Re: Consistent segfault in complex query

2018-09-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> What I'm wondering is whether the param in the copied estate
 >> shouldn't rather be just a proxy for the one in the original estate
 >> - if we need to evaluate it, then do so in the original estate,
 >> store the value there, and copy the value back into the EPQ
 >> plantree.

 Tom> Don't think that's going to work; the EPQ environment doesn't have
 Tom> any way to know that an execPlan link is pointing to something in
 Tom> a different estate.

But can't such a way be created? e.g. by pointing execPlan to a special
proxy node that points back to the original estate?

 Tom> Your other idea of forcing initPlan parameters to be evaluated
 Tom> before we enter the EPQ execution environment is probably more
 Tom> workable. It would be annoying to do that for every initPlan in
 Tom> sight, but I think we could look at the subplan's extParam to see
 Tom> whether it potentially references that parameter. (Although
 Tom> really, in most scenarios it wouldn't matter because all the
 Tom> initPlans in a data-modifying query are probably referenced in the
 Tom> subplan anyhow ...)

Well, the case of UPDATE ... SET foo = case when x then (select thing
from big_cte) else (select thing from other_big_cte) end will be rather
annoying if we end up forcing both initplans to execute.

-- 
Andrew (irc:RhodiumToad)



Re: Consistent segfault in complex query

2018-09-13 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> The second of those; what we need is for any referenced InitPlans
>  Tom> to be executed afresh under EPQ rules. (I'm not entirely sure that
>  Tom> an InitPlan could need to see different input tuples under EPQ
>  Tom> than it'd see otherwise, but I'm not sure it couldn't, either.)

> Shouldn't the InitPlan pretty much by definition be independent of the
> tuples being locked/updated?

[ thinks for awhile... ]  Yeah, I wasn't thinking clearly enough.  The
point of the special EPQ rules is that, other than the target
row-being-updated, any tuples from other tables should be the *same* tuples we'd
joined that row to before EPQ.  The logical extension of that to InitPlans
is that it should be the same InitPlan output as before, not potentially
a different value ...

> And doesn't executing them again run the risk of getting a different
> value for other reasons, for example if an initplan is volatile?

... and that's another good argument for not doing the initplan over.

> What I'm wondering is whether the param in the copied estate shouldn't
> rather be just a proxy for the one in the original estate - if we need
> to evaluate it, then do so in the original estate, store the value
> there, and copy the value back into the EPQ plantree.

Don't think that's going to work; the EPQ environment doesn't have any way
to know that an execPlan link is pointing to something in a different
estate.

Your other idea of forcing initPlan parameters to be evaluated before we
enter the EPQ execution environment is probably more workable.  It would
be annoying to do that for every initPlan in sight, but I think we could
look at the subplan's extParam to see whether it potentially references
that parameter.  (Although really, in most scenarios it wouldn't matter
because all the initPlans in a data-modifying query are probably
referenced in the subplan anyhow ...)

regards, tom lane



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-13 Thread Andres Freund
On 2018-09-04 17:51:30 -0700, Andres Freund wrote:
> My current proposal is thus to do add a check that does
> #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
> something-that-fails
> #endif
> in an autoconf test, and have configure complain if that
> fails. Something roughly along the lines of
> "Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use 
> -msse2 or use gcc."

Here's a patch along those lines.

Greetings,

Andres Freund
>From 4a457a5b907d1390c133097e9081ecf6fb1d30ef Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 13 Sep 2018 14:18:43 -0700
Subject: [PATCH] Error out for clang on x86-32 without SSE2 support, no
 -fexcess-precision.

As clang currently doesn't support -fexcess-precision=standard,
compiling x86-32 code with SSE2 disabled can lead to problems with
floating point overflow checks and the like.

This issue was noticed because clang, on at least some BSDs, defaults
to i386 compatibility, whereas it defaults to pentium4 on Linux.  Our
forced usage of __builtin_isinf() led to some overflow checks not
triggering when compiling for i386, e.g. when the result of the
calculation didn't overflow in 80bit registers, but did so in 64bit.

While we could just fall back to a non-builtin isinf, it seems likely
that the use of 80bit registers leads to other problems (which is why
we force the flag for GCC already).  Therefore error out when
detecting clang in that situation.

Reported-By: Victor Wagner
Analyzed-By: Andrew Gierth and Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20180905005130.ewk4xcs5dgyzc...@alap3.anarazel.de
Backpatch: 9.3-, all supported versions are affected
---
 configure| 33 +
 configure.in | 18 ++
 2 files changed, 51 insertions(+)

diff --git a/configure b/configure
index c6a44a9078a..9b304023d3d 100755
--- a/configure
+++ b/configure
@@ -7023,6 +7023,39 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 
+# Defend against clang being used on x86-32 without SSE2 enabled.  As current
+# versions of clang do not understand -fexcess-precision=standard, the use of
+# x87 floating point operations leads to problems like isinf possibly returning
+# false for a value that is infinite when converted from the 80bit register to
+# the 8byte memory representation.
+#
+# Only perform the test if the compiler doesn't understand
+# -fexcess-precision=standard, that way a potentially fixed compiler will work
+# automatically.
+if test "$pgac_cv_prog_CC_cflags__fexcess_precision_standard" = no; then
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+#if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
+choke me
+#endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+else
+  as_fn_error $? "Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use -msse2 or use gcc." "$LINENO" 5
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+
 ac_ext=c
 ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
diff --git a/configure.in b/configure.in
index 3ada48b5f95..2e60a89502c 100644
--- a/configure.in
+++ b/configure.in
@@ -624,6 +624,24 @@ choke me
 @%:@endif])], [], [AC_MSG_ERROR([do not put -ffast-math in CFLAGS])])
 fi
 
+# Defend against clang being used on x86-32 without SSE2 enabled.  As current
+# versions of clang do not understand -fexcess-precision=standard, the use of
+# x87 floating point operations leads to problems like isinf possibly returning
+# false for a value that is infinite when converted from the 80bit register to
+# the 8byte memory representation.
+#
+# Only perform the test if the compiler doesn't understand
+# -fexcess-precision=standard, that way a potentially fixed compiler will work
+# automatically.
+if test "$pgac_cv_prog_CC_cflags__fexcess_precision_standard" = no; then
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [
+@%:@if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
+choke me
+@%:@endif
+])], [],
+[AC_MSG_ERROR([Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use -msse2 or use gcc.])])
+fi
+
 AC_PROG_CPP
 AC_SUBST(GCC)
 
-- 
2.18.0.rc2.dirty



Re: Consistent segfault in complex query

2018-09-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> So I can see exactly where the problem is, but I'm not sure what the
 >> solution should be.

 >> EvalPlanQualStart copies the param_exec value list explicitly _not_
 >> including the execPlan link, which obviously isn't going to work if
 >> the value has not been computed yet. Should it be forcing the
 >> evaluation of initplans that haven't been run yet, or should the EPQ
 >> scan evaluate them itself from a copy of the plan, or does there
 >> need to be some way to share state? (having the InitPlan be run more
 >> than once might be a problem?)

 Tom> The second of those; what we need is for any referenced InitPlans
 Tom> to be executed afresh under EPQ rules. (I'm not entirely sure that
 Tom> an InitPlan could need to see different input tuples under EPQ
 Tom> than it'd see otherwise, but I'm not sure it couldn't, either.)

Obviously you know this code better than I do... but I'm not convinced.

Shouldn't the InitPlan pretty much by definition be independent of the
tuples being locked/updated?

And doesn't executing them again run the risk of getting a different
value for other reasons, for example if an initplan is volatile?

What I'm wondering is whether the param in the copied estate shouldn't
rather be just a proxy for the one in the original estate - if we need
to evaluate it, then do so in the original estate, store the value
there, and copy the value back into the EPQ plantree.

-- 
Andrew (irc:RhodiumToad)



Re: pg_dump test instability

2018-09-13 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I see.  Why not shift all items up to the i'th up by one place, instead
>> of moving only the first one?  That way the sortedness would be
>> preserved.  Otherwise we'd move the first one into the middle, then
>> sorting would move it to the front again, etc.

> Hmmm ... might be worth doing, but I'm not sure.  The steady-state cycle
> will probably be that after one task has been dispatched, we'll sleep
> until some task finishes and then that will unblock some pending items,
> resulting in new entries at the end of the list, forcing a sort anyway
> before we next dispatch a task.  So I was expecting that avoiding a sort
> here wasn't really going to be worth expending much effort for.  But my
> intuition about that could be wrong.  I'll run a test case with some
> instrumentation added and see how often we could avoid sorts by
> memmove'ing.

OK, my intuition was faulty.  At least when testing with the regression
database, situations where we are taking the slow path at all seem to
involve several interrelated dump objects (eg indexes of a table) that
are all waiting for the same lock, such that we may be able to dispatch a
number of unrelated tasks before anything gets added from the pending
list.  Doing it as you suggest eliminates a significant fraction of
the re-sort operations.

Attached updated patch does it like that and makes the cosmetic
adjustments you suggested.   I also went ahead and did the renaming
of par_prev/par_next/par_list_xxx that I'd suggested upthread.
I think this is committable ...

regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 42cf441..ba79821 100644
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
*** extern void ConnectDatabase(Archive *AH,
*** 252,269 
  extern void DisconnectDatabase(Archive *AHX);
  extern PGconn *GetConnection(Archive *AHX);
  
- /* Called to add a TOC entry */
- extern void ArchiveEntry(Archive *AHX,
- 			 CatalogId catalogId, DumpId dumpId,
- 			 const char *tag,
- 			 const char *namespace, const char *tablespace,
- 			 const char *owner, bool withOids,
- 			 const char *desc, teSection section,
- 			 const char *defn,
- 			 const char *dropStmt, const char *copyStmt,
- 			 const DumpId *deps, int nDeps,
- 			 DataDumperPtr dumpFn, void *dumpArg);
- 
  /* Called to write *data* to the archive */
  extern void WriteData(Archive *AH, const void *data, size_t dLen);
  
--- 252,257 
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 36e3383..3f7a658 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*** typedef struct _outputContext
*** 49,54 
--- 49,72 
  	int			gzOut;
  } OutputContext;
  
+ /*
+  * State for tracking TocEntrys that are ready to process during a parallel
+  * restore.  (This used to be a list, and we still call it that, though now
+  * it's really an array so that we can apply qsort to it.)
+  *
+  * tes[] is sized large enough that we can't overrun it.
+  * The valid entries are indexed first_te .. last_te inclusive.
+  * We periodically sort the array to bring larger-by-dataLength entries to
+  * the front; "sorted" is true if the valid entries are known sorted.
+  */
+ typedef struct _parallelReadyList
+ {
+ 	TocEntry  **tes;			/* Ready-to-dump TocEntrys */
+ 	int			first_te;		/* index of first valid entry in tes[] */
+ 	int			last_te;		/* index of last valid entry in tes[] */
+ 	bool		sorted;			/* are valid entries currently sorted? */
+ } ParallelReadyList;
+ 
  /* translator: this is a module name */
  static const char *modulename = gettext_noop("archiver");
  
*** static void restore_toc_entries_parallel
*** 95,107 
  			 TocEntry *pending_list);
  static void restore_toc_entries_postfork(ArchiveHandle *AH,
  			 TocEntry *pending_list);
! static void par_list_header_init(TocEntry *l);
! static void par_list_append(TocEntry *l, TocEntry *te);
! static void par_list_remove(TocEntry *te);
! static void move_to_ready_list(TocEntry *pending_list, TocEntry *ready_list,
     RestorePass pass);
! static TocEntry *get_next_work_item(ArchiveHandle *AH,
!    TocEntry *ready_list,
     ParallelState *pstate);
  static void mark_dump_job_done(ArchiveHandle *AH,
     TocEntry *te,
--- 113,132 
  			 TocEntry *pending_list);
  static void restore_toc_entries_postfork(ArchiveHandle *AH,
  			 TocEntry *pending_list);
! static void pending_list_header_init(TocEntry *l);
! static void pending_list_append(TocEntry *l, TocEntry *te);
! static void pending_list_remove(TocEntry *te);
! static void ready_list_init(ParallelReadyList *ready_list, int tocCount);
! static void ready_list_free(ParallelReadyList *ready_list);
! static void ready_list_insert(ParallelReadyList *ready_list, TocEntry *te);
! static void ready_list_remove(ParallelReadyList *

Re: patch to allow disable of WAL recycling

2018-09-13 Thread Jerry Jelinek
Hi Peter,

I'll take a look at that. I had been trying to keep the patch as minimal as
possible, but I'm happy to work through this.

Thanks,
Jerry


On Tue, Sep 11, 2018 at 9:39 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 10/09/2018 16:10, Jerry Jelinek wrote:
> > Thank you again for running all of these tests on your various hardware
> > configurations. I was not aware of the convention that the commented
> > example in the config file is expected to match the default value, so I
> > was actually trying to show what to use if you didn't want the default,
> > but I am happy to update the patch so the comment matches the default.
> > Beyond that, I am unsure what the next steps are for this proposal.
>
> Could you organize the code so that the block below
>
> /*
>  * Initialize info about where to try to recycle to.
>  */
>
> isn't executed if recycling is off, since we don't need it.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Index Skip Scan

2018-09-13 Thread Alexander Kuzmenkov

On 13/09/18 18:39, Jesper Pedersen wrote:


> Yes, this doesn't look good. Using your test case I'm seeing that
> unique is being chosen when the group size is below 34, and skip
> above. This is with the standard initdb configuration; did you change
> something else ? Or did you force the default plan ?


Sorry I didn't mention this, the first column is indeed forced skip 
scan, just to see how it compares to index scan.


> This is something to look at -- maybe there is a way to use
> btpo_next/btpo_prev instead/too in order to speed things up. Atm we
> just have the scan key in BTScanOpaqueData. I'll take a look after my
> upcoming vacation; feel free to contribute those changes in the
> meantime of course.




I probably won't be able to contribute the changes, but I'll try to 
review them.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Patch] Create a new session in postmaster by calling setsid()

2018-09-13 Thread Andres Freund
Hi,

On 2018-09-13 15:27:58 +0800, Paul Guo wrote:
> From the perspective of users instead of developers, I think for such an
> important service, the tty should be dissociated, i.e. postmaster should be
> daemonized by default (and even should have a default log filename?) or be
> setsid()-ed at least.

I don't think it'd be good to switch to postgres daemonizing itself.  If
we were starting fresh, maybe, but there's plenty people invoking
postgres from scripts etc.  And tools like systemd don't actually want
daemons to fork away, so there's no clear need from that side either.


> I'm not sure the several-years-ago LINUX_OOM_ADJ issue (to resolve that,
> silient_mode was removed) still exists or not. I don't have context  about
> that, so
> I conceded to use setsid() even I prefer the deleted silent_mode code.

Yes, the OOM issues are still relevant.

Greetings,

Andres Freund



Re: Consistent segfault in complex query

2018-09-13 Thread Tom Lane
Andrew Gierth  writes:
> So I can see exactly where the problem is, but I'm not sure what the
> solution should be.

> EvalPlanQualStart copies the param_exec value list explicitly _not_
> including the execPlan link, which obviously isn't going to work if the
> value has not been computed yet. Should it be forcing the evaluation of
> initplans that haven't been run yet, or should the EPQ scan evaluate
> them itself from a copy of the plan, or does there need to be some way
> to share state? (having the InitPlan be run more than once might be a
> problem?)

The second of those; what we need is for any referenced InitPlans to be
executed afresh under EPQ rules.  (I'm not entirely sure that an InitPlan
could need to see different input tuples under EPQ than it'd see
otherwise, but I'm not sure it couldn't, either.)  Also, copying the
execPlan links would be bad because it'd allow EPQ to execute planstate
subtrees that are outside the portion of the planstate tree that it made
a working copy of, which doesn't seem safe (e.g., the planstates could
easily have dependencies on the particular EState they are children of).

I think that the expectation of this code was that empty execPlan
links would be filled at need during EvalPlanQualStart's ExecInitNode
calls.  That's not happening because the InitPlan we're concerned about
is not attached to any plan node in the part of the tree that we copied;
it's attached to the ModifyTable node itself.  We could fix that for
the particular scenario we're looking at here, perhaps by having such
initplans be initialized the same way EvalPlanQualStart treats subplans.
I'm worried though about whether any referenceable initplans might be
attached even higher in the plan tree.  If we can ensure that the planner
will never do that, then a fix along these lines should be fairly
straightforward.

regards, tom lane



Re: [HACKERS] Bug in to_timestamp().

2018-09-13 Thread Tom Lane
Alexander Korotkov  writes:
> I've closed the commitfest entry.  I think we can add a new commitfest
> entry if required.  Regarding the FX part, it is easy to extract as a
> separate patch, but it's hard to find consensus.  I think there are at
> least three possible decisions.

> 1) Change FX mode to require separators to be the same.
> 2) Leave FX mode "as is".
> 3) Introduce GUC variable controlling behavior of FX mode.

> Any thoughts?

A GUC variable is a horrid solution.  Years ago we thought it'd be OK
to have GUCs that change query behavior, but we've regretted it every
time we did that, and often removed them again later (e.g. regex_flavor,
sql_inheritance).  Applications that want to be portable have to contend
with all possible values of the GUC, and that's no fun for anybody.

Given the lack of consensus, it's hard to make a case for breaking
backwards compatibility, so I'd have to vote for option 2.

regards, tom lane



Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-13 Thread Robert Haas
On Thu, Sep 13, 2018 at 10:29 AM, Tom Lane  wrote:
> What I was vaguely imagining is that win32_port.h could #include
> whichever Windows header defines these functions and structs, and
> then do
>
> #define stat __stat64
>
> static inline ... __stat64(...) { return _stat64(...); }
>
> What would need testing is whether the #define has nasty side-effects
> even if we've already included the system header.  I don't think it'd
> hurt, eg, local variables named "stat"; though people might be surprised
> when examining things in a debugger.

This, to me, seems way too clever.  Replacing 'struct stat' with
something else everywhere in the code is more churn, but far less
likely to have surprising consequences down the road.  Or so I would
think, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-13 Thread Alexander Korotkov
On Tue, Sep 11, 2018 at 6:18 PM Arthur Zakirov 
wrote:

> On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:
> > So, pushed!  Thanks to every thread participant for review and feedback.
>
> Great! Should we close the commitfest entry? There is FX part of the
> patch though. But it seems that nobody is happy with it. It could be
> done with a separate patch anyway.
>

I've closed the commitfest entry.  I think we can add a new commitfest
entry if required.  Regarding the FX part, it is easy to extract as a
separate patch, but it's hard to find consensus.  I think there are at
least three possible decisions.

1) Change FX mode to require separators to be the same.
2) Leave FX mode "as is".
3) Introduce GUC variable controlling behavior of FX mode.

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-09-13 Thread Andrew Gierth
Here's what I have queued up to push.

My changes are:

 - added to the header comment of list_concat_unique that callers have
   ordering expectations. Didn't touch list_union, since I ended up
   sticking with list_concat_unique for this patch.

 - WindowClauseSortNode renamed WindowClauseSortData

 - added and rewrote some comments

 - tidied up some casting in common_prefix_cmp

 - pgindent and some layout tweaks

-- 
Andrew (irc:RhodiumToad)

>From 9c89883ffe2153cc9d047f71a2b0e611f2c60452 Mon Sep 17 00:00:00 2001
From: Andrew Gierth 
Date: Thu, 13 Sep 2018 18:12:37 +0100
Subject: [PATCH] Order active window clauses for greater reuse of Sort nodes.

By sorting the active window list lexicographically by the sort clause
list but putting longer clauses before shorter prefixes, we generate
more chances to elide Sort nodes when building the path.

Author: Daniel Gustafsson (with some editorialization by me)
Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
---
 src/backend/nodes/list.c |   7 +-
 src/backend/optimizer/plan/planner.c | 154 +--
 src/test/regress/expected/window.out |  60 +++---
 src/test/regress/sql/window.sql  |  16 
 4 files changed, 177 insertions(+), 60 deletions(-)

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f3e1800708..55fd4c359b 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1011,8 +1011,11 @@ list_append_unique_oid(List *list, Oid datum)
  * via equal().
  *
  * This is almost the same functionality as list_union(), but list1 is
- * modified in-place rather than being copied.  Note also that list2's cells
- * are not inserted in list1, so the analogy to list_concat() isn't perfect.
+ * modified in-place rather than being copied. However, callers of this
+ * function may have strict ordering expectations -- i.e. that the relative
+ * order of those list2 elements that are not duplicates is preserved. Note
+ * also that list2's cells are not inserted in list1, so the analogy to
+ * list_concat() isn't perfect.
  */
 List *
 list_concat_unique(List *list1, List *list2)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 96bf0601a8..94b85721fa 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -110,6 +110,17 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } grouping_sets_data;
 
+/*
+ * Temporary structure for use during WindowClause reordering in order to be
+ * be able to sort WindowClauses on partitioning/ordering prefix.
+ */
+typedef struct
+{
+	WindowClause *wc;
+	List	   *uniqueOrder;	/* A List of unique ordering/partitioning
+ * clauses per Window */
+} WindowClauseSortData;
+
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -237,6 +248,7 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
 static bool group_by_has_partkey(RelOptInfo *input_rel,
 	 List *targetList,
 	 List *groupClause);
+static int	common_prefix_cmp(const void *a, const void *b);
 
 
 /*
@@ -5260,68 +5272,120 @@ postprocess_setop_tlist(List *new_tlist, List *orig_tlist)
 static List *
 select_active_windows(PlannerInfo *root, WindowFuncLists *wflists)
 {
-	List	   *result;
-	List	   *actives;
+	List	   *windowClause = root->parse->windowClause;
+	List	   *result = NIL;
 	ListCell   *lc;
+	int			nActive = 0;
+	WindowClauseSortData *actives = palloc(sizeof(WindowClauseSortData)
+		   * list_length(windowClause));
 
-	/* First, make a list of the active windows */
-	actives = NIL;
-	foreach(lc, root->parse->windowClause)
+	/* First, construct an array of the active windows */
+	foreach(lc, windowClause)
 	{
 		WindowClause *wc = lfirst_node(WindowClause, lc);
 
 		/* It's only active if wflists shows some related WindowFuncs */
 		Assert(wc->winref <= wflists->maxWinRef);
-		if (wflists->windowFuncs[wc->winref] != NIL)
-			actives = lappend(actives, wc);
+		if (wflists->windowFuncs[wc->winref] == NIL)
+			continue;
+
+		actives[nActive].wc = wc;	/* original clause */
+
+		/*
+		 * For sorting, we want the list of partition keys followed by the
+		 * list of sort keys. But pathkeys construction will remove duplicates
+		 * between the two, so we can as well (even though we can't detect all
+		 * of the duplicates, since some may come from ECs - that might mean
+		 * we miss optimization chances here). We must, however, ensure that
+		 * the order of entries is preserved with respect to the ones we do
+		 * keep.
+		 *
+		 * partitionClause and orderClause had their own duplicates removed in
+		 * parse analysis, so we're only concerned here with removing
+		 * orderClause entries tha

Re: [patch] Support LLVM 7

2018-09-13 Thread Andres Freund
On 2018-09-12 23:07:34 +0200, Christoph Berg wrote:
> Re: Andres Freund 2018-09-12 
> <20180912210338.h3vsss5lkuu26...@alap3.anarazel.de>
> > Hi,
> > 
> > On 2018-09-12 14:45:17 +0200, Christoph Berg wrote:
> > > LLVM 7 landed in Debian unstable, this patch teaches ./configure to use
> > > it. (General patch, not specific to Debian.)
> > 
> > Thanks.  Yes, I think we should do that, especially because my patches
> > to add proper debugging and profiling support only landed in LLVM
> > 7. Therefore I'm planning to add this to both v11 and master.   Unless
> > somebody protests?
> 
> I plan to switch postgresql-11.deb to LLVM 7 over the next days
> because of the support for non-x86 architectures, so this should
> definitely land in 11.

Pushed, thanks for the patch!

Greetings,

Andres Freund



Re: cache lookup failed for constraint when alter table referred by partition table

2018-09-13 Thread Alvaro Herrera
On 2018-Sep-13, Tom Lane wrote:

> Alvaro Herrera  writes:
> > That's the problem all right.  The solution is to drop all
> > index/constraint objects together in one performMultipleDeletions()
> > instead of performDeletion() one by one, as in the attached patch.
> 
> Looks reasonable as far as it goes.  Given that we no longer require
> any of this:
> 
> -  * Now we can drop the existing constraints and indexes --- constraints
> -  * first, since some of them might depend on the indexes.  In fact, we
> -  * have to delete FOREIGN KEY constraints before UNIQUE constraints, but
> -  * we already ordered the constraint list to ensure that would happen.
> 
> can we make any simplifications in earlier steps?  At the very least,
> look for comments related to this assumption.

Ah, I had looked, but not hard enough.  In this new version I removed
some code in ATExecAlterColumnType that's now irrelevant.  I tested this
by changing both lappend calls to lcons in that function; seems to
behave the same.  (Also added more constraints to the test case.)

Another thing I found I can change is to move the add_object_address()
calls to the other loops scanning the same lists, so that we don't have
to walk each the two lists twice.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 9a9fe65eeb0bc3074793474b9d85b10c3e260e78 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 Sep 2018 13:26:18 -0300
Subject: [PATCH v2] fix ALTER TYPE

---
 src/backend/commands/tablecmds.c  | 71 +++
 src/test/regress/expected/foreign_key.out | 12 ++
 src/test/regress/sql/foreign_key.sql  | 11 +
 3 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e96512e051..03c0b739b3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9527,33 +9527,12 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation 
rel,
{
char   *defstring = 
pg_get_constraintdef_command(foundObject.objectId);
 
-   /*
-* Put NORMAL dependencies at the front 
of the list and
-* AUTO dependencies at the back.  This 
makes sure that
-* foreign-key constraints depending on 
this column will
-* be dropped before unique or 
primary-key constraints of
-* the column; which we must have 
because the FK
-* constraints depend on the indexes 
belonging to the
-* unique constraints.
-*/
-   if (foundDep->deptype == 
DEPENDENCY_NORMAL)
-   {
-   tab->changedConstraintOids =
-   
lcons_oid(foundObject.objectId,
- 
tab->changedConstraintOids);
-   tab->changedConstraintDefs =
-   lcons(defstring,
- 
tab->changedConstraintDefs);
-   }
-   else
-   {
-   tab->changedConstraintOids =
-   
lappend_oid(tab->changedConstraintOids,
-   
foundObject.objectId);
-   tab->changedConstraintDefs =
-   
lappend(tab->changedConstraintDefs,
-   
defstring);
-   }
+   tab->changedConstraintOids =
+   
lappend_oid(tab->changedConstraintOids,
+   
foundObject.objectId);
+   tab->changedConstraintDefs =
+   
lappend(tab->changedConstraintDefs,
+   defstring);
}
break;
 
@@ -9893,10 +9872,18 @@ static void
 ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
 {
Objec

PostgreSQL 11 {Beta 4, RC1} Release: 2018-09-20

2018-09-13 Thread Jonathan S. Katz
Hi,

We are planning to have another release of PostgreSQL 11, either Beta 4
or RC1, next week on Thursday, 2018-09-20. The version will be
determined based on the state of the open items list[1] around the time
of stamping.

Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Open_Issues





Re: cache lookup failed for constraint when alter table referred by partition table

2018-09-13 Thread Tom Lane
Alvaro Herrera  writes:
> That's the problem all right.  The solution is to drop all
> index/constraint objects together in one performMultipleDeletions()
> instead of performDeletion() one by one, as in the attached patch.

Looks reasonable as far as it goes.  Given that we no longer require
any of this:

-* Now we can drop the existing constraints and indexes --- constraints
-* first, since some of them might depend on the indexes.  In fact, we
-* have to delete FOREIGN KEY constraints before UNIQUE constraints, but
-* we already ordered the constraint list to ensure that would happen.

can we make any simplifications in earlier steps?  At the very least,
look for comments related to this assumption.

regards, tom lane



Re: cache lookup failed for constraint when alter table referred by partition table

2018-09-13 Thread Alvaro Herrera
On 2018-Sep-10, Alvaro Herrera wrote:

> ATPostAlterTypeCleanup is trying to search the original constraint by
> OID in order to drop it, but it's not there -- I suppose it has already
> been dropped by recursion in a previous step.

That's the problem all right.  The solution is to drop all
index/constraint objects together in one performMultipleDeletions()
instead of performDeletion() one by one, as in the attached patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 4e1ec5fcb396e3c71fc399c986d9bbd9610d7849 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 Sep 2018 13:26:18 -0300
Subject: [PATCH] fix ALTER TYPE

---
 src/backend/commands/tablecmds.c  | 28 ++--
 src/test/regress/expected/foreign_key.out | 12 
 src/test/regress/sql/foreign_key.sql  | 11 +++
 3 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e96512e051..f8c5d71ccf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9893,6 +9893,7 @@ static void
 ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
 {
ObjectAddress obj;
+   ObjectAddresses *objects;
ListCell   *def_item;
ListCell   *oid_item;
 
@@ -9963,29 +9964,28 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo 
*tab, LOCKMODE lockmode)
}
 
/*
-* Now we can drop the existing constraints and indexes --- constraints
-* first, since some of them might depend on the indexes.  In fact, we
-* have to delete FOREIGN KEY constraints before UNIQUE constraints, but
-* we already ordered the constraint list to ensure that would happen. 
It
-* should be okay to use DROP_RESTRICT here, since nothing else should 
be
-* depending on these objects.
+* Now we can drop the existing constraints and indexes. Do them all in 
a
+* single call, so that we don't have to worry about dependencies among
+* them.  It should be okay to use DROP_RESTRICT here, since nothing 
else
+* should be depending on these objects.
 */
+   objects = new_object_addresses();
foreach(oid_item, tab->changedConstraintOids)
{
-   obj.classId = ConstraintRelationId;
-   obj.objectId = lfirst_oid(oid_item);
-   obj.objectSubId = 0;
-   performDeletion(&obj, DROP_RESTRICT, PERFORM_DELETION_INTERNAL);
+   ObjectAddressSet(obj, ConstraintRelationId, 
lfirst_oid(oid_item));
+   add_exact_object_address(&obj, objects);
}
 
foreach(oid_item, tab->changedIndexOids)
{
-   obj.classId = RelationRelationId;
-   obj.objectId = lfirst_oid(oid_item);
-   obj.objectSubId = 0;
-   performDeletion(&obj, DROP_RESTRICT, PERFORM_DELETION_INTERNAL);
+   ObjectAddressSet(obj, RelationRelationId, lfirst_oid(oid_item));
+   add_exact_object_address(&obj, objects);
}
 
+   performMultipleDeletions(objects, DROP_RESTRICT, 
PERFORM_DELETION_INTERNAL);
+
+   free_object_addresses(objects);
+
/*
 * The objects will get recreated during subsequent passes over the work
 * queue.
diff --git a/src/test/regress/expected/foreign_key.out 
b/src/test/regress/expected/foreign_key.out
index b90c4926e2..1854867381 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1518,6 +1518,18 @@ DETAIL:  Key (a, b)=(2500, 2502) is still referenced 
from table "fk_partitioned_
 ALTER TABLE fk_partitioned_fk DROP CONSTRAINT fk_partitioned_fk_a_fkey;
 -- done.
 DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk;
+-- Altering a type referenced by a foreign key needs to drop/recreate the FK.
+-- Ensure that works.
+CREATE TABLE fk_notpartitioned_pk (a INT, PRIMARY KEY(a));
+CREATE TABLE fk_partitioned_fk (a INT REFERENCES fk_notpartitioned_pk(a)) 
PARTITION BY RANGE(a);
+CREATE TABLE fk_partitioned_fk_1 PARTITION OF fk_partitioned_fk FOR VALUES 
FROM (MINVALUE) TO (MAXVALUE);
+INSERT INTO fk_notpartitioned_pk VALUES (1);
+INSERT INTO fk_partitioned_fk VALUES (1);
+ALTER TABLE fk_notpartitioned_pk ALTER COLUMN a TYPE bigint;
+DELETE FROM fk_notpartitioned_pk WHERE a = 1;
+ERROR:  update or delete on table "fk_notpartitioned_pk" violates foreign key 
constraint "fk_partitioned_fk_a_fkey" on table "fk_partitioned_fk"
+DETAIL:  Key (a)=(1) is still referenced from table "fk_partitioned_fk".
+DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk;
 -- Test some other exotic foreign key features: MATCH SIMPLE, ON UPDATE/DELETE
 -- actions
 CREATE TABLE fk_notpartitioned_pk (a int, b int, primary key (a, b));
diff --git a/src/test/regress/sql/foreign_key.sql 
b/src/test/regress/sq

Re: Index Skip Scan

2018-09-13 Thread Jesper Pedersen

Hi Alexander.

On 9/13/18 9:01 AM, Alexander Kuzmenkov wrote:

> While testing this patch


Thanks for the review !

> I noticed that current implementation doesn't perform well when we
> have lots of small groups of equal values. Here is the execution time
> of index skip scan vs unique over index scan, in ms, depending on the
> size of group. The benchmark script is attached.
>
> group size    skip        unique
> 1             2,293.85    132.55
> 5             464.40      106.59
> 10            239.61      102.02
> 50            56.59       98.74
> 100           32.56       103.04
> 500           6.08        97.09



Yes, this doesn't look good. Using your test case I'm seeing that unique 
is being chosen when the group size is below 34, and skip above. This is 
with the standard initdb configuration; did you change something else? 
Or did you force the default plan?


So, the current implementation can lead to performance regression, and 
the choice of the plan depends on the notoriously unreliable ndistinct 
statistics. 


Yes, Peter mentioned this, which I'm still looking at.

The regression is probably because skip scan always does 
_bt_search to find the next unique tuple. 


Very likely.

I think we can improve this, 
and the skip scan can be strictly faster than index scan regardless of 
the data. As a first approximation, imagine that we somehow skipped 
equal tuples inside _bt_next instead of sending them to the parent 
Unique node. This would already be marginally faster than Unique + Index 
scan. A more practical implementation would be to remember our position 
in tree (that is, BTStack returned by _bt_search) and use it to skip 
pages in bulk. This looks straightforward to implement for a tree that 
does not change, but I'm not sure how to make it work with concurrent 
modifications. Still, this looks a worthwhile direction to me, because 
if we have a strictly faster skip scan, we can just use it always and 
not worry about our unreliable statistics. What do you think?




This is something to look at -- maybe there is a way to use 
btpo_next/btpo_prev instead/too in order to speed things up. Atm we just 
have the scan key in BTScanOpaqueData. I'll take a look after my 
upcoming vacation; feel free to contribute those changes in the meantime 
of course.


Thanks again!

Best regards,
 Jesper



Re: speeding up planning with partitions

2018-09-13 Thread Dilip Kumar
On Tue, Sep 4, 2018 at 12:44 PM, Amit Langote
 wrote:
> Hi Dilip,
>
> Thanks for taking a look.
>
> On 2018/09/03 20:57, Dilip Kumar wrote:
>> The idea looks interesting while going through the patch I observed
>> this comment.
>>
>> /*
>>  * inheritance_planner
>>  *   Generate Paths in the case where the result relation is an
>>  *   inheritance set.
>>  *
>>  * We have to handle this case differently from cases where a source relation
>>  * is an inheritance set. Source inheritance is expanded at the bottom of the
>>  * plan tree (see allpaths.c), but target inheritance has to be expanded at
>>  * the top.
>>
>> I think with your patch these comments needs to be change?
>
> Yes, maybe a good idea to mention that partitioned table result relations
> are not handled here.
>
> Actually, I've been wondering if this patch (0001) shouldn't get rid of
> inheritance_planner altogether and implement the new approach for *all*
> inheritance sets, not just partitioned tables, but haven't spent much time
> on that idea yet.

That will be interesting.

>
>>   if (parse->resultRelation &&
>> - rt_fetch(parse->resultRelation, parse->rtable)->inh)
>> + rt_fetch(parse->resultRelation, parse->rtable)->inh &&
>> + rt_fetch(parse->resultRelation, parse->rtable)->relkind !=
>> + RELKIND_PARTITIONED_TABLE)
>>   inheritance_planner(root);
>>   else
>>   grouping_planner(root, false, tuple_fraction);
>>
>> I think we can add some comments to explain if the target rel itself
>> is partitioned rel then why
>> we can directly go to the grouping planner.
>
> Okay, I will try to add more explanatory comments here and there in the
> next version I will post later this week.

Okay.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-13 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 12, 2018 at 08:17:05PM -0400, Tom Lane wrote:
>> And we can't just "#define stat __stat64" because
>> that would break the calls to stat().  Or wait, maybe we could do that
>> and also have a one-liner function named __stat64 that would catch the
>> calls and redirect to _stat64?

> I don't think I grab your point here.  "#define stat __stat64" cannot
> exist from the start so how would you do that?

What I was vaguely imagining is that win32_port.h could #include
whichever Windows header defines these functions and structs, and
then do

#define stat __stat64

static inline ... __stat64(...) { return _stat64(...); }

What would need testing is whether the #define has nasty side-effects
even if we've already included the system header.  I don't think it'd
hurt, eg, local variables named "stat"; though people might be surprised
when examining things in a debugger.

regards, tom lane



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-09-13 Thread Tom Lane
Andrew Gierth  writes:
> (aside: I itch to rewrite the comment that says "the spec requires that
> there be only one sort" - number of sorts is an implementation detail
> about which the spec is silent, what it _actually_ requires is that peer
> rows must be presented in the same order in all order-equivalent
> windows, which we choose to implement by ensuring there is only one sort
> for such windows, rather than, for example, adding extra sort keys to
> provide stability.)

Sure, rewrite away.

> (Perhaps worth noting for future work is that this code and the grouping
> sets code have a common issue: currently we allow only one sort order to
> be requested as query_pathkeys, but this means that both window paths
> and grouping sets paths have to make an essentially arbitrary choice of
> query_pathkeys, rather than having a set of possible "useful" orderings
> and taking whichever can be produced most cheaply.)

Yeah, I've had a bee in my bonnet for awhile about replacing
query_pathkeys with a list of potentially-desirable result orderings.
So far there hasn't been a truly compelling reason to do it, but if
somebody felt like generalizing the window function ordering stuff
in that direction, it'd be a nice project.

regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2018-09-13 Thread Tom Lane
Thomas Munro  writes:
> I noticed that the patch does a bunch of s/Olson/IANA/.  That leaves
> only one place in the tree that still refers to the "Olson" database:
> dt_common.c.  Might want to change that too?

Ah, I didn't realize we were that close to wiping out the old
terminology.  Yeah, I'll get that one too.  Thanks for noticing.

regards, tom lane



Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-09-13 Thread Tom Lane
Amit Langote  writes:
> On 2018/09/13 1:14, Tom Lane wrote:
>> That seems excessively restrictive.  Anything that has storage (e.g.
>> matviews) ought to be truncatable, no?

> Not by heap_truncate it seems.  The header comment of heap_truncate says
> that it concerns itself only with ON COMMIT truncation of temporary tables:

Ah.  Well, in that case I'm OK with just a simple test for
RELKIND_RELATION, but I definitely feel that it should be inside
heap_truncate.  Callers don't need to know about the limited scope
of what that does.

regards, tom lane



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-09-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I'm also a bit suspicious as to whether the code is even correct;
 Tom> does it *really* match what will happen later when we create sort
 Tom> plan nodes? (Maybe it's fine; I haven't looked.)

As things stand before the patch, the code that actually generates the
path of sort+window nodes requires only this assumption: that
order-equivalent windows (as defined by the spec) must end up together
in the list, or otherwise separated only by entries that don't add a new
sort node.

(aside: I itch to rewrite the comment that says "the spec requires that
there be only one sort" - number of sorts is an implementation detail
about which the spec is silent, what it _actually_ requires is that peer
rows must be presented in the same order in all order-equivalent
windows, which we choose to implement by ensuring there is only one sort
for such windows, rather than, for example, adding extra sort keys to
provide stability.)

The path-generation code simply concatenates the partition and order
lists and creates pathkeys. The pathkeys creation removes redundant
entries. So if we're guaranteed that two entries considered equal by the
patch code are also considered equal by the pathkey mechanism, which I
believe is the case, then the logic is still correct (enough to satisfy
the spec and produce correct query results).

There are optimizations that can be done once we have the pathkeys that
can't be anticipated by select_active_windows because that function is
run before we set up equivalence classes. This might lead path creation
to produce fewer sorts than anticipated, but not more sorts.

So I'm satisfied, as far as I can tell, that the logic is both correct
and an improvement over what we currently have.

(Perhaps worth noting for future work is that this code and the grouping
sets code have a common issue: currently we allow only one sort order to
be requested as query_pathkeys, but this means that both window paths
and grouping sets paths have to make an essentially arbitrary choice of
query_pathkeys, rather than having a set of possible "useful" orderings
and taking whichever can be produced most cheaply.)

-- 
Andrew (irc:RhodiumToad)



Re: pg_dump test instability

2018-09-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/09/2018 18:06, Tom Lane wrote:
>> No.  In both code paths, the array slot at index first_te is being
>> physically dropped from the set of valid entries (by incrementing
>> first_te).  In the first path, that slot holds the item we want to
>> remove logically from the set, so that incrementing first_te is
>> all we have to do: the remaining entries are still in the range
>> first_te..last_te, and they're still sorted.  In the second code
>> path, the item that was in that slot is still wanted as part of
>> the set, so we copy it into the valid range (overwriting the item
>> in slot i, which is no longer wanted).  Now the valid range is
>> probably not sorted, so we have to flag that a re-sort is needed.

> I see.  Why not shift all items up to the i'th up by one place, instead
> of moving only the first one?  That way the sortedness would be
> preserved.  Otherwise we'd move the first one into the middle, then
> sorting would move it to the front again, etc.

Hmmm ... might be worth doing, but I'm not sure.  The steady-state cycle
will probably be that after one task has been dispatched, we'll sleep
until some task finishes and then that will unblock some pending items,
resulting in new entries at the end of the list, forcing a sort anyway
before we next dispatch a task.  So I was expecting that avoiding a sort
here wasn't really going to be worth expending much effort for.  But my
intuition about that could be wrong.  I'll run a test case with some
instrumentation added and see how often we could avoid sorts by
memmove'ing.

regards, tom lane



Re: Loaded footgun open_datasync on Windows

2018-09-13 Thread Laurenz Albe
Michael Paquier wrote:
> On Mon, Sep 10, 2018 at 04:46:40PM +0200, Laurenz Albe wrote:
> > I didn't get pg_upgrade to work without the log file hacks; I suspect
> > that there is more than just log file locking going on, but my Windows
> > skills are limited.
> > 
> > How shall I proceed?
> 
> I do like this patch, and we have an occasion to clean a bunch of things
> in pg_upgrade, so this argument is enough to me to put my hands in the
> dirt and check by myself, so...
> 
> I really thought that this was not ambitious enough, so I have hacked on
> top of your patch, so as pg_upgrade concurrent issues are removed, and I
> have found one barrier in pg_ctl which decides that it is smarter to
> redirect the log file (here pg_upgrade_server.log) using CMD.  The
> problem is that the lock taken by the process which does the redirection
> does not work nicely with what pg_upgrade does in parallel.  So I think
> that it is better to drop that part.

I got you now.
So I won't try to deal with that at this point.

> +#ifdef WIN32
> +   if ((infile = fopen(path, "rt")) == NULL)
> +#else
> if ((infile = fopen(path, "r")) == NULL)
> +#endif
> This should have a comment, saying roughly that as this uses
> win32_fopen, text mode needs to be enforced to get proper CRLF.

Done.

> One spot for open() is missed in file_utils.c, please see
> pre_sync_fname().

Done.

> The patch fails to apply for pg_verify_checksums, with a conflict easy
> enough to fix.

Fixed.

> Laurenz, could you update your patch?  I am switching that as waiting on
> author for now.

Attached is the new, improved version of the patch.

Thanks again!

Yours,
Laurenz Albe

From 8cd3d1a75fc3b18e6928b9b26841f1947d1f72ba Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 13 Sep 2018 14:24:59 +0200
Subject: [PATCH] Use pgwin32_open in frontend code on Windows

This is particularly important for pg_test_fsync which does not handle
O_DSYNC correctly otherwise, leading to false claims that disks are unsafe.

With pgwin32_open, files won't get locked when opened.
This could be used to get rid of some of the workarounds for Windows'
file locking, but that is left for later because it proved not to be as
trivial as hoped.

Discussion: https://www.postgresql.org/message-id/1527846213.2475.31.camel%40cybertec.at

Laurenz Albe, reviewed by Michael Paquier and Kuntal Ghosh
---
 src/bin/initdb/initdb.c   | 8 
 src/bin/pg_basebackup/pg_receivewal.c | 2 +-
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 2 +-
 src/common/file_utils.c   | 4 ++--
 src/include/port.h| 3 ---
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 32746c7703..bf234c0763 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -490,7 +490,15 @@ readfile(const char *path)
 	char	   *buffer;
 	int			c;
 
+#ifdef WIN32
+	/*
+	 * On Windows, we have to open the file in text mode
+	 * so that "carriage return" characters are stripped.
+	 */
+	if ((infile = fopen(path, "rt")) == NULL)
+#else
 	if ((infile = fopen(path, "r")) == NULL)
+#endif
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
 progname, path, strerror(errno));
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 8be8d48a8a..912aed8d7c 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -288,7 +288,7 @@ FindStreamingStart(uint32 *tli)
 
 			snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
 
-			fd = open(fullpath, O_RDONLY | PG_BINARY);
+			fd = open(fullpath, O_RDONLY | PG_BINARY, 0);
 			if (fd < 0)
 			{
 fprintf(stderr, _("%s: could not open compressed file \"%s\": %s\n"),
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index d46bac2cd6..507f83ca4c 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -80,7 +80,7 @@ scan_file(const char *fn, BlockNumber segmentno)
 	int			f;
 	BlockNumber blockno;
 
-	f = open(fn, O_RDONLY | PG_BINARY);
+	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 48876061c3..d952bc8c88 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -222,7 +222,7 @@ pre_sync_fname(const char *fname, bool isdir, const char *progname)
 {
 	int			fd;
 
-	fd = open(fname, O_RDONLY | PG_BINARY);
+	fd = open(fname, O_RDONLY | PG_BINARY, 0);
 
 	if (fd < 0)
 	{
@@ -283,7 +283,7 @@ fsync_fname(const char *fname, bool isdir, const char *progname)
 	 * unsupported operations, e.g. opening a directory under Windows), and
 	 * logging others.
 	 */
-	fd = open(fname, flags);
+	fd = open(fname, flags, 0);
 	if (fd

Re: Index Skip Scan

2018-09-13 Thread Alexander Kuzmenkov

Hi Jesper,

While testing this patch I noticed that the current implementation doesn't 
perform well when we have lots of small groups of equal values. Here is 
the execution time of index skip scan vs unique over index scan, in ms, 
depending on the size of group. The benchmark script is attached.


group size    skip        unique
1             2,293.85    132.55
5             464.40      106.59
10            239.61      102.02
50            56.59       98.74
100           32.56       103.04
500           6.08        97.09

So, the current implementation can lead to performance regression, and 
the choice of the plan depends on the notoriously unreliable ndistinct 
statistics. The regression is probably because skip scan always does 
_bt_search to find the next unique tuple. I think we can improve this, 
and the skip scan can be strictly faster than index scan regardless of 
the data. As a first approximation, imagine that we somehow skipped 
equal tuples inside _bt_next instead of sending them to the parent 
Unique node. This would already be marginally faster than Unique + Index 
scan. A more practical implementation would be to remember our position 
in tree (that is, BTStack returned by _bt_search) and use it to skip 
pages in bulk. This looks straightforward to implement for a tree that 
does not change, but I'm not sure how to make it work with concurrent 
modifications. Still, this looks a worthwhile direction to me, because 
if we have a strictly faster skip scan, we can just use it always and 
not worry about our unreliable statistics. What do you think?


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



test-skip.sql
Description: application/sql


Re: Protect syscache from bloating with negative cache entries

2018-09-13 Thread Kyotaro HORIGUCHI
Hello. Thank you for looking this.

At Wed, 12 Sep 2018 05:16:52 +, "Ideriha, Takeshi" 
 wrote in 
<4E72940DA2BF16479384A86D54D0988A6F197012@G01JPEXMBKW04>
> Hi, 
> 
> >Subject: Re: Protect syscache from bloating with negative cache entries
> >
> >Hello. The previous v4 patchset was just broken.
> 
> >Somehow the 0004 was merged into the 0003 and applying 0004 results in 
> >failure. I
> >removed 0004 part from the 0003 and rebased and repost it.
> 
> I have some questions about syscache and relcache pruning,
> though they may have been discussed upthread or be beside the point.
> 
> Can I confirm about catcache pruning?
> syscache_memory_target is the max figure per CatCache.
> (Any CatCache has the same max value.)
> So the total max size of catalog caches is estimated around or 
> slightly more than # of SysCache array times syscache_memory_target.

Right.

> If correct, I'm thinking that writing down the above estimation in the
> documentation would help DB administrators estimate memory usage.
> The current description might lead to the misunderstanding that
> syscache_memory_target is the total size of the catalog caches, in my impression.

Honestly, I'm not sure that is the right design. However, I don't
think providing such a formula helps users, since they don't know
exactly how many CatCaches and their brethren live in their server;
it is a soft limit, and in the end only a few, or just one, catalogs
can reach the limit.

The current design is based on the assumption that we would have
only one extremely growable cache in any one use case.

> Related to the above I just thought changing sysycache_memory_target per 
> CatCache
> would make memory usage more efficient.

We could easily have per-cache settings in CatCache, but how do
we provide the knobs for them? I can think of too many possible
solutions for that.

> Though I haven't checked whether each system catalog cache's
> memory usage varies largely,
> the pg_class cache might need more memory than others, and others might need less.
> But it would be difficult for users to check each CatCache's memory usage and
> tune it,
> because right now PostgreSQL doesn't provide a handy way to check them.

I assumed that this would be used without such a means. Someone
who suffers from syscache bloat can just set this GUC to avoid the
bloat. That's all.

Apart from that, in the current patch syscache_memory_target is
not exact at all in the first place, in order to avoid the overhead
of counting the correct size. The major difference comes from the
size of the cache tuples themselves. But I came to think that is
too much to omit.

As a *PoC*, in the attached patch (which applies to current
master), the sizes of CTups are counted as the catcache size.

It also provides a pg_catcache_size system view, just to give a
rough idea of how such a view would look. I'll think about that more,
but do you have any opinion on this?

=# select relid::regclass, indid::regclass, size from pg_syscache_sizes order by size desc;
  relid  |   indid   |  size  
-+---+
 pg_class| pg_class_oid_index| 131072
 pg_class| pg_class_relname_nsp_index| 131072
 pg_cast | pg_cast_source_target_index   |   5504
 pg_operator | pg_operator_oprname_l_r_n_index   |   4096
 pg_statistic| pg_statistic_relid_att_inh_index  |   2048
 pg_proc | pg_proc_proname_args_nsp_index|   2048
..


> Another option is that users only specify the total memory target size,
> and postgres dynamically changes each CatCache's memory target size
> according to a certain metric (which still seems difficult and expensive
> to develop relative to the benefit).
> What do you think about this?

Given that few caches bloat at once, its effect is not so
different from the current design's.

> As you commented here, guc variable syscache_memory_target and
> syscache_prune_min_age are used for both syscache and relcache (HTAB), right?

Right, just so as not to add knobs for unclear reasons. Since ...

> Do syscache and relcache have the similar amount of memory usage?

They may be different, but that would not make much difference in
the case of cache bloat.

> If not, I'm thinking that introducing a separate guc variable would be fine.
> Likewise for syscache_prune_min_age.

I implemented it that way so that it is easily replaceable if needed,
but I'm not sure separating them makes a significant difference.

Thanks for the opinion, I'll put consideration on this more.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afbe4e..6a00141fc9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1617,6 +1617,44 @@ include_dir 'conf.d'
   
  
 
+ 
+  syscache_memory_target (integer)
+  
+   syscache_memory_target configuration parameter
+  
+

Re: Collation versioning

2018-09-13 Thread Stephen Frost
Greetings,

* Christoph Berg (m...@debian.org) wrote:
> Re: Peter Eisentraut 2018-09-13 
> <4f60612c-a7b5-092d-1532-21ff7a106...@2ndquadrant.com>
> > Moreover, the fix for a collation version mismatch is, in the simplest
> > case, to go around and REINDEX everything.  Making the collation or
> > collation version global doesn't fix that.  It would actually make it
> > harder because you couldn't run ALTER COLLATION REFRESH VERSION until
> > after you have rebuilt all affected objects *in all databases*.
> 
> Btw, I think a "reindexdb --all --collation" (and the SQL per-database
> equivalent) that only rebuilds indexes that are affected by collations
> would be immensely useful to have.

As I was discussing w/ Peter G during PostgresOpen, we'd have to wait
until that reindexdb is complete before actually using anything in the
system and that's pretty painful.  While it sounds like it'd be a good
bit of work, it seems like we really need to have a way to support
multiple collation versions concurrently and to do that we'll need to
have the library underneath actually providing that to us.  Once we have
that, we can build new indexes concurrently and swap to them (or,
ideally, just issue REINDEX CONCURRENTLY once we support that..).

Until then, it seems like we really need to have a way to realize that a
given upgrade is going to require a big reindex, before actually doing
the reindex and suddenly discovering that we can't use a bunch of
indexes because they're out of date and extending the downtime for the
upgrade to be however long it takes to rebuild those indexes...

Thanks!

Stephen




Re: executor relation handling

2018-09-13 Thread Jesper Pedersen

Hi Amit,

On 9/13/18 12:58 AM, Amit Langote wrote:

Attached updated patches.

Besides the issue that caused the eval-plan-qual isolation test to crash, I
also spotted and fixed an oversight in the 0002 patch which would lead to
EState.es_output_cid being set to a wrong value, causing an unexpected error
during tuple locking as a result.



Thanks for the update.

However, the subscription TAP test 
(src/test/subscription/t/001_rep_changes.pl) is still failing.


CFBot has the same log

 https://travis-ci.org/postgresql-cfbot/postgresql/builds/42769

as locally.

Best regards,
 Jesper



Re: Problem while setting the fpw with SIGHUP

2018-09-13 Thread Amit Kapila
On Mon, Sep 10, 2018 at 11:54 AM Amit Kapila  wrote:
>
> Thanks, but what I wanted you to verify is the commit that broke it in
> 9.5.  Looking at it again, I think it is the below code in commit
> 2076db2aea that caused this problem.  If possible, can you test it
> before and at this commit, or at least do a manual review of the same
> to cross-verify?
>

I have investigated this further myself and found that this problem
started occurring due to code rearrangement in commits 2c03216d83 and
2076db2aea.  In commit 2076db2aea, the below check comparing the old
and new values of fullPageWrites got changed:

> -   if ((Insert->fullPageWrites || Insert->forcePageWrites) &&
> !doPageWrites)
> +   if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr &&
> doPageWrites)
> {

However, it alone didn't lead to this problem, because
XLogRecordAssemble() returned a valid value of fpw_lsn irrespective of
whether full-page writes were enabled or not. Later, in commit
2c03216d83, we changed XLogRecordAssemble() such that it returns a
valid value of fpw_lsn only if doPageWrites is enabled, basically this
part of the commit:

+ /* Determine if this block needs to be backed up */
+ if (regbuf->flags & REGBUF_FORCE_IMAGE)
+ needs_backup = true;
+ else if (regbuf->flags & REGBUF_NO_IMAGE)
+ needs_backup = false;
+ else if (!doPageWrites)
+ needs_backup = false;
+ else
  {
- /* Simple data, just include it */
- len += rdt->len;
+ /*
+ * We assume page LSN is first data on *every* page that can be
+ * passed to XLogInsert, whether it has the standard page layout
+ * or not.
+ */
+ XLogRecPtr page_lsn = PageGetLSN(regbuf->page);
+
+ needs_backup = (page_lsn <= RedoRecPtr);
+ if (!needs_backup)
+ {
+ if (*fpw_lsn == InvalidXLogRecPtr || page_lsn < *fpw_lsn)
+ *fpw_lsn = page_lsn;
+ }
  }

So, the problem started appearing after some rearrangement of code in
both the above-mentioned commits.  I verified that this problem
doesn't exist in versions <=9.4, so backpatch-through 9.5.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-13 Thread Kyotaro HORIGUCHI
Hello.

Thank you for the comments, Sawada-san, Peter.

At Mon, 10 Sep 2018 19:52:24 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180910.195224.22629595.horiguchi.kyot...@lab.ntt.co.jp>
> At Thu, 6 Sep 2018 22:32:21 +0200, Peter Eisentraut 
>  wrote in 
> <29bbd79d-696b-509e-578a-0fc38a3b9...@2ndquadrant.com>
> Thanks for pointing that out. That's a major cause of confusion. Does
> the following make sense?
> 
> > Specify the maximum size of WAL files that replication slots
> > are allowed to retain in the pg_wal
> > directory at checkpoint time.  If
> > max_slot_wal_keep_size is zero (the
> > default), replication slots may retain an unlimited amount of WAL.
> + If the restart_lsn of a replication slot falls behind the current
> + LSN by more than that many bytes, the standby using the slot may
> + no longer be able to reconnect due to removal of required WAL records.
...
> > Also, I don't think 0 is a good value for the default behavior.  0 would
> > mean that a slot is not allowed to retain any more WAL than already
> > exists anyway.  Maybe we don't want to support that directly, but it's a
> > valid configuration.  So maybe use -1 for infinity.
> 
> In relation to the reply just sent to Sawada-san, the remainder of a
> slot can be at most 16MB in the 0 case with the default segment
> size. So you're right in this sense. Will fix in the coming
> version. Thanks.

I did the following things in the new version.

- Changed the disable (or infinite) and default value of
  max_slot_wal_keep_size to -1 from 0.
  (patch 1, 2. guc.c, xlog.c: GetOldestKeepSegment())

- Fixed documentation for max_slot_wal_keep_size to mention what
  happens when WAL exceeds the size, and additional rewrites.
  (patch 4, catalogs.sgml, config.sgml)

- Folded parameter list of GetOldestKeepSegment().
  (patch 1, 2. xlog.c)

- Provided the plural form of errdetail of checkpoint-time
  warning.  (patch 1, xlog.c: KeepLogSeg())

- Some cosmetic change and small refactor.
  (patch 1, 2, 3)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From ee8ddfa69b6fb6832307d15374ea5f2446bda85f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 21 Dec 2017 21:20:20 +0900
Subject: [PATCH 1/4] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c | 108 --
 src/backend/utils/misc/guc.c  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 4 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 85a7b285ec..deda43607d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -105,6 +105,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = -1;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -867,6 +868,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -9505,6 +9507,53 @@ CreateRestartPoint(int flags)
 	return true;
 }
 
+/*
+ * Returns minimum segment number the next checkpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * currLSN is the current insert location
+ * minSlotLSN is the minimum restart_lsn of all active slots
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+{
+	uint64		keepSegs = 0;
+	XLogSegNo	currSeg;
+	XLogSegNo	minSlotSeg;
+
+	XLByteToSeg(currLSN, currSeg, wal_segment_size);
+	XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
+
+	/*
+	 * Calculate keep segments by slots first. The second term of the
+	 * condition is just a sanity check.
+	 */
+	if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+		keepSegs = currSeg - minSlotSeg;
+
+	/* Cap keepSegs by max_slot_wal_keep_size */
+	if (max_slot_wal_keep_size_mb >= 0)
+	{
+		uint64 limitSegs;
+
+		limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+		/* Apply max_slot_wal_keep_size to keepSegs */
+		if (limitSegs < keepSegs)
+			keepSegs = limitSegs;
+	}
+
+	/* but, keep at least wal_keep_segments segments if any */
+	if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
+		keepSegs = wal_keep_segments;
+
+	/* avoid underflow, don't go below 1 */
+	if (currSeg 

Re: Indicate anti-wraparound autovacuum in log_autovacuum_min_duration

2018-09-13 Thread Sergei Kornilov
Hello

> The brackets look rather useless to me, wouldn't it be better to remove
> them? By doing so the longest message becomes:
> "automatic aggressive vacuum to prevent wraparound of table blah.blah"
Hmm,

> 2018-09-13 11:48:09.303 MSK 6994 @ from  [vxid:6/713 txid:0] [] LOG:  
> automatic aggressive vacuum (to prevent wraparound) of table 
> "template0.pg_toast.pg_toast_12252": index scans: 0
or
> 2018-09-13 11:54:55.095 MSK 10115 @ from  [vxid:3/100278 txid:0] [] LOG:  
> automatic aggressive vacuum to prevent wraparound of table 
> "template0.pg_toast.pg_toast_12252": index scans: 0

Looks better to me. Updated patch attached.

regards, Sergei

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 5649a70..8996d36 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -374,10 +374,20 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 			 * emitting individual parts of the message when not applicable.
 			 */
 			initStringInfo(&buf);
-			if (aggressive)
-				msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
+			if (params->is_wraparound)
+			{
+				if (aggressive)
+					msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
+				else
+					msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
+			}
 			else
-				msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
+			{
+				if (aggressive)
+					msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
+				else
+					msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
+			}
 			appendStringInfo(&buf, msgfmt,
 			 get_database_name(MyDatabaseId),
 			 get_namespace_name(RelationGetNamespace(onerel)),


Re: Collation versioning

2018-09-13 Thread Christoph Berg
Re: Peter Eisentraut 2018-09-13 
<4f60612c-a7b5-092d-1532-21ff7a106...@2ndquadrant.com>
> Moreover, the fix for a collation version mismatch is, in the simplest
> case, to go around and REINDEX everything.  Making the collation or
> collation version global doesn't fix that.  It would actually make it
> harder because you couldn't run ALTER COLLATION REFRESH VERSION until
> after you have rebuilt all affected objects *in all databases*.

Btw, I think a "reindexdb --all --collation" (and the SQL per-database
equivalent) that only rebuilds indexes that are affected by collations
would be immensely useful to have.

Christoph



Re: Unused argument from execute_sql_string()

2018-09-13 Thread Michael Paquier
On Thu, Sep 13, 2018 at 03:47:26PM +0900, Tatsuo Ishii wrote:
> Anyway, considering it's a static function, chance of breaking
> backward compatibility is minimum, I think.
> 
> So +1 to remove the unused argument.

Same opinion and arguments here, so I have committed the patch.
--
Michael




Re: pg_dump test instability

2018-09-13 Thread Peter Eisentraut
On 12/09/2018 18:06, Tom Lane wrote:
> No.  In both code paths, the array slot at index first_te is being
> physically dropped from the set of valid entries (by incrementing
> first_te).  In the first path, that slot holds the item we want to
> remove logically from the set, so that incrementing first_te is
> all we have to do: the remaining entries are still in the range
> first_te..last_te, and they're still sorted.  In the second code
> path, the item that was in that slot is still wanted as part of
> the set, so we copy it into the valid range (overwriting the item
> in slot i, which is no longer wanted).  Now the valid range is
> probably not sorted, so we have to flag that a re-sort is needed.

I see.  Why not shift all items up to the i'th one place up, instead
of moving only the first one?  That way the sortedness would be
preserved.  Otherwise we'd move the first one into the middle, then
sorting would move it to the front again, etc.
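To make the trade-off concrete, here is a toy model (hypothetical names, not pg_dump's actual TocEntry machinery) of the two ways to drop element i from the sorted window first_te..last_te:

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of the ready-list window discussed above.  Valid entries
 * live in items[first_te .. last_te] and are kept sorted.
 */
typedef struct
{
	int		items[16];
	int		first_te;
	int		last_te;
	int		sorted;			/* is the window still known-sorted? */
} ReadyList;

/*
 * Current approach: copy items[first_te] into slot i, then bump first_te.
 * O(1), but the window may no longer be sorted, so a re-sort is flagged.
 */
static void
remove_by_swap(ReadyList *rl, int i)
{
	rl->items[i] = rl->items[rl->first_te];
	rl->first_te++;
	rl->sorted = 0;
}

/*
 * Suggested approach: shift items[first_te .. i-1] up one place, then
 * bump first_te.  O(i - first_te) copies, but sortedness is preserved.
 */
static void
remove_by_shift(ReadyList *rl, int i)
{
	memmove(&rl->items[rl->first_te + 1], &rl->items[rl->first_te],
			(i - rl->first_te) * sizeof(int));
	rl->first_te++;
	/* rl->sorted stays set */
}
```

The swap variant is cheaper per removal but can cause the first element to bounce between the middle and the front across repeated remove/sort cycles, which is the pattern questioned above.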

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Create a new session in postmaster by calling setsid()

2018-09-13 Thread Paul Guo
On Thu, Sep 13, 2018 at 8:20 AM, Michael Paquier 
wrote:

> On Wed, Sep 12, 2018 at 03:55:00PM -0400, Tom Lane wrote:
> > Although pg_ctl could sneak it in between forking and execing, it seems
> > like it'd be cleaner to have the postmaster proper do it in response to
> > a switch that pg_ctl passes it.  That avoids depending on the fork/exec
> > separation, and makes the functionality available for other postmaster
> > startup mechanisms, and opens the possibility of delaying the detach
> > to the end of startup.
>
> I would be fine with something happening only once the postmaster thinks
> that consistency has been reached and is open for business, though I'd
> still think that this should be controlled by a switch, where the
> default does what we do now.  Feel free to ignore me if I am outvoted ;)
> --
> Michael
>

From the perspective of users rather than developers, I think that for such
an important service the tty should be dissociated, i.e. the postmaster
should be daemonized by default (perhaps even with a default log filename?),
or at least setsid()-ed.  Developers' concerns could be addressed by a shell
alias, an internal switch in pg_ctl, or an environment-variable guard in the
postmaster code.

I'm not sure whether the LINUX_OOM_ADJ issue from several years ago (which
led to silent_mode being removed) still exists.  I don't have context about
that, so I'll concede to using setsid(), even though I prefer the deleted
silent_mode code.
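For reference, the tty dissociation being discussed is the classic fork-then-setsid() pattern: fork so the child is not a process-group leader, then setsid() in the child to start a new session with no controlling terminal.  A minimal sketch (hypothetical helper, not the postmaster's code):

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Minimal sketch of tty dissociation (hypothetical helper).  Returns the
 * child's pid to the parent; the child becomes a session leader with no
 * controlling terminal and, in real code, would continue as the daemon.
 * Here the child simply exits so the effect is observable via wait().
 */
static pid_t
detach_from_tty(void)
{
	pid_t		pid = fork();

	if (pid < 0)
		return -1;			/* fork failed */
	if (pid > 0)
		return pid;			/* parent: hand back the child's pid */

	/* child: freshly forked, so not a group leader; setsid() succeeds */
	if (setsid() < 0)
		_exit(1);
	_exit(0);				/* real code would proceed as the daemon */
}
```

The point raised in the thread is where this should happen: pg_ctl could do it between fork and exec, but having the postmaster do it itself (behind a switch) keeps the behavior available to every startup mechanism.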


Re: Collation versioning

2018-09-13 Thread Peter Eisentraut
On 12/09/2018 13:25, Christoph Berg wrote:
> Re: Peter Eisentraut 2018-09-12 
> <0447ec7b-cdb6-7252-7943-88a4664e7...@2ndquadrant.com>
>>> Naive idea: make that catalog shared? Collations are system-wide after
>>> all.
>>
>> By the same argument, extensions should be shared, but they are not.
> 
> But extensions put a lot of visible stuff into a database, whereas a
> collation is just a line in some table that doesn't get into the way.

How about C functions?  They are just a system catalog representation of
something that exists on the OS.

Anyway, we also want to support application-specific collation
definitions, so that users can CREATE COLLATION
"my_specific_requirements" and use that that in their application, so
global collations wouldn't be appropriate for that.

Moreover, the fix for a collation version mismatch is, in the simplest
case, to go around and REINDEX everything.  Making the collation or
collation version global doesn't fix that.  It would actually make it
harder because you couldn't run ALTER COLLATION REFRESH VERSION until
after you have rebuilt all affected objects *in all databases*.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services