Re: [HACKERS] cache lookup failed error for partition key with custom opclass
On Mon, Jul 24, 2017 at 7:23 PM, Tom Lane wrote:
> Rushabh Lathia writes:
> > PFA patch, where added elog() to add the error message same as all other
> > places.
>
> Some looking around says that this *isn't* the only place that just
> blithely assumes that it will find an opfamily entry.  But I agree
> that not checking for that isn't up to project standards.

Thanks Tom. I went through the get_opfamily_proc() calls in the system and
added the check for InvalidOid.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 5267a01..c9c1a54 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1640,6 +1640,9 @@ ExecInitExprRec(Expr *node, PlanState *parent, ExprState *state,
 										 lefttype,
 										 righttype,
 										 BTORDER_PROC);
+			if (!OidIsValid(proc))	/* should not happen */
+				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+					 BTORDER_PROC, lefttype, righttype, opfamily);
 
 			/* Set up the primary fmgr lookup information */
 			finfo = palloc0(sizeof(FmgrInfo));
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d8aceb1..ff758d6 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -1367,6 +1367,9 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
 										   op_lefttype,
 										   op_righttype,
 										   BTORDER_PROC);
+			if (!OidIsValid(opfuncid))	/* should not happen */
+				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+					 BTORDER_PROC, op_lefttype, op_righttype, opfamily);
 
 			inputcollation = lfirst_oid(collids_cell);
 			collids_cell = lnext(collids_cell);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 43238dd..6bd93b0 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -945,6 +945,10 @@ RelationBuildPartitionKey(Relation relation)
 									 opclassform->opcintype,
 									 opclassform->opcintype,
 									 BTORDER_PROC);
+		if (!OidIsValid(funcid))	/* should not happen */
+			elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+				 BTORDER_PROC, opclassform->opcintype, opclassform->opcintype,
+				 opclassform->opcfamily);
 
 		fmgr_info(funcid, &key->partsupfunc[i]);
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 7ec31eb..bc2f164 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -440,6 +440,10 @@ lookup_type_cache(Oid type_id, int flags)
 									   typentry->btree_opintype,
 									   typentry->btree_opintype,
 									   BTORDER_PROC);
+		if (!OidIsValid(cmp_proc))	/* should not happen */
+			elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+				 BTORDER_PROC, typentry->btree_opintype,
+				 typentry->btree_opintype, typentry->btree_opf);
 
 		/* As above, make sure array_cmp or record_cmp will succeed */
 		if (cmp_proc == F_BTARRAYCMP &&

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testlib.pm vs msys
I wrote:
> Andrew Dunstan writes:
>> The problem is command_like's use of redirection to strings. Why this
>> should be a problem for this particular use is a matter of speculation.

> I looked at jacana's two recent pg_ctlCheck failures, and they both
> seem to have failed on this:
>
>     command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
>         '-l', "$TestLib::log_path/001_start_stop_server.log" ],
>         qr/done.*server started/s, 'pg_ctl start');
>
> That is redirecting the postmaster's stdout/stderr into a file,
> for sure, so the child processes shouldn't impact EOF detection AFAICS.
> It's also hard to explain this way why it only fails some of the time.

I reflected a bit more on this issue, and realized that there's a pretty
obvious mechanism for this to happen. While on Unix, we have pg_ctl
fork-and-exec the postmaster so that no extra processes are laying about,
that's not the case on Windows. The Windows code in pg_ctl.c creates a
subprocess that runs CMD.EXE, which in turn runs the postmaster as a
subprocess. The CMD.EXE process doesn't disappear until the postmaster
exits.

Now, we tell CMD to redirect the postmaster's stdout and stderr into
files, but there's no way to tell CMD to redirect its own handles. So if
the CMD process inherits pg_ctl's stdout and stderr, then the prove
process would not see EOF on those pipes after pg_ctl exits.

Now, this theory still has a Mack-truck-sized hole in it, which is if
that's the failure mechanism then why isn't it 100% effective? Seems like
the TAP test should fail every time, if this is a full description. But
maybe it's part of the answer, so it seems worth experimenting in this
direction.

A bit of googling Microsoft's documentation suggests that maybe all we
have to do is pass CreateProcessAsUser's bInheritHandles parameter as
FALSE not TRUE in pg_ctl.c. It's not apparent to me that there are any
handles we do need the CMD process to inherit.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
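[For concreteness, a sketch of the one-argument experiment Tom is
proposing. Everything except the bInheritHandles argument is a
placeholder, not the real pg_ctl.c call site:]

    /* Hypothetical sketch: keep the CMD.EXE wrapper from inheriting
     * pg_ctl's stdout/stderr, so prove's pipes see EOF as soon as
     * pg_ctl itself exits.  Token, command line, and error handling
     * here are placeholders. */
    STARTUPINFO si;
    PROCESS_INFORMATION pi;

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);

    if (!CreateProcessAsUser(restrictedToken,
                             NULL,       /* application name */
                             cmdLine,    /* CMD.EXE /C "postgres ... > log 2>&1" */
                             NULL,       /* process security attributes */
                             NULL,       /* thread security attributes */
                             FALSE,      /* bInheritHandles: was TRUE */
                             0,          /* creation flags */
                             NULL,       /* environment */
                             NULL,       /* current directory */
                             &si, &pi))
        fprintf(stderr, "could not start server process\n");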
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Jul 24, 2017 at 9:21 PM, Jeff Janes wrote:
> On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila wrote:
>> On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila wrote:
>>> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes wrote:
>>>> Setting parallel_workers to 8 changes the threshold for the parallel
>>>> to even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076.
>>>> So it is going in the correct direction, but not by enough to matter.
>>>
>>> You might want to play with cpu_tuple_cost and or seq_page_cost.
>>
>> I don't know whether the patch will completely solve your problem, but
>> this seems to be the right thing to do. Do you think we should stick
>> this for next CF?
>
> It doesn't solve the problem for me, but I agree it is an improvement we
> should commit.

Okay, added the patch for next CF.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More race conditions in logical replication
Alvaro Herrera wrote:
> I'm back at looking into this again, after a rather exhausting week.  I
> think I have a better grasp of what is going on in this code now,

Here's an updated patch, which I intend to push tomorrow.

> and it
> appears to me that we should change the locking so that active_pid is
> protected by ReplicationSlotControlLock instead of the slot's spinlock;
> but I haven't analyzed the idea fully yet and I don't have the patch
> done yet either.

This doesn't seem to be a good idea actually.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 33f08678bf20eed3a4cb3d10960bb06543a1b3db Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 12 Jul 2017 18:38:33 -0400
Subject: [PATCH v4] Wait for slot to become free in when dropping it

---
 src/backend/replication/logical/logicalfuncs.c |   2 +-
 src/backend/replication/slot.c                 | 115 ++---
 src/backend/replication/slotfuncs.c            |  32 ---
 src/backend/replication/walsender.c            |   6 +-
 src/include/replication/slot.h                 |  10 ++-
 5 files changed, 110 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 363ca82cb0..a3ba2b1266 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -244,7 +244,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(&ThisTimeLineID);
 
-	ReplicationSlotAcquire(NameStr(*name));
+	ReplicationSlotAcquire(NameStr(*name), true);
 
 	PG_TRY();
 	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20e11..ea9cd1f22b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -157,6 +157,7 @@ ReplicationSlotsShmemInit(void)
 			/* everything else is zeroed by the memset above */
 			SpinLockInit(&slot->mutex);
 			LWLockInitialize(&slot->io_in_progress_lock, LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS);
+			ConditionVariableInit(&slot->active_cv);
 		}
 	}
 }
@@ -313,25 +314,35 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	LWLockRelease(ReplicationSlotControlLock);
 
 	/*
-	 * Now that the slot has been marked as in_use and in_active, it's safe to
+	 * Now that the slot has been marked as in_use and active, it's safe to
 	 * let somebody else try to allocate a slot.
 	 */
 	LWLockRelease(ReplicationSlotAllocationLock);
+
+	/* Let everybody know we've modified this slot */
+	ConditionVariableBroadcast(&slot->active_cv);
 }
 
 /*
  * Find a previously created slot and mark it as used by this backend.
  */
 void
-ReplicationSlotAcquire(const char *name)
+ReplicationSlotAcquire(const char *name, bool nowait)
 {
-	ReplicationSlot *slot = NULL;
+	ReplicationSlot *slot;
+	int			active_pid;
 	int			i;
-	int			active_pid = 0; /* Keep compiler quiet */
 
+retry:
 	Assert(MyReplicationSlot == NULL);
 
-	/* Search for the named slot and mark it active if we find it. */
+	/*
+	 * Search for the named slot and mark it active if we find it.  If the
+	 * slot is already active, we exit the loop with active_pid set to the PID
+	 * of the backend that owns it.
+	 */
+	active_pid = 0;
+	slot = NULL;
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -339,35 +350,59 @@ ReplicationSlotAcquire(const char *name)
 
 		if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
 		{
+			/* Found the slot we want -- can we activate it? */
 			SpinLockAcquire(&s->mutex);
+			active_pid = s->active_pid;
 			if (active_pid == 0)
 				active_pid = s->active_pid = MyProcPid;
+
 			SpinLockRelease(&s->mutex);
 			slot = s;
+			break;
 		}
 	}
 	LWLockRelease(ReplicationSlotControlLock);
 
-	/* If we did not find the slot or it was already active, error out. */
 	if (slot == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("replication slot \"%s\" does not exist", name)));
+
+	/*
+	 * If we found the slot but it's already active in another backend, we
+	 * either error out or retry after a short
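[The hunk above is cut off in the archive. The retry path it introduces
follows PostgreSQL's standard condition-variable wait pattern; roughly,
and not necessarily in the exact form Alvaro committed, it looks like
this sketch (the wait-event constant is an assumption on my part):]

    /* Rough sketch of the nowait/retry behavior the patch adds;
     * not the exact committed code. */
    if (active_pid != MyProcPid)
    {
        if (nowait)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_IN_USE),
                     errmsg("replication slot \"%s\" is active for PID %d",
                            name, active_pid)));

        /* Sleep until the current holder signals active_cv, then retry */
        ConditionVariablePrepareToSleep(&slot->active_cv);
        ConditionVariableSleep(&slot->active_cv,
                               WAIT_EVENT_REPLICATION_SLOT_DROP);
        ConditionVariableCancelSleep();
        goto retry;
    }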
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Tomas Vondra writes:
> It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly
> reltuples means. VACUUM seems to be thinking that
>     reltuples = live + dead
> while ANALYZE apparently believes that
>     reltuples = live

> The question is - which of the reltuples definitions is the right one?
> I've always assumed that "reltuples = live + dead" but perhaps not?

I think the planner basically assumes that reltuples is the live tuple
count, so maybe we'd better change VACUUM to get in step.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Hi,

It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly
reltuples means. VACUUM seems to be thinking that

    reltuples = live + dead

while ANALYZE apparently believes that

    reltuples = live

This causes somewhat bizarre changes in the value, depending on which of
those commands was executed last.

To demonstrate the issue, let's create a simple table with 1M rows, delete
10% of the rows, and then do a bunch of VACUUM / ANALYZE and check
reltuples, n_live_tup and n_dead_tup in the catalogs. I've disabled
autovacuum so that it won't interfere with this, and there's another
transaction blocking VACUUM from actually cleaning any dead tuples.

test=# create table t as select i from generate_series(1,1000000) s(i);
test=# select reltuples, n_live_tup, n_dead_tup
         from pg_stat_user_tables join pg_class using (relname)
        where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |    1000000 |          0
(1 row)

So, that's nice. Now let's delete 10% of rows, and run VACUUM and ANALYZE
a few times.

test=# delete from t where random() < 0.1;
test=# vacuum t;
test=# select reltuples, n_live_tup, n_dead_tup
         from pg_stat_user_tables join pg_class using (relname)
        where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |     900413 |      99587

test=# analyze t;
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    900413 |     900413 |      99587

test=# vacuum t;
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |     900413 |      99587

So, analyze and vacuum disagree.

To further confuse the poor DBA, VACUUM always simply ignores the old
values while ANALYZE combines the old and new values on large tables (and
converges to the "correct" value after a few steps). This table is small
(less than 30k pages), so ANALYZE does not do that.

This is quite annoying, because people tend to look at reltuples while
investigating bloat (e.g. because the check_postgres query mentioned on
our wiki [1] uses reltuples in the formula).

[1] https://wiki.postgresql.org/wiki/Show_database_bloat

And when the cleanup is blocked for some reason (as in the example above),
VACUUM tends to be running much more often (because it can't clean up
anything). So reltuples tends to be set to the higher value, which I'd
argue is the wrong value for estimating bloat.

I haven't looked at the code yet, but I've confirmed this happens both on
9.6 and 10. I haven't checked older versions, but I guess those are
affected too.

The question is - which of the reltuples definitions is the right one?
I've always assumed that "reltuples = live + dead" but perhaps not?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Syncing sql extension versions with shared library versions
Mat Arye writes:
> On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane wrote:
>> I'm not really sure why planner hooks would have anything to do with your
>> exposed SQL API?

> Sorry what I meant was i'd like to package different versions of my
> extension -- both .sql and .c -- and have the extension act consistently
> for any version until I do a ALTER EXTENSION UPDATE.
> So, I'd prefer a DB with an older extension to have the logic/code in the
> hook not change even if I install a newer version's .so for use in another
> database (but don't update the extension to the newer version). Does that
> make any sense?

The newer version's .so simply is not going to load into the older
version; we intentionally prevent that from happening. It's not necessary
anyway because versions do not share library directories. Therefore, you
can have foo.so for 9.5 in your 9.5 version's library directory, and
foo.so for 9.6 in your 9.6 version's library directory, and the filesystem
will keep them straight for you. It is not necessary to call them
foo-9.5.so and foo-9.6.so.

As for the other point, the usual idea is that if you have a
SQL-accessible C function xyz() that needs to behave differently after an
extension version update, then you make the extension update script point
the SQL function to a different library entry point. If your 1.0
extension version originally had

    CREATE FUNCTION xyz(...) RETURNS ...
      LANGUAGE C AS 'MODULE_PATHNAME', 'xyz';

(note that the second part of the AS clause might have been implicit; no
matter), then your update script for version 1.1 could do

    CREATE OR REPLACE FUNCTION xyz(...) RETURNS ...
      LANGUAGE C AS 'MODULE_PATHNAME', 'xyz_1_1';

Then in the 1.1 version of the C code, the xyz_1_1() C function provides
the new behavior, while the xyz() C function provides the old behavior, or
maybe just throws an error if you conclude it's impractical to emulate the
old behavior exactly. As I mentioned earlier, you can usually set things
up so that you can share much of the code between two such functions.

The pgstattuple C function in contrib/pgstattuple is one example of having
changed a C function's behavior in this way over multiple versions.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
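[To make the pattern Tom describes concrete, a bare-bones sketch using
his hypothetical xyz names; the flag-passing arrangement is one common
way to share code between the two entry points, not the only one:]

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(xyz);
    PG_FUNCTION_INFO_V1(xyz_1_1);

    /* shared implementation; behavior toggled by the flag */
    static Datum
    xyz_internal(FunctionCallInfo fcinfo, bool v1_1_behavior)
    {
        /* ... common work here ... */
        PG_RETURN_NULL();
    }

    /* old entry point: preserves 1.0 behavior for not-yet-updated DBs */
    Datum
    xyz(PG_FUNCTION_ARGS)
    {
        return xyz_internal(fcinfo, false);
    }

    /* new entry point, wired up by the 1.0--1.1 update script */
    Datum
    xyz_1_1(PG_FUNCTION_ARGS)
    {
        return xyz_internal(fcinfo, true);
    }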
Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
On Mon, Jul 24, 2017 at 6:21 AM, Amit Langote wrote: > Yes, we need that there too. > > Done that in the attached v3 (including the test where > ExecPartitionCheck() would have crashed without the patch). Committed. Thanks to all of you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Fwd: [HACKERS] Syncing sql extension versions with shared library versions
(adding -hackers back into thread, got dropped by my email client, sorry) On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane wrote: > Mat Arye writes: > > I tried looking in the contrib modules and didn't find many with lots of > > planner hook usage. > > I'm not really sure why planner hooks would have anything to do with your > exposed SQL API? > Sorry what I meant was i'd like to package different versions of my extension -- both .sql and .c -- and have the extension act consistently for any version until I do a ALTER EXTENSION UPDATE. So, I'd prefer a DB with an older extension to have the logic/code in the hook not change even if I install a newer version's .so for use in another database (but don't update the extension to the newer version). Does that make any sense? > > You will need to have separate builds of your extension for each PG > release branch you work with; we force that through PG_MODULE_MAGIC > whether you like it or not. But that doesn't translate to needing > different names for the library .so files. Generally people either > mantain separate source code per-branch (just as the core code does) > or put in a lot of #ifs testing CATALOG_VERSION_NO to see which > generation of PG they're compiling against. > Yeah we plan to use different branches for different PG versions. > > regards, tom lane >
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On 2017-07-24 23:31, Mark Rofail wrote:
> On Mon, Jul 24, 2017 at 11:25 PM, Erik Rijkers wrote:
>> This patch doesn't apply to HEAD at the moment ( e2c8100e6072936 ).
>
> My bad, I should have mentioned that the patch is dependent on the
> original patch. Here is a *unified* patch that I just tested.

Thanks. Apply is now good, but I get this error when compiling:

ELEMENT' not present in UNRESERVED_KEYWORD section of gram.y
make[4]: *** [gram.c] Error 1
make[3]: *** [parser/gram.h] Error 2
make[2]: *** [../../src/include/parser/gram.h] Error 2
make[1]: *** [all-common-recurse] Error 2
make: *** [all-src-recurse] Error 2

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
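[That particular build error comes from the grammar's keyword
cross-check: a new keyword must be registered consistently in two
places. A sketch of the registration, assuming ELEMENT is intended to
be an unreserved keyword:]

    /* Sketch only, assuming ELEMENT should be unreserved.
     *
     * In src/backend/parser/gram.y, the token must be declared and also
     * listed in the matching keyword-category production:
     *
     *     %token <keyword> ... ELEMENT ...
     *
     *     unreserved_keyword:
     *         ... | ELEMENT | ...
     *
     * And in src/include/parser/kwlist.h (kept in alphabetical order):
     */
    PG_KEYWORD("element", ELEMENT, UNRESERVED_KEYWORD)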
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On 2017-07-24 23:08, Mark Rofail wrote:
> Here is the new Patch with the bug fixes and the New Patch with the
> Index in place performance results.
>
> I just want to point this out because I still can't believe the numbers.
> In reference to the old patch: The new patch without the index suffers a
> 41.68% slow down, while the new patch with the index has a 95.18% speed
> up!
>
> [elemOperatorV4.patch]

This patch doesn't apply to HEAD at the moment ( e2c8100e6072936 ).

Can you have a look?

thanks,

Erik Rijkers

patching file doc/src/sgml/ref/create_table.sgml
Hunk #1 succeeded at 816 with fuzz 3.
patching file src/backend/access/gin/ginarrayproc.c
patching file src/backend/utils/adt/arrayfuncs.c
patching file src/backend/utils/adt/ri_triggers.c
Hunk #1 FAILED at 2650.
Hunk #2 FAILED at 2694.
2 out of 2 hunks FAILED -- saving rejects to file src/backend/utils/adt/ri_triggers.c.rej
patching file src/include/catalog/pg_amop.h
patching file src/include/catalog/pg_operator.h
patching file src/include/catalog/pg_proc.h
patching file src/test/regress/expected/arrays.out
patching file src/test/regress/expected/opr_sanity.out
patching file src/test/regress/sql/arrays.sql

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
It certainly is, thank you for the heads up. I included a note to encourage the user to index the referencing column instead. On Sun, Jul 23, 2017 at 4:41 AM, Robert Haas wrote: > > This is a jumbo king-sized can of worms, and even a very experienced > contributor would likely find it extremely difficult to sort all of > the problems that would result from a change in this area. Best Regards, Mark Rofail
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
> However, there is a bug that prevented me from testing the third
> scenario. I assume there's an incompatible-types problem, since the
> right operand type is anyelement and the supporting procedures expect
> anyarray. I am working on debugging it right now.

I have also solved the bug that prevented me from performance testing the
New Patch with the Index in place. Here is a summary of the results:

A - Original Patch
    DELETE Average Execution time = 3.508 ms
    UPDATE Average Execution time = 3.239 ms

B - New Patch
    DELETE Average Execution time = 4.970 ms
    UPDATE Average Execution time = 4.170 ms

C - With Index
    DELETE Average Execution time = 0.169 ms
    UPDATE Average Execution time = 0.147 ms
Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
On Fri, Jul 21, 2017 at 8:05 AM, Alexey Chernyshov wrote: > the following patch transfers functionality from gevel module > (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for > analyzing GIN and GiST indexes to pageinspect. Gevel was originally > designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and > GIN indexes. It's not clear from the web site in question that the relevant code is released under the PostgreSQL license. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mishandling of WCO constraints in direct foreign table modification
On Fri, Jul 21, 2017 at 6:21 AM, Etsuro Fujita wrote:
> I mean constraints derived from WITH CHECK OPTIONs specified for parent
> views. We use the words "WITH CHECK OPTION constraints" in comments in
> nodeModifyTable.c, so the expression "CHECK OPTION constraints" doesn't
> sound that bad to me. (I used "CHECK OPTION", not "WITH CHECK OPTION",
> because we use "CHECK OPTION" a lot more in the documentation than "WITH
> CHECK OPTION".)

Yeah, it seems OK to me, too; if the consensus is otherwise, we also have
the option to change it later. Committed and back-patched as you had it,
but I removed a spurious comma.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On 7/24/17 3:28 PM, David Steele wrote: Yes, and this is another behavioral change to consider -- one that is more likely to impact users than the change to pg_stop_backup(). If pg_basebackup is not currently waiting for WAL on a standby (but does on a primary) then that's pretty serious. My bad, before PG10 pg_basebackup did not check that WAL was archived on a primary *or* a standby unless the -x option was specified, as documented. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with circular references in VIEW
Le 24/07/2017 à 21:18, Tom Lane a écrit : > Gilles Darold writes: >> Le 24/07/2017 à 19:19, Tom Lane a écrit : >>> ... I'm inclined to think in terms of fixing it at that level >>> rather than in pg_dump. It doesn't look like it would be hard to fix: >>> both functions ultimately call get_query_def(), it's just that one passes >>> down a tuple descriptor for the view while the other currently doesn't. >> I was thinking that this was intentional that pg_get_ruledef() returns >> the raw code typed by the user. I will fix it and send a patch following >> your explanation. > Oh, I just committed a patch. That's fine, I'm sure it is better than the one I could produce :-) Thanks for fixing this issue. -- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On Mon, Jul 24, 2017 at 6:45 PM, Stephen Frost wrote:
> What the change would do is make the pg_stop_backup() caller block until
> the last WAL is archived, and perhaps that ends up taking hours, and
> then the connection is dropped for whatever reason and the backup fails
> where it otherwise, what, wouldn't have been valid anyway at that
> point, since it's not valid until the last WAL is actually archived.
> Perhaps eventually it would be archived and the caller was planning for
> that and everything is fine, but, well, that feels like an awful lot of
> wishful thinking.

Letting users unknowingly take inconsistent backups is worse than
potentially breaking scripts that were actually not working as Postgres
would expect. So I am +1 for back-patching a lighter version of the
proposed patch that makes the wait happen on purpose.

>> > I'd hate to have to do it, but we could technically add a GUC to address
>> > this in the back-branches, no? I'm not sure that's really worthwhile
>> > though..
>>
>> That would be mighty ugly.
>
> Oh, absolutely agreed.

Yes, let's avoid that. We are talking about a switch aimed at making
backups potentially inconsistent.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
Thomas Munro writes: > Ahh, I think I see it. This is an EXEC_BACKEND build farm animal. > Theory: After the backend we see had removed the scratch entry and > before it had restored it, another backend started up and ran > InitPredicateLocks(), which inserted a new scratch entry without > interlocking. Ouch. Yes, I think you're probably right. It needs to skip that if IsUnderPostmaster. Seems like there ought to be an Assert(!found) there, too. And I don't think I entirely like the fact that there's no assertions about the found/not found cases below, either. Will fix, unless you're already on it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
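[For reference, the shape of the fix being described, as a rough sketch
rather than the exact commit:]

    /* Rough sketch of the InitPredicateLocks() fix Tom describes: only
     * the postmaster (or a standalone backend) should create the scratch
     * entry; under EXEC_BACKEND, newly started backends re-run this
     * function and must not insert a second one.  Not the exact
     * committed code. */
    if (!IsUnderPostmaster)
    {
        (void) hash_search(PredicateLockTargetHash, &ScratchTargetTag,
                           HASH_ENTER, &found);
        Assert(!found);
    }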
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On 7/24/17 12:28 PM, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> While this patch brings pg_stop_backup() in line with the documentation,
>> it also introduces a behavioral change compared to 9.6. Currently, the
>> default 9.6 behavior on a standby is to return immediately from
>> pg_stop_backup(), but this patch will cause it to block by default
>> instead. Since action on the master may be required to unblock the
>> process, I see this as a pretty significant change. I still think we
>> should fix and backpatch, but I wanted to point this out.
>
> This will need to be highlighted in the release notes for 9.6.4 also,
> assuming there is agreement to back-patch this, and we'll need to update
> the documentation in 9.6 also to be clearer about what happens on a
> standby.

Agreed.

>> "If the WAL segments required for this backup have not been archived then
>> it might be necessary to force a segment switch on the primary."
>
> I'm a bit on the fence regarding the distinction here for the
> backup-from-standby and this errhint(). The issue is that the WAL for
> the backup hasn't been archived yet and that may be because of either:
>
> archive_command is failing
>
> OR
>
> No WAL is getting generated
>
> In either case, both of these apply to the primary and the standby. As
> such, I'm inclined to try to mention both in the original errhint()
> instead of making two different ones. I'm open to suggestions on this,
> of course, but my thinking would be more like:
>
> - Either archive_command is failing or not enough WAL has been generated
> to require a segment switch. Run pg_switch_wal() to request a WAL switch
> and monitor your logs to check that your archive_command is executing
> properly. -

Yes, that seems more concise. I like the idea of not having to maintain
two separate hints.

> And then change pg_switch_wal()'s errhint for when it's run on a replica
> to be:
>
> HINT: WAL control functions cannot be executed during recovery; they
> should be executed on the primary instead.

Looks good to me. This explanation is useful in general.

>> 2) At backup.sgml L1015 it says that pg_stop_backup() will automatically
>> switch the WAL segment. There should be a caveat here for standby
>> backups, like:
>>
>> When called on a master it terminates the backup mode and performs an
>> automatic switch to the next WAL segment. The reason for the switch is
>> to arrange for the last WAL segment written during the backup interval
>> to be ready to archive. When called on a standby the WAL segment switch
>> must be performed manually on the master if it does not happen due to
>> normal write activity.
>
> s/master/primary/g

Yes.

>> 3) The fact that this fix requires "archive_mode = always" seems like a
>> pretty big caveat, though I don't have any ideas on how to make it
>> better without big code changes. Maybe it would help to change:
>>
>> + the backup is taken on a standby, pg_stop_backup waits
>> + for WAL to be archived when archive_mode
>>
>> To:
>>
>> + the backup is taken on a standby, pg_stop_backup waits
>> + for WAL to be archived *only* when archive_mode
>
> I'm thinking of rewording that a bit to say "When archive_mode = always"
> instead, as I think that might be a little clearer.

Sure.

>> Perhaps this should be noted in the pg_basebackup docs as well?
>
> That seems like it's probably a good idea too, as people might wonder
> why pg_basebackup hasn't finished yet if it's waiting for WAL to be
> archived.

Yes, and this is another behavioral change to consider -- one that is
more likely to impact users than the change to pg_stop_backup(). If
pg_basebackup is not currently waiting for WAL on a standby (but does on
a primary) then that's pretty serious.

Any thoughts on this, Magnus?

--
-David
da...@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
On Tue, Jul 25, 2017 at 7:24 AM, Tom Lane wrote: > Thomas Munro writes: >> Ahh, I think I see it. This is an EXEC_BACKEND build farm animal. >> Theory: After the backend we see had removed the scratch entry and >> before it had restored it, another backend started up and ran >> InitPredicateLocks(), which inserted a new scratch entry without >> interlocking. > > Ouch. Yes, I think you're probably right. It needs to skip that if > IsUnderPostmaster. Seems like there ought to be an Assert(!found) > there, too. And I don't think I entirely like the fact that there's > no assertions about the found/not found cases below, either. > > Will fix, unless you're already on it? I was going to send a short patch that would test IsUnderPostmaster, but I got lost down a rabbit hole trying to figure out how to make my EXEC_BACKEND builds run on this machine... Please go ahead. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with circular references in VIEW
Gilles Darold writes: > Le 24/07/2017 à 19:19, Tom Lane a écrit : >> ... I'm inclined to think in terms of fixing it at that level >> rather than in pg_dump. It doesn't look like it would be hard to fix: >> both functions ultimately call get_query_def(), it's just that one passes >> down a tuple descriptor for the view while the other currently doesn't. > I was thinking that this was intentional that pg_get_ruledef() returns > the raw code typed by the user. I will fix it and send a patch following > your explanation. Oh, I just committed a patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
On Mon, Jul 24, 2017 at 11:51 AM, Thomas Munro wrote: > On Sun, Jul 23, 2017 at 8:32 AM, Tom Lane wrote: >> Meanwhile, it's still pretty unclear what happened yesterday on >> culicidae. > > That failure is indeed baffling. The only code that inserts > (HASH_ENTER[_NULL]) into PredicateLockTargetHash: > > 1. CreatePredicateLock(). I would be a bug if that ever tried to > insert a { 0, 0, 0, 0 } tag, and in any case it holds > SerializablePredicateLockListLock in LW_SHARED. > > 2. TransferPredicateLocksToNewTarget(), which removes and restores > the scratch entry and also explicitly inserts a transferred entry. It > asserts that it holds SerializablePredicateLockListLock and is called > only by PredicateLockPageSplit() which acquires it in LW_EXCLUSIVE. > > 3. DropAllPredicateLocksFromTable(), which removes and restores the > scratch entry and also explicitly inserts a transferred entry. > Acquires SerializablePredicateLockListLock in LW_EXCLUSIVE. Ahh, I think I see it. This is an EXEC_BACKEND build farm animal. Theory: After the backend we see had removed the scratch entry and before it had restored it, another backend started up and ran InitPredicateLocks(), which inserted a new scratch entry without interlocking. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with circular references in VIEW
Le 24/07/2017 à 19:19, Tom Lane a écrit : > Gilles Darold writes: >> There is an issue with version prior to 10 when dumping views with circular >> references. I know that these views are now exported as views in 10 but they >> are still exported as TABLE + RULE in prior versions. This conduct to the >> following error when columns of sub-queries doesn't have the same aliases >> names: > The core of this issue, I think, is that pg_get_viewdef() knows that it > should make what it prints have output column names that match the view, > whereas pg_get_ruledef() does not, even when it is printing an ON SELECT > rule. This is a little bit surprising --- you'd really expect those > functions to produce identical SELECT statements --- and I think it's > likely to break other tools even if pg_dump has managed to skirt the > issue. So I'm inclined to think in terms of fixing it at that level > rather than in pg_dump. It doesn't look like it would be hard to fix: > both functions ultimately call get_query_def(), it's just that one passes > down a tuple descriptor for the view while the other currently doesn't. I was thinking that this was intentional that pg_get_ruledef() returns the raw code typed by the user. I will fix it and send a patch following your explanation. Thanks. -- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Definitional questions for pg_sequences view
On 7/20/17 16:36, Tom Lane wrote: > What exactly is the point of the new pg_sequences view? It is analogous to pg_tables, pg_matviews, pg_indexes, and other such system views that are sort of half-way between system catalogs and information schema. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Mon, Jul 24, 2017 at 10:50 AM, Joshua D. Drake wrote: > Does this suggest that we don't have a cleanup problem but a fragmentation > problem (or both at least for the index)? Having an index that is almost > twice the uncleaned up size isn't that uncommon. As Tom pointed out up-thread, it's important to distinguish between inherent overhead, and overhead due to garbage that needs to be cleaned-up by vacuum. It's really hard to delineate which is which here, and I'm not going to try to put a number on it. What I will point out is that you can see quite a significant difference between the space utilization of a B-Tree without any dead tuples, just from the order in which tuples are initially inserted. You can get about a 1/3 loss of space by inserting randomly, rather than inserting in sorted order, which is what REINDEX will more or less do for you. That's because random workloads almost entirely get 50:50 page splits, whereas sorted input will always split the rightmost page, and so will always get 90:10 splits. The space in the random case isn't exactly wasted; it's there for the taking, for key values that happen to fit on the page. You effectively require a larger average reserve of free space on pages with the random workload, because the implementation does not and cannot reason that it would be best to concentrate free space in parts of the keyspace where there is most need for it. That having been said, I do think that this workload suffers from index bloat in a way that isn't so easily explained. It does seem to be an issue with VACUUM controlling bloat in the index in particular. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
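[A quick sanity check of that 1/3 figure -- my arithmetic, not from the
thread: leaves produced by 50:50 splits drift between half full and
full, averaging roughly 67-70% utilization, while 90:10 splits on
sorted input leave pages about 90% full; 0.90 / 0.67 is about 1.35,
i.e., the randomly built index needs about a third more space.]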
Re: [HACKERS] segfault in HEAD when too many nested functions call
On 2017-07-24 13:27:58 -0400, Tom Lane wrote: > Andres Freund writes: > >> I seriously, seriously, seriously dislike that. You practically might as > >> well put miscadmin.h into postgres.h. Instead, what do you think of > >> requiring the individual ExecProcNode functions to perform > >> CHECK_FOR_INTERRUPTS? Since they're already responsible for doing that > >> if they have any long-running internal loops, this doesn't seem like a > >> modularity violation. It is a risk for bugs-of-omission, sure, but so > >> are a lot of other things that the per-node code has to do. > > > That'd work. Another alternative would be to move the inline definition > > of ExecProcNode() (and probably a bunch of other related functions) into > > a more internals oriented header. It seems likely that we're going to > > add more inline functions to the executor, and that'd reduce the > > coupling of external and internal users a bit. > > Well, it still ends up that the callers of ExecProcNode need to include > miscadmin.h, whereas if we move it into the per-node functions, then the > per-node files need to include miscadmin.h. I think the latter is better > because those files may need to have other CHECK_FOR_INTERRUPTS calls > anyway. > It's less clear from a modularity standpoint that executor > callers should need miscadmin.h. Well, that's why I'm pondering an executor_internal.h or something - there shouldn't be ExecProcNode() callers that don't also need CFI(), and no executor callers should need ExecProcNode(). executor.h right now really defines infrastructure to *use* the executor (Executor{Start,Run,Finish,End,Rewind}), functions internal to the executor (lots of initialization functions, EPQ, partition logic), some things inbetween (e.g. expression related stuff), and some things that really should be separate ExecOpenIndices etc, execReplication.c functions. But that's not something we can easily clear up just now. > (Or in short, I'm not really okay with *any* header file including > miscadmin.h.) Perhaps that's a sign that we should split it up? It's a weird grab bag atm. utils/interrupt.h or such would e.g. make sense for for the *INTERRUPTS, and *CRIT_SECTION macros, as well as ProcessInterrupts() itself, which imo isn't super well placed in postgres.c either. Including utils/interrupt.h in a header would be much less odious in my opinion than including miscadmin.h. > >> * Can we redefine the ExecCustomScan function pointer as type > >> ExecProcNodeCB, eliminating the unsightly cast in nodeCustom.c? > > > That'd change an "extension API", which is why I skipped it at this > > point of the release cycle. It's not like we didn't have this type of > > cast all over before. Ok, with changing it, but that's where I came > > down. > > Is this patch really not changing anything else that a custom-scan > extension would touch? If not, I'm okay with postponing this bit > of cleanup to v11. Not that I can see - I've build & tested citus which uses custom scans these days with and without patch without trouble. Nor do I see any change in the current patch that'd be troublesome - after all the API of ExecProcNode() stays the same. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Jul 24, 2017 at 2:43 PM, Peter Geoghegan wrote: > On Mon, Jul 24, 2017 at 10:13 AM, Claudio Freire > wrote: >> Vacuum *might* be able to redistribute tuples too, while holding locks >> to all 3 pages (the two children and the parent page), since it can >> move the TID to the middle point of two actual child TIDs, mindlessly, >> without checking to see if such TID actually exists - it just needs to >> choose a TID cutoff point that will distribute item pointers evently. >> I haven't tried this, but it is theoretically possible. > > I don't understand what you mean here. As long as the TIDs are a first > class part of the keyspace, what VACUUM does or doesn't do should not > matter. Page deletion stashes a TID in the highkey position of a leaf > page within _bt_mark_page_halfdead(), but that shouldn't matter to > you. > > I guess you're talking about contriving a TID value that is the mean > of two real TID values -- the two that are on each side of the split > point during a leaf page split. While that seems possible, I don't see > much value in it. Unless you have normalized keys, you can only really > truncate whole attributes. And, I think it's a bad idea to truncate > anything other than the new high key for leaf pages, with or without > normalized keys. Changing the keys once they're in internal pages is > never going to be worth it. That's what I'm saying. It might not be worth it. I haven't tested yet. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/23/2017 12:03 PM, Joshua D. Drake wrote:
> As you can see even with aggressive vacuuming, over a period of 36 hours
> life gets increasingly miserable. The largest table is:
>
> postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
>  pg_size_pretty
> ----------------
>  148 GB
> (1 row)
>
> [snip]
>
> With the PK being
>
> postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
>  pg_size_pretty
> ----------------
>  48 GB
> (1 row)
>
> I tried to see how much data we are dealing with here:

-hackers,

I cleaned up the table with VACUUM FULL and ended up with the following:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 118 GB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Does this suggest that we don't have a cleanup problem but a
fragmentation problem (or both, at least for the index)? Having an index
that is almost twice its cleaned-up size isn't that uncommon.

Thanks in advance,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Jul 24, 2017 at 10:13 AM, Claudio Freire wrote: > In most cases, the TID itself can be omitted when the key itself > differs already. > > When a split happens, a TID will be added referring to a real tid from > a child page iff necessary. > > This gives a lot of leeway in choosing the cutoff point, since a > cutoff point is only added when necessary. I agree with all that. That just sounds like a basic implementation of suffix truncation, to support making heap TID a part of the keyspace without paying a big cost in fan-in. > Vacuum *might* be able to redistribute tuples too, while holding locks > to all 3 pages (the two children and the parent page), since it can > move the TID to the middle point of two actual child TIDs, mindlessly, > without checking to see if such TID actually exists - it just needs to > choose a TID cutoff point that will distribute item pointers evently. > I haven't tried this, but it is theoretically possible. I don't understand what you mean here. As long as the TIDs are a first class part of the keyspace, what VACUUM does or doesn't do should not matter. Page deletion stashes a TID in the highkey position of a leaf page within _bt_mark_page_halfdead(), but that shouldn't matter to you. I guess you're talking about contriving a TID value that is the mean of two real TID values -- the two that are on each side of the split point during a leaf page split. While that seems possible, I don't see much value in it. Unless you have normalized keys, you can only really truncate whole attributes. And, I think it's a bad idea to truncate anything other than the new high key for leaf pages, with or without normalized keys. Changing the keys once they're in internal pages is never going to be worth it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in HEAD when too many nested functions call
Andres Freund writes: > On 2017-07-21 20:17:54 -0400, Tom Lane wrote: >>> I dislike having the miscadmin.h include in executor.h - but I don't >>> quite see a better alternative. >> I seriously, seriously, seriously dislike that. You practically might as >> well put miscadmin.h into postgres.h. Instead, what do you think of >> requiring the individual ExecProcNode functions to perform >> CHECK_FOR_INTERRUPTS? Since they're already responsible for doing that >> if they have any long-running internal loops, this doesn't seem like a >> modularity violation. It is a risk for bugs-of-omission, sure, but so >> are a lot of other things that the per-node code has to do. > That'd work. Another alternative would be to move the inline definition > of ExecProcNode() (and probably a bunch of other related functions) into > a more internals oriented header. It seems likely that we're going to > add more inline functions to the executor, and that'd reduce the > coupling of external and internal users a bit. Well, it still ends up that the callers of ExecProcNode need to include miscadmin.h, whereas if we move it into the per-node functions, then the per-node files need to include miscadmin.h. I think the latter is better because those files may need to have other CHECK_FOR_INTERRUPTS calls anyway. It's less clear from a modularity standpoint that executor callers should need miscadmin.h. (Or in short, I'm not really okay with *any* header file including miscadmin.h.) >> * I think the comments need more work. Am willing to make a pass over >> that if you want. > That'd be good, but let's wait till we have something more final. Agreed, I'll wait till you produce another version. >> * Can we redefine the ExecCustomScan function pointer as type >> ExecProcNodeCB, eliminating the unsightly cast in nodeCustom.c? > That'd change an "extension API", which is why I skipped it at this > point of the release cycle. It's not like we didn't have this type of > cast all over before. Ok, with changing it, but that's where I came > down. Is this patch really not changing anything else that a custom-scan extension would touch? If not, I'm okay with postponing this bit of cleanup to v11. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
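[A sketch of the per-node alternative discussed above; illustrative
only, the real node functions do more than this. The PlanState*
signature is the one Andres's patch introduces:]

    /* Illustrative sketch: each ExecProcNode implementation starts with
     * its own interrupt check, so executor.h callers never need
     * miscadmin.h.  Simplified from nodeSeqscan.c. */
    #include "miscadmin.h"

    static TupleTableSlot *
    ExecSeqScan(PlanState *pstate)
    {
        SeqScanState *node = castNode(SeqScanState, pstate);

        CHECK_FOR_INTERRUPTS();

        return ExecScan(&node->ss,
                        (ExecScanAccessMtd) SeqNext,
                        (ExecScanRecheckMtd) SeqRecheck);
    }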
Re: [HACKERS] Increase Vacuum ring buffer.
On Mon, Jul 24, 2017 at 2:20 PM, Claudio Freire wrote: > On Mon, Jul 24, 2017 at 2:10 PM, Sokolov Yura > wrote: >> On 2017-07-24 19:11, Claudio Freire wrote: >>> I was mostly thinking about something like the attached patch. >>> >>> Simple, unintrusive, and shouldn't cause any noticeable slowdown. >> >> >> Your change is small, clear, and currently useful for huge tables under >> high update load (until "allowing vacuum to use more than 1GB memory" >> is merged). > > In high-bloat conditions, it doesn't take long to accumulate 1GB of > dead tuples (which is about 178M tuples, btw). > > The index scan takes way longer than the heap scan in that case. > >> But it still delays updating fsm until whole first batch of dead tuples >> cleared (ie all indices scanned, and all heap pages cleared), and on such >> huge table it will be hours. > > So, true, it will get delayed considerably. But as you realized, > there's not much point in trying to vacuum the FSM sooner, since it > won't be accurate shortly afterwards anyway. Dead line pointers do use > up a fair bit of space, especially on narrow tables. > > In a particular table I have that exhibits this problem, most of the > time is spent scanning the index. It performs dozens of index scans > before it's done, so it would vacuum the FSM quite often enough, even > if I were to increase the mwm setting n-fold. I hate to reply to myself, but I wanted to add: in any case, the case I'm trying to avoid is the case where the FSM *never* gets vacuumed. That's bad. But it may not be the phenomenon you're experiencing in your tests. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Increase Vacuum ring buffer.
On Mon, Jul 24, 2017 at 2:10 PM, Sokolov Yura wrote: > On 2017-07-24 19:11, Claudio Freire wrote: >> >> On Mon, Jul 24, 2017 at 6:37 AM, Sokolov Yura >> wrote: >>> >>> Good day, Claudio >>> >>> >>> On 2017-07-22 00:27, Claudio Freire wrote: On Fri, Jul 21, 2017 at 2:41 PM, Sokolov Yura wrote: > > > > My friend noticed, that I didn't said why I bother with autovacuum. > Our customers suffers from table bloating. I've made synthetic > bloating test, and started experiments with modifying micro- and > auto-vacuum. My first attempts were to update FSM early (both in > micro and autovacuum) and update it upto root, not only low level. This FSM thing is probably not a bad idea as well. We're forced to run regular manual vacuums because for some tables autovacuums seems to never be enough, no matter how it's configured, mostly because it gets canceled all the time. These are high-churn, huge tables, so vacuuming them takes hours or days, there's always someone with a conflicting lock at some point that ends up canceling the autovacuum task. The above paragraph triggered me to go check, and it seems in those cases the FSM never gets vacuumed. That's probably not a good thing, but I don't see how to vacuum the FSM after a cancel. So vacuuming the FSM from time to time during long-running vacuums seems like a good idea at this point. >>> >>> >>> >>> Attached patch changes fsm update: instead of updating only lowest >>> level, it propagates space increase up to root. >>> >>> It slows autovacuum a bit, so that I didn't propose it together with >>> ring buffer increase. >> >> >> I was mostly thinking about something like the attached patch. >> >> Simple, unintrusive, and shouldn't cause any noticeable slowdown. > > > Your change is small, clear, and currently useful for huge tables under > high update load (until "allowing vacuum to use more than 1GB memory" > is merged). In high-bloat conditions, it doesn't take long to accumulate 1GB of dead tuples (which is about 178M tuples, btw). The index scan takes way longer than the heap scan in that case. > But it still delays updating fsm until whole first batch of dead tuples > cleared (ie all indices scanned, and all heap pages cleared), and on such > huge table it will be hours. So, true, it will get delayed considerably. But as you realized, there's not much point in trying to vacuum the FSM sooner, since it won't be accurate shortly afterwards anyway. Dead line pointers do use up a fair bit of space, especially on narrow tables. In a particular table I have that exhibits this problem, most of the time is spent scanning the index. It performs dozens of index scans before it's done, so it would vacuum the FSM quite often enough, even if I were to increase the mwm setting n-fold. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
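[A rough sketch of the kind of change being discussed, against the
periodic cleanup round in lazy_scan_heap(); the placement and the
simplified trigger condition are mine, not necessarily the attached
patch's:]

    /* Rough sketch: vacuum the FSM after each index-cleaning round, so a
     * long-running vacuum publishes freed space even if it is later
     * canceled.  Trigger condition simplified for illustration. */
    if (vacrelstats->num_dead_tuples >= maxtuples)
    {
        /* Remove index entries, then dead tuples from the heap */
        for (i = 0; i < nindexes; i++)
            lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
        lazy_vacuum_heap(onerel, vacrelstats);
        vacrelstats->num_dead_tuples = 0;

        /* Make the space we just freed visible in the upper FSM levels */
        FreeSpaceMapVacuum(onerel);
    }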
Re: [HACKERS] Issue with circular references in VIEW
Gilles Darold writes:
> There is an issue with versions prior to 10 when dumping views with
> circular references. I know that these views are now exported as views
> in 10, but they are still exported as TABLE + RULE in prior versions.
> This leads to the following error when columns of sub-queries don't
> have the same alias names:

The core of this issue, I think, is that pg_get_viewdef() knows that it
should make what it prints have output column names that match the view,
whereas pg_get_ruledef() does not, even when it is printing an ON SELECT
rule. This is a little bit surprising --- you'd really expect those
functions to produce identical SELECT statements --- and I think it's
likely to break other tools even if pg_dump has managed to skirt the
issue. So I'm inclined to think in terms of fixing it at that level
rather than in pg_dump. It doesn't look like it would be hard to fix:
both functions ultimately call get_query_def(), it's just that one passes
down a tuple descriptor for the view while the other currently doesn't.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Increase Vacuum ring buffer.
On Thu, Jul 20, 2017 at 12:51 PM, Tom Lane wrote: > Robert Haas writes: > > I think that's a valid point. There are also other concerns here - > > e.g. whether instead of adopting the patch as proposed we ought to (a) > > use some smaller size, or (b) keep the size as-is but reduce the > > maximum fraction of shared_buffers that can be consumed, or (c) divide > > the ring buffer size through by autovacuum_max_workers. Personally, > > of those approaches, I favor (b). I think a 16MB ring buffer is > > probably just fine if you've got 8GB of shared_buffers but I'm > > skeptical about it when you've got 128MB of shared_buffers. > > WFM. I agree with *not* dividing the basic ring buffer size by > autovacuum_max_workers. If you have allocated more AV workers, I think > you expect AV to go faster, not for the workers to start fighting among > themselves. > But fighting among themselves is just what they do regarding the autovacuum_vacuum_cost_limit, so I don't see why it should be one way there but different here. The reason for setting autovacuum_max_workers to N is so that small tables aren't completely starved of vacuuming even if N-1 larger tables are already being vacuumed simultaneously. Now the small tables get vacuumed at speed 1/N, which kind of sucks, but that is the mechanism we currently have. Of course just because we are in a hole with vacuum_cost_limit doesn't mean we should dig ourselves deeper, but we are being inconsistent then. Cheers, Jeff
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Jul 24, 2017 at 2:02 PM, Peter Geoghegan wrote: > On Mon, Jul 24, 2017 at 9:51 AM, Claudio Freire > wrote: >> My point was that the TID doesn't have to point to an actual tuple. >> >> It's more of a keyspace thing, so it doesn't need to match real >> tuples, it can just divide the keyspace with an arbitrary cutoff, and >> should be cheaper to maintain without that requirement. > > I agree, but unless you're using normalized keys, I don't see > that you get much useful leeway from using fake or truncated TID > values. Presumably the comparison logic will be based on comparing an > ItemPointerData field, which is impractical to truncate. In most cases, the TID itself can be omitted when the keys already differ. When a split happens, a TID referring to a real tuple on a child page is added only if necessary. This gives a lot of leeway in choosing the cutoff point, since a cutoff point is only added when necessary. Vacuum *might* be able to redistribute tuples too, while holding locks on all 3 pages (the two children and the parent page), since it can move the TID to the midpoint between two actual child TIDs, mindlessly, without checking whether such a TID actually exists - it just needs to choose a TID cutoff point that will distribute the item pointers evenly. I haven't tried this, but it is theoretically possible. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
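A hypothetical sketch of what "mindlessly" choosing such a cutoff could look like (illustrative only, not from any patch; assumes the usual itemptr.h macros):

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Pick a synthetic separator TID halfway between two real child TIDs.
 * The result need not point at any live tuple; it only has to sort
 * between left and right.
 */
static void
choose_cutoff_tid(ItemPointer left, ItemPointer right, ItemPointer cutoff)
{
	uint64		l = ((uint64) ItemPointerGetBlockNumber(left) << 16) |
	ItemPointerGetOffsetNumber(left);
	uint64		r = ((uint64) ItemPointerGetBlockNumber(right) << 16) |
	ItemPointerGetOffsetNumber(right);
	uint64		mid = l + (r - l) / 2;

	ItemPointerSet(cutoff, (BlockNumber) (mid >> 16),
				   (OffsetNumber) (mid & 0xFFFF));
}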
Re: [HACKERS] Increase Vacuum ring buffer.
On 2017-07-24 19:11, Claudio Freire wrote: On Mon, Jul 24, 2017 at 6:37 AM, Sokolov Yura wrote: Good day, Claudio On 2017-07-22 00:27, Claudio Freire wrote: On Fri, Jul 21, 2017 at 2:41 PM, Sokolov Yura wrote: My friend noticed that I didn't say why I bother with autovacuum. Our customers suffer from table bloat. I've made a synthetic bloat test and started experiments with modifying micro- and auto-vacuum. My first attempts were to update the FSM early (both in micro- and autovacuum) and to update it up to the root, not only the low level. This FSM thing is probably not a bad idea as well. We're forced to run regular manual vacuums because for some tables autovacuum never seems to be enough, no matter how it's configured, mostly because it gets canceled all the time. These are high-churn, huge tables, so vacuuming them takes hours or days, and there's always someone with a conflicting lock at some point that ends up canceling the autovacuum task. The above paragraph triggered me to go check, and it seems in those cases the FSM never gets vacuumed. That's probably not a good thing, but I don't see how to vacuum the FSM after a cancel. So vacuuming the FSM from time to time during long-running vacuums seems like a good idea at this point. The attached patch changes the FSM update: instead of updating only the lowest level, it propagates the space increase up to the root. It slows autovacuum a bit, which is why I didn't propose it together with the ring buffer increase. I was mostly thinking about something like the attached patch. Simple, unintrusive, and shouldn't cause any noticeable slowdown. Your change is small, clear, and immediately useful for huge tables under high update load (until "allowing vacuum to use more than 1GB memory" is merged). But it still delays updating the FSM until the whole first batch of dead tuples is cleared (i.e. all indexes scanned and all heap pages cleaned), and on such a huge table that will take hours. On the other hand, if "dead" tuples consume all usable item pointers (MaxHeapTuplesPerPage, about 290 on an 8k page), then space that actually exists on a page cannot be used until the "dead" tuples are converted into "unused" tuples. With my patch I've seen that updating the FSM before dead tuples are cleared helps a little: this change slows bloat somewhat, but bloat stops growing only after the final cleanup of dead tuples. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
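For reference, the ~290 figure comes straight from the MaxHeapTuplesPerPage definition in access/htup_details.h; a sketch of the arithmetic, assuming 8kB pages and 8-byte alignment:

#include <stdio.h>

int main(void)
{
	int blcksz = 8192;
	int page_header = 24;		/* SizeOfPageHeaderData */
	int tuple_header = 24;		/* MAXALIGN(SizeofHeapTupleHeader) */
	int line_pointer = 4;		/* sizeof(ItemIdData) */

	/* prints 291: once that many line pointers are dead, no new tuple
	 * fits on the page no matter how much free space it reports */
	printf("%d\n", (blcksz - page_header) / (tuple_header + line_pointer));
	return 0;
}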
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Jul 24, 2017 at 9:51 AM, Claudio Freire wrote: > My point was that the TID doesn't have to point to an actual tuple. > > It's more of a keyspace thing, so it doesn't need to match real > tuples, it can just divide the keyspace with an arbitrary cutoff, and > should be cheaper to maintain without that requirement. I agree, but unless you're using normalized keys, I don't see that you get much useful leeway from using fake or truncated TID values. Presumably the comparison logic will be based on comparing an ItemPointerData field, which is impractical to truncate. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Wed, Nov 23, 2016 at 12:27 AM, Peter Geoghegan wrote: > On Mon, Nov 21, 2016 at 5:15 PM, Claudio Freire > wrote: >>> There are a couple >>> of tricky issues with that that you'd have to look out for, like >>> making sure that the high key continues to hold a real TID, which at a >>> glance looks like something that just happens already. I'm not sure >>> that we preserve the item pointer TID in all cases, since the current >>> assumption is that a high key's TID is just filler -- maybe we can >>> lose that at some point. >> >> I thought long about that, and inner pages don't need a real TID in >> their keys, as those keys only specify key space cutoff points and not >> real TIDs, and high-key TIDs in leaf pages are always real. > > Not if there are duplicates in the inner pages. Then, you have to add > in the TID, and separate the key space, to guide insertions to the > right leaf page directly (particularly for non-HOT UPDATEs). That's > what I'm mostly interested in investigating, here. My point was that the TID doesn't have to point to an actual tuple. It's more of a keyspace thing, so it doesn't need to match real tuples, it can just divide the keyspace with an arbitrary cutoff, and should be cheaper to maintain without that requirement. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Jul 24, 2017 at 12:31 PM, Stephen Frost wrote: > > Those backup scripts might very well be, today, producing invalid > > backups though, which would be much less good. > > True. However, I suspect that depends on what procedure is actually > being followed. If *everyone* who is using this is getting corrupt > backups, then of course changing the behavior is a no-brainer. But if > *some* people are getting correct backups and *some* people are > getting incorrect backups, depending on procedure, then I think > changing it is unwise. We should optimize for the case of a user who > is currently doing something smart, not one who is doing something > dumb. I'm not sure that we can call "writing code that depends on what the docs say" dumb, unfortunately, and if even one person is getting an invalid backup while following what the docs say then that's a strong case to do *something*. Of course, we also don't want to break the scripts of people who are doing things correctly, but I'm trying to figure out exactly how this change would break such scripts. What the change would do is make the pg_stop_backup() caller block until the last WAL is archived, and perhaps that ends up taking hours, and then the connection is dropped for whatever reason and the backup fails; but what would it have been otherwise? It wouldn't have been valid anyway at that point, since it's not valid until the last WAL is actually archived. Perhaps eventually it would be archived and the caller was planning for that and everything is fine, but, well, that feels like an awful lot of wishful thinking. And that's assuming that the script author made sure to write additional code that didn't mark the backup as valid until the ending WAL segment from pg_stop_backup() was confirmed to have been archived. Last, but perhaps not least, this issue only goes back to when pg_start/stop_backup was allowed on replicas, which is 9.6, so the behavior has been out for less than a year, and any script that might break due to this would at least be relatively new code. > > I'd hate to have to do it, but we could technically add a GUC to address > > this in the back-branches, no? I'm not sure that's really worthwhile > > though. > > That would be mighty ugly. Oh, absolutely agreed. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On Mon, Jul 24, 2017 at 12:31 PM, Stephen Frost wrote: > Those backup scripts might very well be, today, producing invalid > backups though, which would be much less good. True. However, I suspect that depends on what procedure is actually being followed. If *everyone* who is using this is getting corrupt backups, then of course changing the behavior is a no-brainer. But if *some* people are getting correct backups and *some* people are getting incorrect backups, depending on procedure, then I think changing it is unwise. We should optimize for the case of a user who is currently doing something smart, not one who is doing something dumb. > I'd hate to have to do it, but we could technically add a GUC to address > this in the back-branches, no? I'm not sure that's really worthwhile > though. That would be mighty ugly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Jul 24, 2017 at 11:40 AM, David Steele wrote: > > Before reviewing the patch, I would note that it looks like this issue was > > introduced in b6a323a8c before the 9.6 release. The documentation states > > that the pg_stop_backup() function will wait for all required WAL to be > > archived, which corresponds to the default in the new patch of > > waitforarchive = true. The commit notes that the documentation needs > > updating, but since that didn't happen I think we should consider this a bug > > in 9.6 and back patch. > > > > While this patch brings pg_stop_backup() in line with the documentation, it > > also introduces a behavioral change compared to 9.6. Currently, the default > > 9.6 behavior on a standby is to return immediately from pg_stop_backup(), > > but this patch will cause it to block by default instead. Since action on > > the master may be required to unblock the process, I see this as a pretty > > significant change. I still think we should fix and backpatch, but I wanted > > to point this out. > > Hmm. That seems to me like it could break backup scripts that are > currently working, which seems like a *very* unfriendly thing to do in > a minor release, especially since 9.6 has no option to override the > default behavior (nor can we add one, since it would require a change > to catalog contents). Those backup scripts might very well be, today, producing invalid backups though, which would be much less good. I'd hate to have to do it, but we could technically add a GUC to address this in the back-branches, no? I'm not sure that's really worthwhile though. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
David, * David Steele (da...@pgmasters.net) wrote: > On 7/23/17 11:48 PM, Masahiko Sawada wrote: > >On Sat, Jul 22, 2017 at 8:04 AM, Stephen Frost wrote: > >> > >>I started discussing this with David off-list and he'd like a chance to > >>review it in a bit more detail (he's just returning from being gone for > >>a few weeks). That review will be posted to this thread on Monday, and > >>I'll wait until then to move forward with the patch. > > Before reviewing the patch, I would note that it looks like this > issue was introduced in b6a323a8c before the 9.6 release. The > documentation states that the pg_stop_backup() function will wait > for all required WAL to be archived, which corresponds to the > default in the new patch of waitforarchive = true. The commit notes > that the documentation needs updating, but since that didn't happen > I think we should consider this a bug in 9.6 and back patch. I tend to agree with this. The documentation clearly says that pg_stop_backup() waits for all WAL to be archived, but, today, if it's run on a standby then it doesn't wait, which might lead to invalid backups if the backup software is depending on that. > While this patch brings pg_stop_backup() in line with the > documentation, it also introduces a behavioral change compared to > 9.6. Currently, the default 9.6 behavior on a standby is to return > immediately from pg_stop_backup(), but this patch will cause it to > block by default instead. Since action on the master may be > required to unblock the process, I see this as a pretty significant > change. I still think we should fix and backpatch, but I wanted to > point this out. This will need to be highlighted in the release notes for 9.6.4, assuming there is agreement to back-patch this, and we'll also need to update the 9.6 documentation to be clearer about what happens on a standby. > The patch looks sensible to me. A few comments: > > 1) I would change: > > "Check if the WAL segment needed for this backup have been > completed, in which case a forced segment switch may be needed on > the primary." > > To something like: > > "If the WAL segments required for this backup have not been archived > then it might be necessary to force a segment switch on the > primary." I'm a bit on the fence regarding the distinction drawn here for backup-from-standby in this errhint(). The issue is that the WAL for the backup hasn't been archived yet, and that may be because either archive_command is failing or no WAL is getting generated. Both of these apply to the primary and the standby. As such, I'm inclined to try to mention both in the original errhint() instead of making two different ones. I'm open to suggestions on this, of course, but my thinking would be more like: - Either archive_command is failing or not enough WAL has been generated to require a segment switch. Run pg_switch_wal() to request a WAL switch and monitor your logs to check that your archive_command is executing properly. - And then change pg_switch_wal()'s errhint for when it's run on a replica to be: HINT: WAL control functions cannot be executed during recovery; they should be executed on the primary instead. (or similar, again, open to suggestions here). > 2) At backup.sgml L1015 it says that pg_stop_backup() will > automatically switch the WAL segment. There should be a caveat here > for standby backups, like: > > When called on a master it terminates the backup mode and performs > an automatic switch to the next WAL segment.
The reason for the > switch is to arrange for the last WAL segment written during the > backup interval to be ready to archive. When called on a standby > the WAL segment switch must be performed manually on the master if > it does not happen due to normal write activity. s/master/primary/g Otherwise it looks alright. Might try to reword to use language similar to what we have today in 25.3.3.1. > 3) The fact that this fix requires "archive_mode = always" seems > like a pretty big caveat, though I don't have any ideas on how to > make it better without big code changes. Maybe it would help to > change: > > + the backup is taken on a standby, pg_stop_backup waits > + for WAL to be archived when archive_mode > > To: > > + the backup is taken on a standby, pg_stop_backup waits > + for WAL to be archived *only* when archive_mode I'm thinking of rewording that a bit to say "When archive_mode = always" instead, as I think that might be a little clearer. > Perhaps this should be noted in the pg_basebackup docs as well? That seems like it's probably a good idea too, as people might wonder why pg_basebackup hasn't finished yet if it's waiting for WAL to be archived. Thanks! Stephen signature.asc Description: Digital signature
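If it helps to see it spelled out, the combined hint suggested above would look something like this in do_pg_stop_backup() (a sketch of the proposed wording only, not committed code):

/* sketch: emitted while waiting for the final segment to be archived */
ereport(WARNING,
		(errmsg("still waiting for all required WAL segments to be archived"),
		 errhint("Either archive_command is failing or not enough WAL has been "
				 "generated to require a segment switch.  Run pg_switch_wal() to "
				 "request a WAL switch and monitor your logs to check that your "
				 "archive_command is executing properly.")));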
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On Mon, Jul 24, 2017 at 11:40 AM, David Steele wrote: > Before reviewing the patch, I would note that it looks like this issue was > introduced in b6a323a8c before the 9.6 release. The documentation states > that the pg_stop_backup() function will wait for all required WAL to be > archived, which corresponds to the default in the new patch of > waitforarchive = true. The commit notes that the documentation needs > updating, but since that didn't happen I think we should consider this a bug > in 9.6 and back patch. > > While this patch brings pg_stop_backup() in line with the documentation, it > also introduces a behavioral change compared to 9.6. Currently, the default > 9.6 behavior on a standby is to return immediately from pg_stop_backup(), > but this patch will cause it to block by default instead. Since action on > the master may be required to unblock the process, I see this as a pretty > significant change. I still think we should fix and backpatch, but I wanted > to point this out. Hmm. That seems to me like it could break backup scripts that are currently working, which seems like a *very* unfriendly thing to do in a minor release, especially since 9.6 has no option to override the default behavior (nor can we add one, since it would require a change to catalog contents). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in HEAD when too many nested functions call
Hi, On 2017-07-21 20:17:54 -0400, Tom Lane wrote: > > I dislike having the miscadmin.h include in executor.h - but I don't > > quite see a better alternative. > > I seriously, seriously, seriously dislike that. You practically might as > well put miscadmin.h into postgres.h. Instead, what do you think of > requiring the individual ExecProcNode functions to perform > CHECK_FOR_INTERRUPTS? Since they're already responsible for doing that > if they have any long-running internal loops, this doesn't seem like a > modularity violation. It is a risk for bugs-of-omission, sure, but so > are a lot of other things that the per-node code has to do. That'd work. Another alternative would be to move the inline definition of ExecProcNode() (and probably a bunch of other related functions) into a more internals-oriented header. It seems likely that we're going to add more inline functions to the executor, and that'd reduce the coupling of external and internal users a bit. > * I think the comments need more work. Am willing to make a pass over > that if you want. That'd be good, but let's wait till we have something more final. > * In most places, if there's an immediate return-if-trivial-case test, > we check stack depth only after that. There's no point in checking > and then returning; either you already crashed, or you're at peak > stack so far as this code path is concerned. I went back and forth a bit on that one. The calling code might call other functions that go deeper on the stack, which won't have the checks. I'm fine with moving it; just wanted to explain why I got there. > * Can we redefine the ExecCustomScan function pointer as type > ExecProcNodeCB, eliminating the unsightly cast in nodeCustom.c? That'd change an "extension API", which is why I skipped it at this point of the release cycle. It's not like we didn't have this type of cast all over before. OK with changing it, but that's where I came down. > * The various callers of ExecScan() are pretty inconsistently coded. > I don't care that much whether they use castNode() or just forcibly > cast to ScanState*, but let's make them all look the same. I tried to change the minimum; perfectly fine to move to castNode() in a wholesale manner. Btw, I really want to get rid of ExecScan(), at least as an external function. It does a lot of unnecessary things in a lot of cases and makes branch prediction a lot worse. Not v10 stuff, though. > * I believe the usual term for what these function pointers are is > "methods", not "callbacks". Certainly we'd call them that if we > were working in C++. K. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
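To illustrate the alternative Tom suggests, each node's entry point would take on the interrupt check itself, along these lines (a sketch modeled on nodeSeqscan.c, so executor.h never needs miscadmin.h):

/* each ExecProcNode method does its own interrupt check */
static TupleTableSlot *
ExecSeqScan(PlanState *pstate)
{
	SeqScanState *node = castNode(SeqScanState, pstate);

	CHECK_FOR_INTERRUPTS();

	return ExecScan(&node->ss,
					(ExecScanAccessMtd) SeqNext,
					(ExecScanRecheckMtd) SeqRecheck);
}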
Re: [HACKERS] Increase Vacuum ring buffer.
On Mon, Jul 24, 2017 at 6:37 AM, Sokolov Yura wrote: > Good day, Claudio > > > On 2017-07-22 00:27, Claudio Freire wrote: >> >> On Fri, Jul 21, 2017 at 2:41 PM, Sokolov Yura >> wrote: >>> >>> >>> My friend noticed that I didn't say why I bother with autovacuum. >>> Our customers suffer from table bloat. I've made a synthetic >>> bloat test and started experiments with modifying micro- and >>> auto-vacuum. My first attempts were to update the FSM early (both in >>> micro- and autovacuum) and to update it up to the root, not only the low level. >> >> This FSM thing is probably not a bad idea as well. >> >> We're forced to run regular manual vacuums because for some tables >> autovacuum never seems to be enough, no matter how it's configured, >> mostly because it gets canceled all the time. These are high-churn, >> huge tables, so vacuuming them takes hours or days, and there's always >> someone with a conflicting lock at some point that ends up canceling >> the autovacuum task. >> >> The above paragraph triggered me to go check, and it seems in those >> cases the FSM never gets vacuumed. That's probably not a good thing, >> but I don't see how to vacuum the FSM after a cancel. So vacuuming the >> FSM from time to time during long-running vacuums seems like a good >> idea at this point. > > The attached patch changes the FSM update: instead of updating only the lowest > level, it propagates the space increase up to the root. > > It slows autovacuum a bit, which is why I didn't propose it together with > the ring buffer increase. I was mostly thinking about something like the attached patch. Simple, unintrusive, and shouldn't cause any noticeable slowdown. From 5da264507058175e614f6ce7c77d2bd0491b1416 Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Mon, 24 Jul 2017 13:09:10 -0300 Subject: [PATCH] Vacuum FSM after each index pass This prevents concurrent writes from accumulating bloat due to recently freed space not yet being visible in the FSM. When vacuum runs for hours or days, this can make a huge difference. --- src/backend/commands/vacuumlazy.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index fabb2f8d52..4d8d90e833 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -735,6 +735,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, /* Remove tuples from heap */ lazy_vacuum_heap(onerel, vacrelstats); + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel); + /* * Forget the now-vacuumed tuples, and press on, but be careful * not to reset latestRemovedXid since we want that value to be -- 2.12.0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Change in "policy" on dump ordering?
Stephen Frost writes: > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: >> On 3/6/17 03:33, Michael Banck wrote: >>> Would this be a candidate for backpatching, or is the behaviour change >>> in pg_dump trumping the issues it solves? >> Unless someone literally has a materialized view on pg_policy, it >> wouldn't make a difference, so I'm not very keen on bothering to >> backpatch this. > Agreed. So actually, the problem with Jim's patch is that it doesn't fix the problem. pg_dump's attempts to REFRESH matviews will still fail in common cases, because they still come out before GRANTs, because pg_dump treats ACLs as a completely independent thing to be done last. This was noted as far back as 2015 (in a thread previously linked from this thread), and it's also the cause of Jordan Gigov's current complaint at https://www.postgresql.org/message-id/CA%2BnBocAmQ%2BOPNSKUzaaLa-6eGYVw5KqexWJaRoGvrvLyDir9gg%40mail.gmail.com Digging around in the archives, I find that Kevin had already proposed a fix in https://www.postgresql.org/message-id/flat/20160202161407.2778.24659%40wrigleys.postgresql.org which I didn't particularly care for, and apparently nobody else did either. But we really oughta do *something*. The main problem with Kevin's fix, after thinking about it more, is that it shoves matview refresh commands into the same final processing phase where ACLs are done, which means that in a parallel restore they will not be done in parallel. That seems like a pretty serious objection, although maybe not so serious that we'd be willing to accept a major rewrite in the back branches to avoid it. I'm wondering at this point about having restore create a fake DO_ACLS object (fake in the sense that it isn't in the dump file) that would participate normally in the dependency sort, and which we'd give a priority before matview refreshes but after everything else. "Restore" of that object would perform the same operation we do now of running through the whole TOC and emitting grants/revokes. So it couldn't be parallelized in itself (at least not without an additional batch of work) but it could be treated as an indivisible parallelized task, and then the matview refreshes could be parallelizable tasks after that. There's also Peter's proposal of splitting up GRANTs from REVOKEs and putting only the latter at the end. I'm not quite convinced that that's a good idea but it certainly merits consideration. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
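A tiny sketch of the ordering idea (hypothetical names; the real sort priorities live in pg_dump_sort.c):

/*
 * Hypothetical sketch: a synthetic DO_ACLS entry participates in the
 * dependency sort with a priority after everything else but before
 * matview refreshes, so REFRESH MATERIALIZED VIEW runs with grants
 * already in place and the refreshes stay parallelizable.
 */
enum restore_priority
{
	PRIO_EVERYTHING_ELSE = 1,	/* schemas, tables, data, indexes, ... */
	PRIO_ACL_PASS,				/* fake DO_ACLS object: emit all GRANT/REVOKE */
	PRIO_REFRESH_MATVIEW		/* matview refreshes sort last */
};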
Re: [HACKERS] why not parallel seq scan for slow functions
On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila wrote: > On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila > wrote: > > On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes > wrote: > >> > >> > >> > >> Setting parallel_workers to 8 changes the threshold for a parallel plan to even > >> be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it is > >> going in the correct direction, but not by enough to matter. > >> > > > > You might want to play with cpu_tuple_cost and/or seq_page_cost. > > > > I don't know whether the patch will completely solve your problem, but > this seems to be the right thing to do. Do you think we should stick > this in the next CF? > It doesn't solve the problem for me, but I agree it is an improvement we should commit. Cheers, Jeff
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On 7/23/17 11:48 PM, Masahiko Sawada wrote: On Sat, Jul 22, 2017 at 8:04 AM, Stephen Frost wrote: I started discussing this with David off-list and he'd like a chance to review it in a bit more detail (he's just returning from being gone for a few weeks). That review will be posted to this thread on Monday, and I'll wait until then to move forward with the patch. Before reviewing the patch, I would note that it looks like this issue was introduced in b6a323a8c before the 9.6 release. The documentation states that the pg_stop_backup() function will wait for all required WAL to be archived, which corresponds to the default in the new patch of waitforarchive = true. The commit notes that the documentation needs updating, but since that didn't happen I think we should consider this a bug in 9.6 and back-patch. While this patch brings pg_stop_backup() in line with the documentation, it also introduces a behavioral change compared to 9.6. Currently, the default 9.6 behavior on a standby is to return immediately from pg_stop_backup(), but this patch will cause it to block by default instead. Since action on the master may be required to unblock the process, I see this as a pretty significant change. I still think we should fix and back-patch, but I wanted to point this out. The patch looks sensible to me. A few comments: 1) I would change: "Check if the WAL segment needed for this backup have been completed, in which case a forced segment switch may be needed on the primary." To something like: "If the WAL segments required for this backup have not been archived then it might be necessary to force a segment switch on the primary." 2) At backup.sgml L1015 it says that pg_stop_backup() will automatically switch the WAL segment. There should be a caveat here for standby backups, like: When called on a master it terminates the backup mode and performs an automatic switch to the next WAL segment. The reason for the switch is to arrange for the last WAL segment written during the backup interval to be ready to archive. When called on a standby the WAL segment switch must be performed manually on the master if it does not happen due to normal write activity. 3) The fact that this fix requires "archive_mode = always" seems like a pretty big caveat, though I don't have any ideas on how to make it better without big code changes. Maybe it would help to change: + the backup is taken on a standby, pg_stop_backup waits + for WAL to be archived when archive_mode To: + the backup is taken on a standby, pg_stop_backup waits + for WAL to be archived *only* when archive_mode Perhaps this should be noted in the pg_basebackup docs as well? Regards, -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Syncing sql extension versions with shared library versions
On Sat, Jul 22, 2017 at 10:50 AM, Robert Haas wrote: > On Fri, Jul 21, 2017 at 4:17 PM, Mat Arye wrote: > > (I > > want to avoid having to keep backwards-compatibility for all functions in > > future shared-libraries). > > Are you sure that's a good idea? No :). But a lot of (most of) our code is not user-called functions (planner/other hooks, etc.). It seems dangerous to update that code in the .so and have it possibly affect customers that are using old versions of the extension. While it's possible to do that kind of _v1 suffix code for planner functions as well, this seems like a nightmare in terms of code maintenance (we already have 1000s of lines of C code). I think a dynamic loader might be more work upfront but have major payoffs for speed of development in the long term for us. It may also have advantages in terms of update safety. It's also worth noting that our C code has some SPI upcalls, so keeping the SQL and C code in sync is even more of an issue for us (if we can't make the dynamic/stub loader approach work, this might be an anti-pattern and we may have to review doing upcalls at all). > It seems like swimming upstream > against the design. I mean, instead of creating a dispatcher library > that loads either v1 or v2, maybe you could just put it all in one > library, add a "v1" or "v2" suffix to the actual function names where > appropriate, and then set up the SQL definitions to call the correct > one. I mean, it's the same thing, but with less chance of the dynamic > loader ruining your day. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
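For what it's worth, a minimal sketch of the suffixing approach Robert describes (hypothetical function names), keeping both behaviors alive in one .so so each SQL-level extension version binds to the symbol it expects:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* behavior frozen for extension version 1.x */
PG_FUNCTION_INFO_V1(my_func_v1);
Datum
my_func_v1(PG_FUNCTION_ARGS)
{
	PG_RETURN_INT32(PG_GETARG_INT32(0));
}

/* new behavior for 2.x; the 2.0 upgrade script repoints the SQL
 * function at this symbol via AS 'MODULE_PATHNAME', 'my_func_v2' */
PG_FUNCTION_INFO_V1(my_func_v2);
Datum
my_func_v2(PG_FUNCTION_ARGS)
{
	PG_RETURN_INT32(PG_GETARG_INT32(0) + 1);
}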
[HACKERS] Issue with circular references in VIEW
Hi, There is an issue with versions prior to 10 when dumping views with circular references. I know that these views are now exported as views in 10, but they are still exported as TABLE + RULE in prior versions. This leads to the following error when the columns of sub-queries don't have the same alias names: ERROR: SELECT rule's target entry 1 has different column name from column "col_a" DETAIL: SELECT target entry is named "other_name1". Here are the steps to reproduce: CREATE TABLE t1 (f1 INT PRIMARY KEY, f2 text); CREATE VIEW v_t1 (col_a, col_b) AS WITH win_query AS ( SELECT 1::INTEGER AS col1, 'b'::text AS col2 ) SELECT imp.col1 AS other_name1, imp.col2 AS other_name2 FROM win_query imp UNION SELECT 2::INTEGER AS col1, 'z'::text AS col2 UNION SELECT * FROM t1 GROUP BY f1 ; This is translated into the following code by pg_dump with PostgreSQL 9.x: CREATE TABLE t1 ( f1 integer NOT NULL, f2 text ); CREATE TABLE v_t1 ( col_a integer, col_b text ); COPY t1 (f1, f2) FROM stdin; \. CREATE RULE "_RETURN" AS ON SELECT TO v_t1 DO INSTEAD WITH win_query AS ( SELECT 1 AS col1, 'b'::text AS col2 ) SELECT imp.col1 AS other_name1, imp.col2 AS other_name2 FROM win_query imp UNION SELECT 2 AS col1, 'z'::text AS col2 UNION SELECT t1.f1, t1.f2 FROM t1 GROUP BY t1.f1; and this dump can't be restored because of the error reported above. It is clear that the user is responsible for using the wrong aliases, but this doesn't generate an error at creation time, and looking at the view through a call to pg_get_viewdef(), the aliases are correctly rewritten: test_view=# \d+ v_t1 View "public.v_t1" Column | Type | Modifiers | Storage | Description --------+---------+-----------+----------+------------- col_a | integer | | plain | col_b | text | | extended | View definition: WITH win_query AS ( SELECT 1 AS col1, 'b'::text AS col2 ) SELECT imp.col1 AS col_a, imp.col2 AS col_b FROM win_query imp UNION SELECT 2 AS col_a, 'z'::text AS col_b UNION SELECT t1.f1 AS col_a, t1.f2 AS col_b FROM t1 GROUP BY t1.f1; The rule code retrieved using pg_get_ruledef(), however, still uses the original, incorrect column aliases: CREATE RULE "_RETURN" AS ON SELECT TO v_t1 DO INSTEAD WITH win_query AS ( SELECT 1 AS col1, 'b'::text AS col2 ) SELECT imp.col1 AS other_name1, imp.col2 AS other_name2 FROM win_query imp UNION SELECT 2 AS col1, 'z'::text AS col2 UNION SELECT t1.f1, t1.f2 FROM t1 GROUP BY t1.f1; PostgreSQL 10 now uses views rather than TABLE + RULE, so the call to pg_get_viewdef() fixes this issue by itself. My question is: will this method of exporting views be back-ported to prior versions, or should we fix it another way? In the latter case, could using pg_get_viewdef() to reconstruct the _RETURN rule be a simple fix? For example: 'CREATE RULE "_RETURN" AS ON SELECT TO v_t1 DO INSTEAD %s;', pg_get_viewdef(...) Of course, manually rewriting and replacing the view fixes the issue, but generating a dump that can't easily be restored can confuse users. -- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cache lookup failed error for partition key with custom opclass
Rushabh Lathia writes: > PFA a patch that adds an elog() with the same error message as all the other > places. Some looking around says that this *isn't* the only place that just blithely assumes that it will find an opfamily entry. But I agree that not checking for that isn't up to project standards. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] cache lookup failed error for partition key with custom opclass
Hi, Consider the following test: CREATE OR REPLACE FUNCTION dummy_binaryint4(a int4, b int4) RETURNS int4 AS $$ BEGIN RETURN a; END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE OPERATOR CLASS custom_opclass2 FOR TYPE int2 USING BTREE AS OPERATOR 1 = , FUNCTION 1 dummy_binaryint4(int4, int4); t=# CREATE TABLE list_tab(a int2, b int) PARTITION BY LIST (a custom_opclass2); *ERROR: cache lookup failed for function 0* In the above test, the operator class is created for type int2, but the support function passed in is of type int4. During CREATE TABLE ... PARTITION BY, ComputePartitionAttrs() is able to resolve the opclass for the partition key (the partition key type is int2), but when looking up the support function for int2 it is unable to find a proper function and ends up with the cache lookup failed error. The error comes from RelationBuildPartitionKey(). I think overall this is expected, but the error could still be better - like all the other places where get_opfamily_proc() is unable to find a valid function OID. PFA a patch that adds an elog() with the same error message as all the other places. Thanks, Rushabh Lathia www.EnterpriseDB.com diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 43238dd..6bd93b0 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -945,6 +945,10 @@ RelationBuildPartitionKey(Relation relation) opclassform->opcintype, opclassform->opcintype, BTORDER_PROC); + if (!OidIsValid(funcid)) /* should not happen */ + elog(ERROR, "missing support function %d(%u,%u) in opfamily %u", + BTORDER_PROC, opclassform->opcintype, opclassform->opcintype, + opclassform->opcfamily); fmgr_info(funcid, &key->partsupfunc[i]); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
On 2017/07/24 17:30, Etsuro Fujita wrote: > On 2017/07/24 16:16, Amit Khandekar wrote: >> On 24 July 2017 at 12:11, Amit Langote >> wrote: >>> Attached is the updated version of your patch. > > Good catch, Amit K. and Amit L.! > >> Now that this is done, any particular reason it is not done in >> ExecPartitionCheck() ? I see that there is a do_convert_tuple() in >> that function, again without changing the slot descriptor. > > Yeah, I think we would need that in ExecPartitionCheck() as well. Yes, we need that there too. Done that in the attached v3 (including the test where ExecPartitionCheck() would have crashed without the patch). Thanks, Amit diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index b22de78516..78cbcd1a32 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1879,6 +1879,7 @@ ExecPartitionCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, if (map != NULL) { tuple = do_convert_tuple(tuple, map); + ExecSetSlotDescriptor(slot, tupdesc); ExecStoreTuple(tuple, slot, InvalidBuffer, false); } } @@ -1956,6 +1957,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo, if (map != NULL) { tuple = do_convert_tuple(tuple, map); + ExecSetSlotDescriptor(slot, tupdesc); ExecStoreTuple(tuple, slot, InvalidBuffer, false); } } @@ -2003,6 +2005,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo, if (map != NULL) { tuple = do_convert_tuple(tuple, map); + ExecSetSlotDescriptor(slot, tupdesc); ExecStoreTuple(tuple, slot, InvalidBuffer, false); } } @@ -2112,6 +2115,7 @@ ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo, if (map != NULL) { tuple = do_convert_tuple(tuple, map); + ExecSetSlotDescriptor(slot, tupdesc); ExecStoreTuple(tuple, slot, InvalidBuffer, false); } } diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index c608ce377f..0dcc86fef4 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -410,6 +410,21 @@ with ins (a, b, c) as mlparted4 | 1 | 30 | 39 (5 rows) +alter table mlparted add c text; +create table mlparted5 (c text, a int not null, b int not null) partition by list (c); +create table mlparted5a (a int not null, c text, b int not null); +alter table mlparted5 attach partition mlparted5a for values in ('a'); +alter table mlparted attach partition mlparted5 for values from (1, 40) to (1, 50); +alter table mlparted add constraint check_b check (a = 1 and b < 45); +insert into mlparted values (1, 45, 'a'); +ERROR: new row for relation "mlparted5a" violates check constraint "check_b" +DETAIL: Failing row contains (1, 45, a). +create function mlparted5abrtrig_func() returns trigger as $$ begin new.c = 'b'; return new; end; $$ language plpgsql; +create trigger mlparted5abrtrig before insert on mlparted5a for each row execute procedure mlparted5abrtrig_func(); +insert into mlparted5 (a, b, c) values (1, 40, 'a'); +ERROR: new row for relation "mlparted5a" violates partition constraint +DETAIL: Failing row contains (b, 1, 40). 
+drop table mlparted5; -- check that message shown after failure to find a partition shows the -- appropriate key description (or none) in various situations create table key_desc (a int, b int) partition by list ((a+0)); diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out index caca81a70b..eab5c0334c 100644 --- a/src/test/regress/expected/updatable_views.out +++ b/src/test/regress/expected/updatable_views.out @@ -2368,8 +2368,8 @@ DETAIL: Failing row contains (-1, invalid). DROP VIEW v1; DROP TABLE t1; -- check that an auto-updatable view on a partitioned table works correctly -create table pt (a int, b int) partition by range (a, b); -create table pt1 (b int not null, a int not null) partition by range (b); +create table pt (a int, b int, v varchar) partition by range (a, b); +create
Re: [HACKERS] Increase Vacuum ring buffer.
On 2017-07-21 20:41, Sokolov Yura wrote: On 2017-07-21 19:32, Robert Haas wrote: On Fri, Jul 21, 2017 at 4:19 AM, Sokolov Yura wrote: Probably with an increased ring buffer there is no need to raise vacuum_cost_limit. Will you admit it? No, I definitely won't admit that. With default settings autovacuum won't write more than ~2.3MB/s if I remember the math correctly, so if you've got a 1TB table you're probably going to need a bigger value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company I've seen the autovacuum process spending >50% of its time in fsync (with the current ring buffer, though I used autovacuum_cost_delay=2ms). An fsync can last up to a second on an HDD if there is concurrent IO. Even on an SSD, fsync can be really noticeable. But I agree that for a 1TB table autovacuum_cost_limit should still be increased, even with a larger ring buffer. My friend noticed that I didn't say why I bother with autovacuum. Our customers suffer from table bloat. I've made a synthetic bloat test and started experiments with modifying micro- and auto-vacuum. My first attempts were to update the FSM early (both in micro- and autovacuum) and to update it up to the root, not only the low level. Then I looked at an strace of the autovacuum process and noticed a storm of fsync calls. I caught backtraces with gdb breaking on fsync, and found that evicting dirty pages from the small ring buffer is the reason. After some experiments combining my "early FSM update" with the ring buffer size, I understood that increasing the ring buffer gives most of the benefit: autovacuum runs faster, and bloat is greatly reduced. In an extreme case, a 400MB table bloats to 17GB on master, but only to 5GB with the faster autovacuum. I used custom scripts, which is why my statistics are not complete. Still, I didn't find any performance reduction. In fact, it looks like the tests with the "larger autovacuum ring" did more queries per hour than the tests against master. I will run pgbench over the weekend, so latencies and percentiles will be collected. With regards, -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company The default pgbench script wasn't able to trigger autovacuum of the pgbench_accounts table in 8 hours (scale 400, 40 clients, 900tps average), so the weekend testing was not useful. I will re-run with a custom script over the next day or two. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
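For context, the knob under discussion is the BAS_VACUUM ring in freelist.c's GetAccessStrategy(). As a sketch (illustrative only; 256kB is the current ring size, 16MB the proposed one):

#include "postgres.h"			/* BLCKSZ */

/* illustrative: vacuum's ring buffer, in buffers, before and after */
static int
vacuum_ring_size_in_buffers(bool proposed)
{
	int			ring_bytes = proposed ? 16 * 1024 * 1024 : 256 * 1024;

	return ring_bytes / BLCKSZ; /* 32 now vs 2048 proposed, at BLCKSZ = 8192 */
}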
Re: [HACKERS] Increase Vacuum ring buffer.
Good day, Claudio On 2017-07-22 00:27, Claudio Freire wrote: On Fri, Jul 21, 2017 at 2:41 PM, Sokolov Yura wrote: My friend noticed that I didn't say why I bother with autovacuum. Our customers suffer from table bloat. I've made a synthetic bloat test and started experiments with modifying micro- and auto-vacuum. My first attempts were to update the FSM early (both in micro- and autovacuum) and to update it up to the root, not only the low level. This FSM thing is probably not a bad idea as well. We're forced to run regular manual vacuums because for some tables autovacuum never seems to be enough, no matter how it's configured, mostly because it gets canceled all the time. These are high-churn, huge tables, so vacuuming them takes hours or days, and there's always someone with a conflicting lock at some point that ends up canceling the autovacuum task. The above paragraph triggered me to go check, and it seems in those cases the FSM never gets vacuumed. That's probably not a good thing, but I don't see how to vacuum the FSM after a cancel. So vacuuming the FSM from time to time during long-running vacuums seems like a good idea at this point. The attached patch changes the FSM update: instead of updating only the lowest level, it propagates the space increase up to the root. It slows autovacuum a bit, which is why I didn't propose it together with the ring buffer increase. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company From 60f76fc83ee8752362e037c1e19ed089d861e026 Mon Sep 17 00:00:00 2001 From: Sokolov Yura Date: Mon, 3 Jul 2017 15:14:07 +0300 Subject: [PATCH] fsm&vacuum: propagate free space increases to upper levels Every RecordPageWithFreeSpace now updates upper levels if the amount of free space increased. Also, do FreeSpaceMapVacuum after scanning the heap and before vacuuming indexes. --- src/backend/commands/vacuumlazy.c | 16 +- src/backend/storage/freespace/freespace.c | 49 ++- src/backend/storage/freespace/fsmpage.c | 4 ++- src/include/storage/fsm_internals.h | 2 +- 4 files changed, 53 insertions(+), 18 deletions(-) diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index fc9c4f0fb1..a7fff0c5ae 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -595,7 +595,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, maxoff; bool tupgone, hastup; - int prev_dead_count; int nfrozen; Size freespace; bool all_visible_according_to_vm = false; @@ -925,7 +924,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, has_dead_tuples = false; nfrozen = 0; hastup = false; - prev_dead_count = vacrelstats->num_dead_tuples; maxoff = PageGetMaxOffsetNumber(page); /* @@ -1245,16 +1243,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, vacrelstats->nonempty_pages = blkno + 1; /* - * If we remembered any tuples for deletion, then the page will be - * visited again by lazy_vacuum_heap, which will compute and record - * its post-compaction free space. If not, then we're done with this - * page, so remember its free space as-is. (This path will always be - * taken if there are no indexes.) + * heap_page_prune could free a bit of space. Let's record it + * immediately even though it will be recorded again in lazy_vacuum_heap + * after more compaction.
*/ - if (vacrelstats->num_dead_tuples == prev_dead_count) - RecordPageWithFreeSpace(onerel, blkno, freespace); + RecordPageWithFreeSpace(onerel, blkno, freespace); } + /* fix up all tiny bits of freed space before vacuuming indices */ + FreeSpaceMapVacuum(onerel); + /* report that everything is scanned and vacuumed */ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno); diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c index 4648473523..ca0c356f28 100644 --- a/src/backend/storage/freespace/freespace.c +++ b/src/backend/storage/freespace/freespace.c @@ -107,6 +107,8 @@ static Size fsm_space_cat_to_avail(uint8 cat); /* workhorse functions for various operations */ static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, uint8 newValue, uint8 minValue); +static void fsm_set_recursive(Relation rel, FSMAddress addr, uint16 slot, + uint8 new_cat, bool only_increase); static BlockNumber fsm_search(Relation rel, uint8 min_cat); static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof); static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr); @@ -173,9 +175,8 @@ RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage, /* * RecordPageWithFreeSpace - update info about a page. * - * Note that if the new spaceAvail value is higher than the old value stored - * in the FSM, the space might not become visible to searchers until the next - * FreeSpaceMa
Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
On 2017/07/24 16:16, Amit Khandekar wrote: On 24 July 2017 at 12:11, Amit Langote wrote: Attached is the updated version of your patch. Good catch, Amit K. and Amit L.! Now that this is done, any particular reason it is not done in ExecPartitionCheck() ? I see that there is a do_convert_tuple() in that function, again without changing the slot descriptor. Yeah, I think we would need that in ExecPartitionCheck() as well. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
On 24 July 2017 at 12:11, Amit Langote wrote: > Hi Amit, > > On 2017/07/24 14:09, Amit Khandekar wrote: On 2017/07/10 14:15, Etsuro Fujita wrote: Another thing I noticed is the error handling in ExecWithCheckOptions; it doesn't take any care for partitioned tables, so the column description in the error message for WCO_VIEW_CHECK is built using the partition table's rowtype, not the root table's. But I think it'd be better to build that using the root table's rowtype, like ExecConstraints, because that would make the column description easier to understand since the parent view(s) (from which the WithCheckOptions evaluated there are created) would have been defined on the root table. This seems independent from the above issue, so I created a separate patch, which I'm attaching. What do you think about that? >> >> + if (map != NULL) >> + { >> + tuple = do_convert_tuple(tuple, map); >> + ExecStoreTuple(tuple, slot, InvalidBuffer, false); >> + } >> >> Above, the tuple descriptor also needs to be set, since the parent and >> child partitions can have different column ordering. >> >> Due to this, the following testcase crashes: >> >> CREATE TABLE range_parted (a text,b int, c int) partition by range (b); >> CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c > >> 120) WITH CHECK OPTION; >> create table part_a_1_a_10(b int, c int, a text); >> alter table range_parted attach partition part_a_1_a_10 for values >> from (1) to (10); >> insert into upview values ('a', 2, 100); > > Good catch. Thanks for creating the patch. > > There are other places with similar code viz. those in ExecConstraints() > that would need fixing. Ah, OK. I should have noticed that. Thanks. > Attached is the updated version of your patch. Now that this is done, is there any particular reason it is not done in ExecPartitionCheck()? I see that there is a do_convert_tuple() in that function, again without changing the slot descriptor. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in HEAD when too many nested functions call
On July 24, 2017 7:10:19 AM GMT+01:00, Noah Misch wrote: >On Tue, Jul 18, 2017 at 01:04:10PM -0700, Andres Freund wrote: >> Ok, I'll flesh out the patch till Thursday. But I do think we're >going >> to have to do something about the back branches, too. > >This PostgreSQL 10 open item is past due for your status update. >Kindly send >a status update within 24 hours, and include a date for your subsequent >status >update. Refer to the policy on open item ownership: >https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com I sent out a more fleshed-out patch last week, which Tom reviewed. Planning to update it to address that review today or tomorrow. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers