Re: [HACKERS] Statement timeout
> Oops. Previous example was not appropriate. Here are revised
> examples. (In any case, the time consumed at parse and bind is small,
> and I omit the CHECK_FOR_INTERRUPTS after these commands.)
>
> [example 1]
>
> statement_timeout = 3s
>
> parse(statement1)    -- time 0. set statement timeout timer
> bind(statement1, portal1)
> execute(portal1)     -- took 2 seconds
> CHECK_FOR_INTERRUPTS -- 2 seconds passed since time 0. statement timeout does not fire
> parse(statement2)
> bind(statement2, portal2)
> execute(portal2)     -- took 2 seconds
> CHECK_FOR_INTERRUPTS -- 4 seconds passed since time 0. the statement timeout fires!
>
> Another example.
>
> [example 2]
>
> statement_timeout = 3s
>
> parse(statement1)    -- time 0. set statement timeout timer
> bind(statement1, portal1)
> execute(portal1)     -- took 1 second
> CHECK_FOR_INTERRUPTS -- 1 second passed since time 0. statement timeout does not fire
> parse(statement2)
> bind(statement2, portal2)
> execute(portal2)     -- took 1 second
> CHECK_FOR_INTERRUPTS -- 2 seconds passed since time 0. statement timeout does not fire
> [client is idle for 2 seconds]
> sync
> CHECK_FOR_INTERRUPTS -- 4 seconds passed since time 0. the statement timeout fires!
>
> I think the current code is good at detecting a statement timeout when a
> command's execution time actually exceeds the specified timeout. However,
> it can report a false statement timeout in cases like the above.

Here is a patch against the master branch to deal with the problem. It creates two new functions in tcop/postgres.c:

static void enable_statement_timeout(void);
static void disable_statement_timeout(void);

Before, this code lived in start_xact_command() and finish_xact_command(), but I wanted a way to disable the timeout in exec_execute_message(), so I separated the code into the new functions. start_xact_command() and finish_xact_command() were also modified to use the new functions, to avoid code duplication.

I tested the patch using a small C program linked with a modified libpq (PQsendQueryParams was changed to issue a "flush" message instead of a "sync" message). The statement timeout was set to 3 seconds. With this test program, the client sends pg_sleep(2) plus a flush message, then sleeps for 10 seconds. With current PostgreSQL, this triggers the statement timeout while the client is sleeping. With patched PostgreSQL, it does not trigger the timeout, as expected. All regression tests passed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b185c1b..564855a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether we have started statement timeout timer.
+ */
+static bool st_timeout = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /*
@@ -2002,6 +2009,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 		 * those that start or end a transaction block.
 		 */
 		CommandCounterIncrement();
+
+		/*
+		 * We need to reset statement timeout if already set.
+		 */
+		disable_statement_timeout();
 	}
 
 	/* Send appropriate CommandComplete to client */
@@ -2433,14 +2445,10 @@ start_xact_command(void)
 		ereport(DEBUG3,
 				(errmsg_internal("StartTransactionCommand")));
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
+
+		/* Set statement timeout running, if any */
+		enable_statement_timeout();
 	}
 }
@@ -2450,7 +2458,7 @@ finish_xact_command(void)
 	if (xact_started)
 	{
 		/* Cancel any active statement timeout before committing */
-		disable_timeout(STATEMENT_TIMEOUT, false);
+		disable_statement_timeout();
 
 		/* Now commit the command */
 		ereport(DEBUG3,
@@ -4510,3 +4518,51 @@ log_disconnections(int code, Datum arg)
 					port->user_name, port->database_name, port->remote_host,
 					port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+/*
+ * Set statement timeout if any.
+ */
+static void enable_statement_timeout(void)
+{
+	if (!st_timeout)
+	{
+		if (StatementTimeout > 0)
+		{
+
+			/*
+			 * Sanity check
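The diff breaks off inside enable_statement_timeout(). For readers following along, here is a minimal sketch of how the two helpers could plausibly be completed, using only the st_timeout flag and the timeout.c calls (enable_timeout_after, disable_timeout) visible elsewhere in the diff; it is an illustration, not the actual remainder of the patch:

static void
enable_statement_timeout(void)
{
	if (!st_timeout)
	{
		if (StatementTimeout > 0)
		{
			/* Sanity check: the timer must only run inside a transaction */
			Assert(xact_started);
			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
			st_timeout = true;
		}
		else
			disable_timeout(STATEMENT_TIMEOUT, false);
	}
}

static void
disable_statement_timeout(void)
{
	if (st_timeout)
	{
		disable_timeout(STATEMENT_TIMEOUT, false);
		st_timeout = false;
	}
}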
Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat
Hello,

At Fri, 27 May 2016 13:20:20 -0400, Tom Lane wrote in <14603.1464369...@sss.pgh.pa.us>:
>> Kyotaro HORIGUCHI writes:
>>> By the way, the reason of the "invalid snapshot identifier" is
>>> that some worker threads try to use it after the connection on
>>> the first worker closed.
>
> ... BTW, I don't quite see what the issue is there. The snapshot is
> exported from the master session, so errors in worker sessions should not
> cause such failures in other workers. And I don't see any such failure
> when setting up a scenario that will cause a worker to fail on Linux.
> The "invalid snapshot identifier" bleats would make sense if you had
> gotten a server-side error (and transaction abort) in the master session,
> but I don't see any evidence that that happened in that example. Might be
> worth seeing if that's reproducible.

The master session died from the lack of libz and the failure of compressLevel's propagation, which is already fixed. Some of the children that started transactions after the master's death will get the error.

Similarly, a sudden close of the master child's session very early in its transaction could cause the same symptom, but that seems unlikely as long as the master reliably reaches the command-waiting, or "safe", state. If we want to prevent it completely, one solution could be for the non-master children to explicitly wait for the master to arrive at the "safe" state before starting their transactions. But I suppose that is not needed here.

Does this make sense?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
> On 27 May 2016, at 19:57, Vladimir Borodin wrote:
>
> -performance, +hackers
>
>> On 25 May 2016, at 17:33, Vladimir Borodin wrote:
>>
>> Hi all.
>>
>> We have found that queries through PgBouncer 1.7.2 (with transaction
>> pooling) to local PostgreSQL are almost two times slower in 9.5.3 than
>> in 9.4.8 on RHEL 6 hosts (all packages are updated to the latest
>> versions). Meanwhile the problem can't be reproduced on e.g. Ubuntu
>> 14.04 (also fully updated).
>>
>> Here is how the results look for 9.4, 9.5 and 9.6. All are built from
>> the latest commits as of yesterday in
>> * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
>> * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
>> * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
>>
>> All of them are built on the host where the testing is done (with stock
>> gcc versions). Sysctls, pgbouncer config and everything else we found
>> are the same, postgres configs are default, PGDATA is in tmpfs. All
>> numbers are reproducible and stable between runs.
>>
>> Shortly:
>>
>> OS            PostgreSQL version  TPS    Avg. latency
>> RHEL 6        9.4                 44898  1.425 ms
>> RHEL 6        9.5                 26199  2.443 ms
>> RHEL 6        9.6                 43027  1.487 ms
>> Ubuntu 14.04  9.4                 67458  0.949 ms
>> Ubuntu 14.04  9.5                 64065  0.999 ms
>> Ubuntu 14.04  9.6                 64350  0.995 ms
>
> The results above are not really fair: pgbouncer.ini was a bit different
> on the Ubuntu host (application_name_add_host was disabled). Here are
> the right results with exactly the same configuration:
>
> OS            PostgreSQL version  TPS    Avg. latency
> RHEL 6        9.4                 44898  1.425 ms
> RHEL 6        9.5                 26199  2.443 ms
> RHEL 6        9.6                 43027  1.487 ms
> Ubuntu 14.04  9.4                 45971  1.392 ms
> Ubuntu 14.04  9.5                 40282  1.589 ms
> Ubuntu 14.04  9.6                 45410  1.409 ms
>
> It can be seen that there is a regression for 9.5 on Ubuntu also, but
> not such a significant one. We first thought that the reason was
> 38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add memory barriers for
> PgBackendStatus.st_changecount protocol), but in that case the
> regression should be seen in 9.6 as well.
>
> There was also a bunch of changes in FE/BE communication (like
> 387da18874afa17156ee3af63766f17efb53c4b9 or
> 98a64d0bd713cb89e61bef6432befc4b7b5da59e), and that may answer the
> question of the regression in 9.5 and the normal results in 9.6.
> Probably the right way to find the answer is to bisect. I'll do it, but
> if some more diagnostic information can help, feel free to ask for it.

Yep, bisect confirms that the first bad commit in REL9_5_STABLE is 387da18874afa17156ee3af63766f17efb53c4b9. Full output is attached.

bisect95.out Description: Binary data

And bisect for the master branch confirms that the situation became much better after 98a64d0bd713cb89e61bef6432befc4b7b5da59e. Output is also attached.

bisect96.out Description: Binary data

On Ubuntu the performance degradation is ~15%, and on RHEL it is ~100%. I don't know what causes the different numbers on RHEL and Ubuntu, but there is certainly a regression when pgbouncer is connected to postgres through localhost. When I try to connect pgbouncer to postgres through a unix socket, performance is consistently bad on all postgres versions.

Both servers are for testing, but I can easily provide you SSH access (to the Ubuntu host only) if necessary. I can also gather more diagnostics if needed.

>> You can see that the difference between major versions on Ubuntu is not
>> significant, but on RHEL, 9.5 is 70% slower than 9.4 and 9.6.
>>
>> Below are more details.
>>
>> RHEL 6:
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2693962
>> latency average: 1.425 ms
>> tps = 44897.461518 (including connections establishing)
>> tps = 44898.763258 (excluding connections establishing)
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 1572014
>> latency average: 2.443 ms
>> tps = 26198.928627 (including connections establishing)
>> tps = 26199.803363 (excluding connections establishing)
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2581645
>> latency average: 1.487 ms
>> tps = 43025.676995 (including connections establishing)
>> tps = 43027.038275 (excluding connections establishing)
>> postgres@pgload05g ~ $
>>
>> Ubuntu 14.04 (the same hardware):
>>
>> postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
[HACKERS] Binary I/O for isn extension
Hi. Attached is a small patch which adds binary input/output for the types added by the isn extension.

Shay

isn-binary.patch Description: Binary data
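Since the isn types are fixed-width values on disk, the send/receive pair for each type would presumably be only a few lines each. A sketch of what that could look like for one type, assuming an 8-byte pass-by-value representation (the function names and width here are assumptions, not taken from the attached patch):

#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"

/* Hypothetical binary input for one isn type (e.g. ean13) */
PG_FUNCTION_INFO_V1(ean13_recv);
Datum
ean13_recv(PG_FUNCTION_ARGS)
{
	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);

	/* read the 8-byte network-order value sent by the client */
	PG_RETURN_INT64(pq_getmsgint64(buf));
}

/* Hypothetical binary output for the same type */
PG_FUNCTION_INFO_V1(ean13_send);
Datum
ean13_send(PG_FUNCTION_ARGS)
{
	int64		val = PG_GETARG_INT64(0);
	StringInfoData buf;

	pq_begintypsend(&buf);
	pq_sendint64(&buf, val);
	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

The same pattern would repeat for the other types the extension provides.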
Re: [HACKERS] Question and suggestion about application binary compatibility policy
On 31/05/2016 08:10, Tsunakawa, Takayuki wrote:
>> From: Michael Meskes [mailto:mes...@postgresql.org]
>>> Yes, but Windows users probably don't understand or know it. So, I
>>> suggested explicitly describing the application binary compatibility
>>> policy in the PostgreSQL manual. What do you think about it?
>>
>> Couldn't you point your customer to the system documentation? Personally
>> I don't think standard system behavior should be documented for each
>> application relying on it, but ymmv.
>
> I couldn't find appropriate system documentation. Regarding Linux, I
> remember I saw some HOWTO on the tldp.org website which explains the
> concept of the shared library soname, but it's not very friendly for
> users who just want to know the application binary compatibility policy
> of PostgreSQL. And I don't think there's suitable documentation on
> Windows. Even if there is any, users will not be sure whether PostgreSQL
> follows those platform-specific conventions. They may have doubts about
> it, because even the product version "PostgreSQL x.y.z" causes the
> misconception that x is the major version and y is the minor one. So, I
> suggested documenting the compatibility policy for clarification and
> user friendliness, as in the Oracle Database documentation below.
>
> http://docs.oracle.com/database/121/UPGRD/app.htm#UPGRD12547
>
> BTW, although this may be a separate topic, it may be better that we add
> the major version to the library names, like libpq5.dll and libecpg6.dll,
> so that an application fails to run with incompatible versions of libpq
> and libecpg. FYI:
>
> https://en.wikipedia.org/wiki/Side-by-side_assembly
> [Excerpt]
> Microsoft Visual C++ 2005 and 2008 employ SxS with all C runtime
> libraries. However, runtime libraries in Visual C++ 2010 no longer use
> this technology; instead, they include the version number of a DLL in
> its file name, which means that different versions of one DLL will
> technically be completely different DLLs now.
>
> Any comments on these? If there's no strong objection, I think I'll
> submit a documentation patch in the future.
>
> Regards
> Takayuki Tsunakawa

Hi,

on cygwin the postgresql binary package already includes the library versions:

/usr/bin/cygecpg-6.dll
/usr/bin/cygecpg_compat-3.dll
/usr/bin/cygpgtypes-3.dll
/usr/bin/cygpq-5.dll

Attached is the patch used for the build.

Regards
Marco

--- origsrc/postgresql-9.4.2/src/Makefile.shlib	2015-05-20 00:33:58.0 +0200
+++ src/Makefile.shlib	2015-05-27 23:01:09.379468300 +0200
@@ -267,7 +267,7 @@ endif
 ifeq ($(PORTNAME), cygwin)
 LINK.shared		= $(CC) -shared
 ifdef SO_MAJOR_VERSION
-shlib			= cyg$(NAME)$(DLSUFFIX)
+shlib			= cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 haslibarule   = yes
 endif
@@ -359,12 +359,9 @@ ifeq ($(PORTNAME), cygwin)
 # Cygwin case
 
 $(shlib): $(OBJS) | $(SHLIB_PREREQS)
-	$(CC) $(CFLAGS) -shared -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
+	$(CC) $(CFLAGS) -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
 
-$(stlib): $(OBJS) | $(SHLIB_PREREQS)
-	rm -f $@
-	$(LINK.static) $@ $^
-	$(RANLIB) $@
+$(stlib): $(shlib) ;
 
 else
 
--- origsrc/postgresql-9.4.2/src/interfaces/libpq/Makefile	2015-05-20 00:33:58.0 +0200
+++ src/interfaces/libpq/Makefile	2015-05-27 22:56:43.193200600 +0200
@@ -45,7 +45,7 @@ OBJS	+= ip.o md5.o
 OBJS	+= encnames.o wchar.o
 
 ifeq ($(PORTNAME), cygwin)
-override shlib = cyg$(NAME)$(DLSUFFIX)
+override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 
 ifeq ($(PORTNAME), win32)
Re: [HACKERS] IPv6 link-local addresses and init data type
On 05/31/2016 04:06 AM, Tom Lane wrote:
> Andreas Karlsson writes:
>> On 05/31/2016 02:37 AM, Haribabu Kommi wrote:
>>> The % delimiter character is not only used at the end of the IPv6
>>> address; from the RFC document, the following is also possible:
>>> fe80::%2/64
>>> We need to handle both scenarios; it may not be straightforward to
>>> store the zone id data.
>>
>> This case needs to be handled by the parser for at least the cidr type,
>> but I do not think it would make parsing any trickier.
>
> Unless there's a semantic difference between fe80::1%2/64 and
> fe80::1/64%2, this doesn't seem like a big deal to me.

As far as I can tell, only fe80::1%2/64 is valid, but I am not 100% sure.

Andreas
Re: [HACKERS] Binary I/O for isn extension
Hi,

Thanks for the patch. You can add it to our commitfest app so it gets reviewed in the next commitfest: https://commitfest.postgresql.org/

A suggestion to make it easier for your patch to be accepted: when adding new functions to an extension you need to bump the version of the extension by renaming the file, updating the .control file, creating an upgrade script, and updating the Makefile to include the new files.

Andreas
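For concreteness, a version bump of that sort usually amounts to something like the following (the version numbers and file names are hypothetical):

# isn.control -- bump the default version
default_version = '1.2'

# Makefile -- rename isn--1.1.sql to isn--1.2.sql and ship an upgrade script
DATA = isn--1.2.sql isn--1.1--1.2.sql

The upgrade script isn--1.1--1.2.sql would then contain only the statements needed to go from 1.1 to 1.2, and existing installations could move to it with ALTER EXTENSION isn UPDATE.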
Re: [HACKERS] Binary I/O for isn extension
Hello,

> Attached is a small patch which adds binary input/output for the types
> added by the isn extension.

I added this patch to the next CF (2016-09) under "Miscellaneous".

Out of curiosity, what is the motivation?

--
Fabien.
Re: [HACKERS] Binary I/O for isn extension
> When adding new functions to an extension you need to bump the version
> of the extension by renaming the file, updating the .control file,
> creating an upgrade script, and updating the Makefile to include the
> new files.

Thanks for the guidance, I'll fix all that and resubmit a patch.
Re: [HACKERS] Binary I/O for isn extension
> Thanks for the patch, you can add it to our commitfest app so it gets
> reviewed in the next commitfest.

I did this.

> A suggestion to make it easier for your patch to be accepted: when
> adding new functions to an extension you need to bump the version of
> the extension by renaming the file, updating the .control file,
> creating an upgrade script, and updating the Makefile to include the
> new files.

Indeed. Moreover, I'm not sure that a type can be updated incrementally to add new send/receive functions.

--
Fabien.
Re: [HACKERS] Binary I/O for isn extension
> I added this patch to the next CF (2016-09) under "Miscellaneous".

Thanks!

> Out of curiosity, what is the motivation?

I'm the owner of Npgsql, the open-source .NET driver for PostgreSQL, which is a binary-first driver. That is, working with types that have no binary I/O is possible but awkward. I hope the general direction is (or will be) to have binary I/O for all supported types, both for compatibility with binary-first consumers such as Npgsql and for general efficiency.
Re: [HACKERS] Question and suggestion about application binary compatibility policy
> I couldn't find appropriate system documentation. Regarding Linux, I
> remember I saw some HOWTO on the tldp.org website which explains the
> concept of the shared library soname, but it's not very friendly for
> users who just want to know the application binary compatibility
> policy of PostgreSQL. And I don't think there's suitable

I would expect Unix sysadmins to understand that howto, but there are others, e.g. a random hit from Google:
https://www.bottomupcs.com/libraries_and_the_linker.xhtml

There is even a Wikipedia page about it:
https://en.wikipedia.org/wiki/Soname

> documentation on Windows. Even if there is any, users will not be
> sure whether PostgreSQL follows those platform-specific
> conventions. They may have doubts about it, because even the product
> version "PostgreSQL x.y.z" causes the misconception that x is the major
> version and y is the minor one.

I don't know anything about the Windows setup in PostgreSQL, but I would find it fair to assume that PostgreSQL follows conventions.

> BTW, although this may be a separate topic, it may be better that we
> add the major version to the library names, like libpq5.dll and
> libecpg6.dll, so that an application fails to run with incompatible
> versions of libpq and libecpg. FYI:

Does this mean you cannot have two versions of libpq installed on the same Windows system at the same time?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
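For what it's worth, on ELF platforms the soname contract under discussion can be inspected directly from an installed library; for example (the path is illustrative):

$ objdump -p /usr/lib/libpq.so.5 | grep SONAME
  SONAME               libpq.so.5

The dynamic linker resolves dependencies by that embedded soname, which is why libpq.so.5 and a hypothetical libpq.so.6 can coexist on the same Unix system.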
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> Attached is a patch that should fix the coalescing, including the
>> clock skew detection. In the end I reorganized the code a bit, moving
>> the check at the end, after the clock skew detection. Otherwise I'd
>> have to do the clock skew detection in multiple places, and that
>> seemed ugly.
>
> I hadn't been paying any attention to this thread, I must confess.
> But I rediscovered this no-coalescing problem while pursuing the poor
> behavior for shared catalogs that Peter complained of in
> https://www.postgresql.org/message-id/56ad41ac.1030...@gmx.net
>
> I posted a patch at
> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
> which I think is functionally equivalent to what you have here, but
> it goes to some lengths to make the code more readable, whereas this
> is just adding another layer of complication to something that's
> already a mess (eg, the request_time field is quite useless as-is).
> So I'd like to propose pushing that in place of this patch ... do you
> care to review it first?

I've reviewed the patch today, and it seems fine to me - correct and achieving the same goal as the patch posted to this thread (plus fixing the issue with shared catalogs and fixing many comments).

FWIW, do you still plan to back-patch this? Minimizing the amount of changes was one of the things I had in mind when writing "my" patch, which is why I ended up with parts that are less readable.

The one change I'm not quite sure about is the removal of clock skew detection in pgstat_recv_inquiry(). You've removed the first check on the inquiry, replacing it with this comment:

    It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks at dbentry/cur_ts. I don't see how those two clock skew checks are redundant - if they are, the comment should explain that, I guess.

Another thing is that if you believe merging requests across databases is a silly idea, maybe we should bite the bullet and replace the list of requests with a single item. I'm not convinced about this, though.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Logical replication & oldest XID.
Hi,

We are using logical replication in multimaster and have run into an interesting problem with a "frozen" procArray->replication_slot_xmin.

This variable is adjusted by ProcArraySetReplicationSlotXmin, which is invoked by ReplicationSlotsComputeRequiredXmin, which in turn is called by LogicalConfirmReceivedLocation. If transactions are executed at all nodes of the multimaster, then everything works fine: replication_slot_xmin is advanced.

But if we send transactions to only one multimaster node and broadcast the changes to the other nodes, then no data is sent through the replication slots at those nodes. No data sent means no confirmations: LogicalConfirmReceivedLocation is never called, and procArray->replication_slot_xmin keeps its original value, 599. As a result the GetOldestXmin function always returns 599, so autovacuum is effectively blocked and our multimaster is not able to perform cleanup of the XID->CSN map, which causes a shared memory overflow.

This situation happens only when write transactions are sent to just one node, or when there are no write transactions at all. Before implementing some workaround (for example, forcing calls of ReplicationSlotsComputeRequiredXmin), I want to understand whether this is a real problem of logical replication or we are doing something wrong. BDR should face the same problem if all updates are performed from one node...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
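A quick way to confirm that a stalled slot is what pins the horizon is to compare the slots' xmin columns with the current XID (illustrative queries only):

-- Which slots are holding back the global xmin horizon?
SELECT slot_name, plugin, active, xmin, catalog_xmin
FROM pg_replication_slots
ORDER BY age(catalog_xmin) DESC;

-- Compare against the current transaction ID
SELECT txid_current();

If xmin/catalog_xmin stay frozen while txid_current() keeps advancing, the slot is never confirming anything, which matches the behavior described above.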
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?

I think that is the danger.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is, but will review closely
>> to see what I might have missed.

This is a good point, though, I think.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which caused the tuple to be dead? Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem? The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.

Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that is newer than the oldest registered snapshot in the system (based on some snapshots being ignored) might get a return value of HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD. It seems necessary to carefully audit all calls to HeapTupleSatisfiesVacuum() to see whether that difference matters. I took a quick look and here's what I see:

statapprox_heap(): Statistical information for the DBA. The difference is non-critical.

heap_prune_chain(): Seeing the tuple as dead might cause it to be removed early. This should be OK. Removing the tuple early will cause the page LSN to be bumped unless RelationNeedsWAL() returns false, and TransactionIdLimitedForOldSnapshots() includes that as a condition for disabling early pruning.

IndexBuildHeapRangeScan(): We might end up with indexIt = false instead of indexIt = true. That should be OK because anyone using the old snapshot will see a new page LSN and error out. We might also fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a problem, but I'm not certain of it.

acquire_sample_rows(): Both return values are treated in the same way. No problem.

copy_heap_data(): We'll end up setting isdead = true instead of tups_recently_dead += 1. That means that the new heap won't include the tuple, which is OK because old snapshots can't read the new heap without erroring out, assuming that the new heap has LSNs. The xmin used here comes from vacuum_set_xid_limits(), which goes through TransactionIdLimitedForOldSnapshots(), so this should be OK for the same reasons as heap_prune_chain(). Another effect of seeing the tuple as prematurely dead is that we'll call rewrite_heap_dead_tuple() on it; rewrite_heap_dead_tuple() will presume that if this tuple is dead, its predecessor in the ctid chain is also dead. I don't see any obvious problem with that.

lazy_scan_heap(): Basically, the same thing as heap_prune_chain().

CheckForSerializableConflictOut(): Maybe a problem? If the tuple is dead, there's no issue, but if it's recently-dead, there might be.

We might want to add comments to some of these places addressing snapshot_too_old specifically.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
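For reference, the "see a new page LSN and error out" mechanism that this audit leans on boils down to a check of roughly the following shape. This is a paraphrased sketch of the 9.6-era TestForOldSnapshot() logic, not the exact source:

static void
test_for_old_snapshot(Snapshot snapshot, Page page)
{
	/* Only MVCC snapshots subject to old_snapshot_threshold are checked */
	if (old_snapshot_threshold >= 0 &&
		snapshot != NULL &&
		snapshot->satisfies == HeapTupleSatisfiesMVCC &&
		!XLogRecPtrIsInvalid(snapshot->lsn) &&
		PageGetLSN(page) > snapshot->lsn)	/* page changed after snapshot? */
		ereport(ERROR,
				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
				 errmsg("snapshot too old")));
}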
Re: [HACKERS] Rename max_parallel_degree?
On Sun, May 29, 2016 at 1:33 AM, Noah Misch wrote:
> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>> OK, my reading of this thread is that there is a consensus to
>> redefine max_parallel_degree=1 as "no parallelism" and
>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>> workers", and along with that, to keep the names unchanged. However,
>> I don't think I can get that done before beta1, at least not without a
>> serious risk of breaking stuff. I can look at this post-beta1.
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1. Consequently, I will appreciate your
> efforts toward speedy resolution. Thanks.

Here is a patch. Note that I still don't agree with this change, but I'm bowing to the will of the group.

I think that some of the people who were in favor of this change should review this patch, including especially the language I wrote for the documentation. If that happens, and the reviews are positive, then I will commit this. If that does not happen, then I will interpret that to mean that there isn't actually all that much interest in changing this after all and will accordingly recommend that this open item be removed without further action.

Here is a test which shows how it works:

rhaas=# set max_parallel_degree = 100;
SET
rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
ALTER TABLE
rhaas=# explain (analyze) select count(*) from pgbench_accounts;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate  (cost=177436.04..177436.05 rows=1 width=8) (actual time=383.244..383.244 rows=1 loops=1)
   ->  Gather  (cost=177435.00..177436.01 rows=10 width=8) (actual time=383.040..383.237 rows=9 loops=1)
         Workers Planned: 9
         Workers Launched: 8
         ->  Partial Aggregate  (cost=176435.00..176435.01 rows=1 width=8) (actual time=375.569..375.569 rows=1 loops=9)
               ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..173935.00 rows=100 width=0) (actual time=0.103..260.352 rows=111 loops=9)
 Planning time: 0.135 ms
 Execution time: 384.387 ms
(8 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-degree-1based.patch Description: binary/octet-stream
Re: [HACKERS] Statement timeout
Tatsuo Ishii writes:
>> Oops. Previous example was not appropriate. Here are revised
>> examples. (In any case, the time consumed at parse and bind is small,
>> and I omit the CHECK_FOR_INTERRUPTS after these commands.)

FWIW, I think the existing behavior is just fine. It corresponds to what PQexec has always done with multi-statement query strings; that is, statement_timeout governs the total time to execute the transaction (the whole query string, unless you put transaction control commands in there). In extended query mode, since you can only put one textual query per Parse message, that maps to statement_timeout governing the total time from initial Parse to Sync. Which is what it does.

What you propose here is not a bug fix but a redefinition of what statement_timeout does; and you've made it inconsistent with simple query mode. I don't really think it's an improvement.

BTW, aren't you missing a re-enable of the timeout for statements after the first one?

			regards, tom lane
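The simple-query behavior Tom describes is easy to demonstrate from psql, where \; combines statements into a single query string sent in one PQexec() call (the sleep durations here are arbitrary):

SET statement_timeout = '3s';

-- One multi-statement string: the two sleeps share a single timeout,
-- so this errors out even though each statement alone finishes in 2s.
SELECT pg_sleep(2) \; SELECT pg_sleep(2);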
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Tue, May 31, 2016 at 10:03 AM, Robert Haas wrote:
> On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?
>
> I think that is the danger.

Well, the *perceived* danger -- since every page in the new index would be new enough to be recognized as too recently modified to be safe for a snapshot that could still see the omitted rows, the only question I'm sorting out on this is how safe it might be to cause the index to be ignored in planning when using such a snapshot. That, and demonstrating the safe behavior to those not following closely enough to see what will happen without a demonstration.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is,

> This is a good point, though, I think.

The whole perception of risk in this area seems to be based on getting that wrong; although the review of this area may yield some benefit in terms of minimizing false positives.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which caused the tuple to be dead? Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem? The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.
>
> Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
> is newer than the oldest registered snapshot in the system (based on
> some snapshots being ignored) might get a return value of
> HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD.

Since the override xmin cannot advance past the earliest transaction which is still active, HEAPTUPLE_DEAD indicates that the transaction causing the tuple to be dead has completed and the tuple is irrevocably dead -- even if there are still snapshots registered which can see it. Seeing HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD is strictly limited to tuples which are certainly and permanently dead and for which the only possible references are non-MVCC snapshots or existing snapshots subject to "snapshot too old" monitoring.

> IndexBuildHeapRangeScan(): We might end up with indexIt = false
> instead of indexIt = true. That should be OK because anyone using the
> old snapshot will see a new page LSN and error out. We might also
> fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a
> problem, but I'm not certain of it.

The latter flag is what I'm currently digging at; my hope is that whenever old_snapshot_threshold >= 0 we can set indexInfo->ii_BrokenHotChain = true to cause the planner to skip consideration of the index if the snapshot is an "old" one. That will avoid some false positives (seeing the error when not strictly necessary). If that works out the way I hope, the only downside is that a scan using a snapshot from an old transaction or cursor would use some other index or a heap scan; but we already have that possibility in some cases -- that seems to be the point of the flag.

> CheckForSerializableConflictOut: Maybe a problem? If the tuple is
> dead, there's no issue, but if it's recently-dead, there might be.

If the tuple is not visible to the scan, the behavior is unchanged (a simple return from the function on either HEAPTUPLE_DEAD or HEAPTUPLE_RECENTLY_DEAD with !visible) and thus clearly correct. If the tuple is visible to us, it is currently subject to early pruning or vacuum (since those operations would get the same modified xmin) but has not yet had any such treatment, since we made it to this function in the first place. The processing for SSI purposes would be unaffected by the possibility that there could later be early pruning/vacuuming.

Thanks for the review and feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Suggestion for --truncate-tables to pg_restore
Hi there,

Referring to https://www.postgresql.org/message-id/1352742344.21373.4@mofo

I'm running into situations where I need to bulk-transfer data tables across servers, but dropping and recreating the schema isn't feasible, as we are running different permissions etc. on the two databases.

Thus my question: is anybody working on getting this into pg_restore proper, and is there any advice on getting this feature incorporated?

Hendrik
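Until such an option exists, the usual workaround is to truncate the target tables by hand and then do a data-only restore; roughly as follows (the table names and dump file are illustrative; --data-only and --disable-triggers are existing pg_restore options):

$ psql -d target -c 'TRUNCATE TABLE accounts, branches CASCADE'
$ pg_restore --data-only --disable-triggers -d target dump.custom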
Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat
Kyotaro HORIGUCHI writes:
> At Fri, 27 May 2016 13:20:20 -0400, Tom Lane wrote in <14603.1464369...@sss.pgh.pa.us>
>> Kyotaro HORIGUCHI writes:
>>> By the way, the reason of the "invalid snapshot identifier" is
>>> that some worker threads try to use it after the connection on
>>> the first worker closed.
>
>> ... BTW, I don't quite see what the issue is there.
>
> The master session died from lack of libz and the failure of
> compressLevel's propagation already fixed. Some of the children
> that started transactions after the master's death will get the
> error.

I don't think I believe that theory, because it would require the master to not notice the lack of libz before it launches worker processes, but instead while the workers are working. But AFAICS, while there are worker processes open, the master does nothing except wait for workers and dispatch new jobs to them; it does no database work of its own. So the libz-isn't-there error has to have occurred in one of the workers.

> If we want prevent it perfectly, one solution could be that
> non-master children explicitly wait the master to arrive at the
> "safe" state before starting their transactions. But I suppose it
> is not needed here.

Actually, I believe the problem is in archive_close_connection, around line 295 in HEAD: once the master realizes that one child has failed, it first closes its own database connection and only second tries to kill the remaining children. So there's a race condition wherein remaining children have time to see the missing-snapshot error. In the patch I posted yesterday, I reversed the order of those two steps, which should fix this problem in most scenarios:

https://www.postgresql.org/message-id/7005.1464657...@sss.pgh.pa.us

			regards, tom lane
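In outline, the reordering described above is just a swap of two steps; the function names here are approximations of the pg_dump parallel machinery, not the exact patch:

/* before: close own connection first, then kill children -- workers can
 * still hit "invalid snapshot identifier" in the window between the two */
DisconnectDatabase(&AH->connection);
ShutdownWorkersHard(pstate);

/* after: stop the children before invalidating the exported snapshot
 * by closing the master connection */
ShutdownWorkersHard(pstate);
DisconnectDatabase(&AH->connection);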
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote:
> Nikolay Shaplov writes:
>> Actually I did not expect any discussion for this case. The documentation
>> missed an optional keyword; the documentation should be fixed.
>
> 99% of the time, you'd be right. But this is an unusual case, for the
> reasons I mentioned before.

I tend to agree with Nikolay. I can't see much upside in making this change. At best, nothing will break. At worst, something will break. But how do we actually come out ahead? The only thing you've offered is that it might make it easier to make the relevant documentation pages 100% clear, but I feel like a man of your elocution can probably surmount that impediment. I guess we might save a few parser states, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
Robert Haas writes:
> On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote:
>> 99% of the time, you'd be right. But this is an unusual case, for the
>> reasons I mentioned before.
>
> I tend to agree with Nikolay. I can't see much upside in making this
> change. At best, nothing will break. At worst, something will break.
> But how do we actually come out ahead?

We come out ahead by not having to make the documentation more confusing. Basically, we have the opportunity to fix an ancient mistake here at very low cost. I do not think that doubling down on the mistake is a better answer.

			regards, tom lane
Re: [HACKERS] parallel.c is not marked as test covered
On 5/9/16 10:50 AM, Andres Freund wrote:
> I think it's a good idea to run a force-parallel run on some buildfarm
> members. But I'm rather convinced that the core tests run by all
> animals need some minimal coverage of parallel queries. Both because
> otherwise it'll be hard to get some coverage of unusual platforms, and
> because it's imo something rather relevant to test during development.

I can confirm that with force_parallel_mode = on, parallel.c does get good test coverage. But I agree that it is a problem that this does not happen in the default test setup.

I think we might want to have a new test file that sets force_parallel_mode = on and just runs a few queries to give parallel.c a small workout. For example, I find that the following test file gives almost the same coverage as running the full test suite with f_p_m=on:

"""
SET force_parallel_mode = on;
EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1;
SELECT * FROM tenk1 WHERE unique1 = 1;
-- provoke error in worker
SELECT stringu1::int2 FROM tenk1;
"""

While we're at it, one of the things that force_parallel_mode = regress does is remove an error context that is otherwise added to all messages:

    errcontext("parallel worker, PID %d", *(int32 *) arg);

I think we should get rid of that error context altogether, except possibly as a debugging tool, because it's an implementation detail that does not need to be shown to users by default.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] copyParamList
On Fri, May 27, 2016 at 6:07 PM, Andrew Gierth wrote:
> copyParamList does not respect from->paramMask, in what looks to me like
> an obvious oversight:
>
>     retval->paramMask = NULL;
> [...]
>     /* Ignore parameters we don't need, to save cycles and space. */
>     if (retval->paramMask != NULL &&
>         !bms_is_member(i, retval->paramMask))
>
> retval->paramMask is never set to anything not NULL in this function,
> so surely that should either be initializing it to from->paramMask, or
> checking from->paramMask in the conditional?

Oh, dear. I think you are right. I'm kind of surprised this didn't provoke a test failure somewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
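Either of the two fixes Andrew suggests is a one-liner; for instance, testing the source list's mask in the conditional (a sketch of the second option, not necessarily the committed wording):

    /* Ignore parameters we don't need, to save cycles and space. */
    if (from->paramMask != NULL &&
        !bms_is_member(i, from->paramMask))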
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 09:15 AM, Robert Haas wrote:
> On Sun, May 29, 2016 at 1:33 AM, Noah Misch wrote:
>> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>>> OK, my reading of this thread is that there is a consensus to
>>> redefine max_parallel_degree=1 as "no parallelism" and
>>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>>> workers", and along with that, to keep the names unchanged. However,
>>> I don't think I can get that done before beta1, at least not without a
>>> serious risk of breaking stuff. I can look at this post-beta1.
>>
>> [This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
>> since you committed the patch believed to have created it, you own this open
>> item. If some other commit is more relevant or if this does not belong as a
>> 9.6 open item, please let us know. Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within 72 hours of this
>> message. Include a date for your subsequent status update. Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping 9.6rc1. Consequently, I will appreciate your
>> efforts toward speedy resolution. Thanks.
>
> Here is a patch. Note that I still don't agree with this change, but
> I'm bowing to the will of the group.
>
> I think that some of the people who were in favor of this change
> should review this patch, including especially the language I wrote
> for the documentation. If that happens, and the reviews are positive,
> then I will commit this. If that does not happen, then I will
> interpret that to mean that there isn't actually all that much
> interest in changing this after all and will accordingly recommend
> that this open item be removed without further action.
>
> Here is a test which shows how it works:
>
> rhaas=# set max_parallel_degree = 100;
> SET
> rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
> ALTER TABLE
> rhaas=# explain (analyze) select count(*) from pgbench_accounts;
>
> QUERY PLAN
>
> Finalize Aggregate (cost=177436.04..177436.05 rows=1 width=8)
> (actual time=383.244..383.244 rows=1 loops=1)
>    -> Gather (cost=177435.00..177436.01 rows=10 width=8) (actual
> time=383.040..383.237 rows=9 loops=1)
>          Workers Planned: 9
>          Workers Launched: 8

I realize there's a lot of water under the bridge here, but I think we're going to get 1000 questions on -general of the type: "I asked for 8 parallel workers, why did I only get 7?". I believe we will regret this change.

So, one vote from me to revert.

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
Re: [HACKERS] Parallel safety tagging of extension functions
Andreas Karlsson writes:
> So how to best change the function signatures? I do not think it is
> possible without locking indexes by just using the SQL commands. You
> cannot drop a function from the operator family without dropping the
> operator class first.

Ugh. I had been thinking that you could use ALTER OPERATOR FAMILY DROP FUNCTION, but these functions are not "loose" in the opfamily --- they're assumed to be bound to the specific opclass. So there's an opclass dependency preventing them from getting dropped; and we have no SQL commands for changing the contents of an opclass.

After some fooling around, I couldn't find a way to do it without some manual catalog update or other. Given that, your original approach of manually updating proargtypes in the existing pg_proc row for the functions may be the best way. Anything else is going to be more complicated and ultimately will still require at least one direct catalog update.

Sometime we might want to think about making this sort of thing cleaner with more ALTER commands, but post-beta is probably not the time for that; it wouldn't be a small patch.

			regards, tom lane
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> I posted a patch at
>> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
>> which I think is functionally equivalent to what you have here, but
>> it goes to some lengths to make the code more readable, whereas this
>> is just adding another layer of complication to something that's
>> already a mess (eg, the request_time field is quite useless as-is).
>> So I'd like to propose pushing that in place of this patch ... do you
>> care to review it first?

> I've reviewed the patch today, and it seems fine to me - correct and
> achieving the same goal as the patch posted to this thread (plus fixing
> the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

> FWIW do you still plan to back-patch this? Minimizing the amount of
> changes was one of the things I had in mind when writing "my" patch,
> which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched. I'm not in favor of making things more complicated just to reduce the number of lines a patch touches.

> The one change I'm not quite sure about is the removal of clock skew
> detection in pgstat_recv_inquiry(). You've removed the first check on
> the inquiry, replacing it with this comment:
>     It seems sufficient to check for clock skew once per write round.
> But the first check was comparing msg/req, while the second check looks
> at dbentry/cur_ts. I don't see how those two clock skew checks are
> redundant - if they are, the comment should explain that I guess.

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew; it's intended to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. I reformatted the actual check for clock skew, but I do not think I changed its behavior.

> Another thing is that if you believe merging requests across databases
> is a silly idea, maybe we should bite the bullet and replace the list of
> requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either. We're not spending much code by having pending_write_requests be a list rather than a single entry, and we might eventually figure out a reasonable way to time the flushes so that we can merge requests.

			regards, tom lane
Re: [HACKERS] Rename max_parallel_degree?
Josh Berkus writes:
> I realize there's a lot of water under the bridge here, but I think
> we're going to get 1000 questions on -general of the type: "I asked for
> 8 parallel workers, why did I only get 7?". I believe we will regret
> this change.
>
> So, one vote from me to revert.

Well, that gets back to the question of whether average users will understand the "degree" terminology. For the record, while I do not like the current behavior either, this was not the solution I favored. I thought we should rename the GUC and keep it as meaning the number of additional worker processes.

			regards, tom lane
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:03 AM, Tom Lane wrote:
> Josh Berkus writes:
>> I realize there's a lot of water under the bridge here, but I think
>> we're going to get 1000 questions on -general of the type: "I asked for
>> 8 parallel workers, why did I only get 7?". I believe we will regret
>> this change.
>> So, one vote from me to revert.
>
> Well, that gets back to the question of whether average users will
> understand the "degree" terminology. For the record, while I do not
> like the current behavior either, this was not the solution I favored.
> I thought we should rename the GUC and keep it as meaning the number
> of additional worker processes.

I will happily bet anyone a nice dinner in Ottawa that most users will not understand it.

Compare this:

"max_parallel is the maximum number of parallel workers which will work on each stage of the query which is parallelizable. If you set it to 4, you get up to 4 workers."

with this:

"max_parallel_degree is the amount of parallelism in the query, with the understanding that the original parent process counts as 1, which means that if you set it to 1 you get no parallelism, and if you want 4 parallel workers you need to set it to 5."

Which one of those is going to require more explanations on -general and -novice? Bets?

Let's not be complicated for the sake of being complicated.

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:10 AM, Josh Berkus wrote:
> Compare this:
>
> "max_parallel is the maximum number of parallel workers which will work
> on each stage of the query which is parallelizable. If you set it to 4,
> you get up to 4 workers."
>
> with this:
>
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general
> and -novice? Bets?
>
> Let's not be complicated for the sake of being complicated.

+1

--
Command Prompt, Inc.                  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:10 AM, Josh Berkus wrote:
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice? Bets?
>
> Let's not be complicated for the sake of being complicated.

But the distinction between parallel workers and backends that can participate in parallel query does need to be user-visible. Worker processes are a commodity (i.e. the user must consider max_worker_processes).

--
Peter Geoghegan
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
On 05/31/2016 06:59 PM, Tom Lane wrote: Tomas Vondra writes: On 05/26/2016 10:10 PM, Tom Lane wrote: I posted a patch at https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us which I think is functionally equivalent to what you have here, but it goes to some lengths to make the code more readable, whereas this is just adding another layer of complication to something that's already a mess (eg, the request_time field is quite useless as-is). So I'd like to propose pushing that in place of this patch ... do you care to review it first? I've reviewed the patch today, and it seems fine to me - correct and achieving the same goal as the patch posted to this thread (plus fixing the issue with shared catalogs and fixing many comments). Thanks for reviewing! FWIW do you still plan to back-patch this? Minimizing the amount of changes was one of the things I had in mind when writing "my" patch, which is why I ended up with parts that are less readable. Yeah, I think it's a bug fix and should be back-patched. I'm not in favor of making things more complicated just to reduce the number of lines a patch touches. The one change I'm not quite sure about is the removal of clock skew detection in pgstat_recv_inquiry(). You've removed the first check on the inquiry, replacing it with this comment: It seems sufficient to check for clock skew once per write round. But the first check was comparing msg/req, while the second check looks at dbentry/cur_ts. I don't see how those two clock skew check are redundant - if they are, the comment should explain that I guess. I'm confused here --- are you speaking of having removed if (msg->cutoff_time > req->request_time) req->request_time = msg->cutoff_time; ? That is not a check for clock skew, it's intending to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. Ah, you're right. I've made the mistake of writing the e-mail before drinking any coffee today, and I got distracted by the comment change. I reformatted the actual check for clock skew, but I do not think I changed its behavior. I'm not sure it does not change the behavior, though. request_time only became unused after you removed the two places that set the value (one of them in the clock skew check). I'm not sure this is a bad change, though. But there was a dependency between the new request and the preceding one. Another thing is that if you believe merging requests across databases is a silly idea, maybe we should bite the bullet and replace the list of requests with a single item. I'm not convinced about this, though. No, I don't want to do that either. We're not spending much code by having pending_write_requests be a list rather than a single entry, and we might eventually figure out a reasonable way to time the flushes so that we can merge requests. +1 regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:16 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:10 AM, Josh berkus wrote: >> "max_parallel_degree is the amount of parallelism in the query, with the >> understanding that the original parent process counts as 1, which means >> that if you set it to 1 you get no parallelism, and if you want 4 >> parallel workers you need to set it to 5." >> >> Which one of those is going to require more explanations on -general and >> -novice? Bets? >> >> Let's not be complicated for the sake of being complicated. > > But the distinction between parallel workers and backends that can > participate in parallel query does need to be user-visible. Worker > processes are a commodity (i.e. the user must consider > max_worker_processes). It's still WAY simpler to understand "max_parallel is the number of parallel workers I requested". Any system where you set it to 2 and get only 1 worker on an idle system is going to cause endless queries on the mailing lists. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes: > On 05/31/2016 06:59 PM, Tom Lane wrote: >> I'm confused here --- are you speaking of having removed >> if (msg->cutoff_time > req->request_time) >> req->request_time = msg->cutoff_time; >> ? That is not a check for clock skew, it's intending to be sure that >> req->request_time reflects the latest request for this DB when we've >> seen more than one request. But since req->request_time isn't >> actually being used anywhere, it's useless code. > Ah, you're right. I've made the mistake of writing the e-mail before > drinking any coffee today, and I got distracted by the comment change. >> I reformatted the actual check for clock skew, but I do not think I >> changed its behavior. > I'm not sure it does not change the behavior, though. request_time only > became unused after you removed the two places that set the value (one > of them in the clock skew check). Well, it's unused in the sense that the if-test quoted above is the only place in HEAD that examines the value of request_time. And since that if-test only controls whether we change the value, and not whether we proceed to make the clock skew check, I don't see how it's related to clock skew or indeed anything else at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 10:16 AM, Peter Geoghegan wrote: >> But the distinction between parallel workers and backends that can >> participate in parallel query does need to be user-visible. Worker >> processes are a commodity (i.e. the user must consider >> max_worker_processes). > It's still WAY simpler to understand "max_parallel is the number of > parallel workers I requested". > Any system where you set it to 2 and get only 1 worker on an idle system > is going to cause endless queries on the mailing lists. I really think that a GUC named "max_parallel_workers", which in fact limits the number of workers and not something else, is the way to go. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:23 AM, Josh berkus wrote: > It's still WAY simpler to understand "max_parallel is the number of > parallel workers I requested". (Sorry Josh, somehow hit reply, not reply-all) Yes, it is. But as long as parallel workers are not really that distinct from the leader-as-worker when executing a parallel query, you have another consideration. Which is that you need to care about how many cores your query uses first and foremost, and not the number of parallel workers used. I don't think that having only one worker will cause too much confusion, because users will trust that we won't allow something that simply makes no sense to happen. In my parallel create index patch, the leader participates as a worker to scan and sort runs. It's identical to a worker, practically speaking, at least until the time comes to merge those runs. Similarly, parallel aggregate does not really have much for the leader process to do other than act as a worker. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:30 AM, Tom Lane wrote: Josh berkus writes: On 05/31/2016 10:16 AM, Peter Geoghegan wrote: But the distinction between parallel workers and backends that can participate in parallel query does need to be user-visible. Worker processes are a commodity (i.e. the user must consider max_worker_processes). It's still WAY simpler to understand "max_parallel is the number of parallel workers I requested". Any system where you set it to 2 and get only 1 worker on an idle system is going to cause endless queries on the mailing lists. I really think that a GUC named "max_parallel_workers", which in fact limits the number of workers and not something else, is the way to go. I agree with Tom here. If we are being pedantic for the sake of being pedantic we are just causing frustration. The term max_parallel_workers is simple, easy to understand and accurate enough. Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:38 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:23 AM, Josh berkus wrote: >> It's still WAY simpler to understand "max_parallel is the number of >> parallel workers I requested". > > (Sorry Josh, somehow hit reply, not reply-all) > > Yes, it is. But as long as parallel workers are not really that > distinct from the leader-as-worker when executing a parallel query, > you have another consideration. Which is that you need to care about > how many cores your query uses first and foremost, and not the number > of parallel workers used. I don't think that having only one worker > will cause too much confusion, because users will trust that we won't > allow something that simply makes no sense to happen. > > In my parallel create index patch, the leader participates as a worker > to scan and sort runs. It's identical to a worker, practically > speaking, at least until the time comes to merge those runs. Similarly, > parallel aggregate does not really have much for the leader process to > do other than act as a worker. In parallel seq scan and join, do the "masters" behave as workers as well? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Rename synchronous_standby_names?
Hi, Are we going to change synchronous_standby_names? Certainly the GUC is not *only* a list of names anymore. synchronous_standby_config? synchronous_standbys (adjust to correct English if necessary)? -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:46 AM, Josh berkus wrote: > In parallel seq scan and join, do the "masters" behave as workers as well? It depends. They will if they can. If the parallel seq scan leader isn't getting enough work to do from workers (enough tuples to process from the shared memory queue), it will start acting as a worker fairly quickly. With parallel aggregate, and some other cases, that will always happen. Even when the leader is consuming input from workers, that's still perhaps pegging one CPU core. So, it doesn't really invalidate what I said about the number of cores being the primary consideration. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
I wrote: > I really think that a GUC named "max_parallel_workers", which in fact > limits the number of workers and not something else, is the way to go. To be concrete, I suggest comparing the attached documentation patch with Robert's. Which one is more understandable? (I have not bothered preparing code changes to go with this, but am willing to do so.) regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 72b5920..fccde28 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 1998,2019 ****
! max_parallel_degree (integer)
! max_parallel_degree configuration parameter
!
! Sets the maximum number of workers that can be started for an
! individual parallel operation. Parallel workers are taken from the
! pool of processes established by
! <xref linkend="guc-max-worker-processes">. Note that the requested
! number of workers may not actually be available at runtime. If this
! occurs, the plan will run with fewer workers than expected, which may
! be inefficient. The default value is 2. Setting this value to 0
! disables parallel query execution.
--- 1998,2020 ----
! max_parallel_workers (integer)
! max_parallel_workers configuration parameter
!
! Sets the maximum number of worker processes that can be used to
! assist an individual parallelizable operation. Parallel workers are
! taken from the cluster-wide pool of processes established by
! <xref linkend="guc-max-worker-processes">. Depending on the size of
! the pool and the number of other parallel queries active, the
! requested number of workers may not actually be available at runtime.
! If this occurs, the plan will run with fewer workers than expected,
! which may be inefficient. The default value is 2. Setting this
! value to 0 disables parallel query execution.

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d1807ed..d73cf39 100644
*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
*** CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY
*** 909,922 ****
! parallel_degree (integer)
!
! The parallel degree for a table is the number of workers that should
! be used to assist a parallel scan of that table. If not set, the
! system will determine a value based on the relation size. The actual
! number of workers chosen by the planner may be less, for example due to
! the setting of <xref linkend="guc-max-parallel-degree">.
--- 909,923 ----
! parallel_workers (integer)
!
! This setting limits the number of parallel worker processes that should
! be used to assist a parallel scan of this table. If not set, the
! planner will select a value based on the relation size. The actual
! number of workers chosen by the planner may be less, for example
! because it is also constrained by
! <xref linkend="guc-max-parallel-workers">; it will not be more.

diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index 38e1967..3d78b00 100644
*** a/doc/src/sgml/release-9.6.sgml
--- b/doc/src/sgml/release-9.6.sgml
***************
*** 153,159 ****
  Use of parallel query execution can be controlled through the new
  configuration parameters
! , , , and .
--- 153,159 ----
  Use of parallel query execution can be controlled through the new
  configuration parameters
! , , , and .

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:51 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:46 AM, Josh berkus wrote: >> In parallel seq scan and join, do the "masters" behave as workers as well? > > It depends. They will if they can. If the parallel seq scan leader > isn't getting enough work to do from workers (enough tuples to process > from the shared memory queue), it will start acting as a worker fairly > quickly. > With parallel aggregate, and some other cases, that will always happen. > > Even when the leader is consuming input from workers, that's still perhaps > pegging one CPU core. So, it doesn't really invalidate what I said about > the number of cores being the primary consideration. > I get where you're coming from, but I think Haas's query plan output is going to show us the confusion we're going to get. So we need to either change the parameter, the explain output, or brace ourselves for endless repeated questions. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
Jaime Casanova writes: > Are we going to change synchronous_standby_names? Certainly the GUC is > not *only* a list of names anymore. > synchronous_standby_config? > synchronous_standbys (adjust to correct english if necesary)? I could get behind renaming it to synchronous_standby_config ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:00 AM, Tom Lane wrote: > ! If this occurs, the plan will run with fewer workers than expected, > ! which may be inefficient. The default value is 2. Setting this > ! value to 0 disables parallel query execution. Is there a thread on how we determined this default of 2? I can't find one under likely search terms. I'm concerned about the effect of overallocating parallel workers on systems which are already running out of cores (e.g. AWS instances), and run with default settings. Possibly max_parallel_workers takes care of this, which is why I want to understand the logic here. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Tom Lane wrote: > Josh berkus > writes: > > On 05/31/2016 10:16 AM, Peter Geoghegan wrote: > >> But the distinction between parallel workers and backends that can > >> participate in parallel query does need to be user-visible. Worker > >> processes are a commodity (i.e. the user must consider > >> max_worker_processes). > > > It's still WAY simpler to understand "max_parallel is the number of > > parallel workers I requested". > > > Any system where you set it to 2 and get only 1 worker on an idle system > > is going to cause endless queries on the mailing lists. > > I really think that a GUC named "max_parallel_workers", which in fact > limits the number of workers and not something else, is the way to go. > > What is your opinion on the internal name for this? Leave it as "degree"? Skimming the patch, the number of times that "parallel_degree - 1" appears concerns me. I would also probably reword "Parallel workers are taken from the pool...", replacing "parallel" with the word "additional": "Additional workers are taken from the pool...". I'm tending to think we are using the word "parallel" too often and that selective removal would be better. "Sets the maximum number of workers (i.e., processes) that can be used to perform an operation." An introductory section on parallelism can lay out the framework and the concept of parallelism. Sentences like the one above can then assume that the reader understands that workers operate in parallel if the operation allows and there are workers available. The same observation applies to "leader". When more than one process is involved, a leader is chosen. This is general background to cover upfront. Elsewhere, just talk of workers and let the leader be implied. No need to keep repeating "including the leader". In many ways the leader is an implementation detail the user doesn't care about. The only impact is that 2 is not twice as good as 1 due to overhead - whatever form that overhead takes. The change from two to three for the default sneaked in there... David J.
Re: [HACKERS] Rename max_parallel_degree?
Peter Geoghegan writes: > Even when the leader is consuming input from workers, that's still perhaps > pegging one CPU core. So, it doesn't really invalidate what I said about > the number of cores being the primary consideration. Agreed, but if we think that people need to be thinking in those terms, maybe the parameter should be "max_parallel_cores". The alternate docs patch I just posted tries to deal with this by describing max_parallel_workers as being the max number of worker processes used to "assist" a parallel query. That was terminology already being used in one place, but not consistently. If we use it consistently, I think it would be sufficient to remind people that they need to figure on one more core for the leader. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:02 AM, Josh berkus wrote: > I get where you're coming from, but I think Haas's query plan output is > going to show us the confusion we're going to get. So we need to either > change the parameter, the explain output, or brace ourselves for endless > repeated questions. I get where you're coming from, too -- I think our positions are very close. The only reason I favor defining parallel_degree = 1, rather than doing what Tom proposes to do with that patch, is that we might as well use the prevailing terminology used by SQL Server and Oracle (as long as we match those semantics). Also, I think that number of cores used is a more important consideration for users than the number of workers used. Users will definitely be confused about workers used vs. cores used, but I don't think that any proposal fixes that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
Josh berkus writes: > Is there a thread on how we determined this default of 2? I can't find > one under likely search terms. The 9.6 open-items list cites https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:05 AM, Josh berkus wrote: On 05/31/2016 11:00 AM, Tom Lane wrote: ! If this occurs, the plan will run with fewer workers than expected, ! which may be inefficient. The default value is 2. Setting this ! value to 0 disables parallel query execution. Is there a thread on how we determined this default of 2? I can't find one under likely search terms. I'm concerned about the effect of overallocating parallel workers on systems which are already running out of cores (e.g. AWS instances), and run with default settings. Possibly max_parallel_workers takes care of this, which is why I want to understand the logic here. I don't remember the thread, but I seem to recall that the default of 2 was chosen explicitly for beta, RC, etc., so that we can see what happens. Robert could speak better to that. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:09 AM, Peter Geoghegan wrote: > The only reason I favor defining parallel_degree = 1 I meant "redefining max_parallel_degree =1 to effectively disable parallel query", of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Tom Lane wrote: > I wrote: > > I really think that a GUC named "max_parallel_workers", which in fact > > limits the number of workers and not something else, is the way to go. > > To be concrete, I suggest comparing the attached documentation patch > with Robert's. Which one is more understandable? > > (I have not bothered preparing code changes to go with this, but am > willing to do so.) > > If going this route I'd still rather add the word "assisting" or "additional" directly into the GUC name, so that the need to read the docs to determine whether the leader is included is alleviated. Robert said that it would be confusing, but it cannot be worse than this... David J.
Re: [HACKERS] Rename max_parallel_degree?
On 5/31/16 2:02 PM, Josh berkus wrote: I get where you're coming from, but I think Haas's query plan output is going to show us the confusion we're going to get. So we need to either change the parameter, the explain output, or brace ourselves for endless repeated questions. Changing the explain output doesn't sound so bad to me. The users' problem is that the parameter setting ought to match the EXPLAIN output. The developers' problem is that the EXPLAIN output actually corresponds to leader + (N-1) workers internally. I think we can hope that developers are going to be less confused about that than users. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 11:02 AM, Josh berkus > wrote: > > I get where you're coming from, but I think Haas's query plan output is > > going to show us the confusion we're going to get. So we need to either > > change the parameter, the explain output, or brace ourselves for endless > > repeated questions. > > I get where you're coming from, too -- I think our positions are very > close. > > The only reason I favor defining parallel_degree = 1, rather than > doing what Tom proposes to do with that patch, is that we might as > well use the prevailing terminology used by SQL Server and Oracle (as > long as we match those semantics). Also, I think that number of cores > used is a more important consideration for users than the number of > workers used. > > Users will definitely be confused about workers used vs. cores used, > but I don't think that any proposal fixes that. > > Ideally each worker gets its own core - though obviously physical limitations apply. If that's not the case then, yes, at least one user is confused... David J.
Re: [HACKERS] Rename max_parallel_degree?
"David G. Johnston" writes: > On Tuesday, May 31, 2016, Tom Lane wrote: >>> I really think that a GUC named "max_parallel_workers", which in fact >>> limits the number of workers and not something else, is the way to go. > If going this route I'd still rather add the word "assisting" > or "additional" directly into the guc name so the need to read the docs to > determine inclusive or exclusive of the leader is alleviated. Dunno, "max_assisting_parallel_workers" seems awfully wordy and not remarkably clearer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:17 AM, Peter Eisentraut wrote: > Changing the explain output doesn't sound so bad to me. > > The users' problem is that the parameter setting ought to match the EXPLAIN > output. > > The developers' problem is that the EXPLAIN output actually corresponds to > leader + (N-1) workers internally. > > I think we can hope that developers are going to be less confused about that > than users. +1. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
On 5/31/16 1:47 PM, Jaime Casanova wrote: Are we going to change synchronous_standby_names? Certainly the GUC is not *only* a list of names anymore. synchronous_standby_config? synchronous_standbys (adjust to correct english if necesary)? If the existing values are still going to be accepted, then I would leave it as is. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
"David G. Johnston" writes: > On Tuesday, May 31, 2016, Tom Lane wrote: >> I really think that a GUC named "max_parallel_workers", which in fact >> limits the number of workers and not something else, is the way to go. > What is your opinion on the internal name for this? Leave it as "degree"? If that proposal is accepted, I am willing to go through the code and rename everything internally to match (and, in fact, would insist on that happening). But I think the current debate is primarily around how we present this to users, so talking about the documentation seems sufficient for the moment. > The change from two to three for the default sneaked in there... Well, that would be the same effective value under Robert's patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:17 AM, Peter Eisentraut wrote: > On 5/31/16 2:02 PM, Josh berkus wrote: >> I get where you're coming from, but I think Haas's query plan output is >> going to show us the confusion we're going to get. So we need to either >> change the parameter, the explain output, or brace ourselves for endless >> repeated questions. > > Changing the explain output doesn't sound so bad to me. > > The users' problem is that the parameter setting ought to match the > EXPLAIN output. > > The developers' problem is that the EXPLAIN output actually corresponds > to leader + (N-1) workers internally. > > I think we can hope that developers are going to be less confused about > that than users. Makes sense. One more consistency question: what's the effect of running out of max_parallel_workers? That is, say max_parallel_workers is set to 10, and 8 are already allocated. If I ask for max_parallel_X = 4, how many cores do I use? Presumably the leader isn't counted towards max_parallel_workers? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: >> I think we can hope that developers are going to be less confused about >> that than users. > > Makes sense. Maybe EXPLAIN doesn't have to use the term parallel worker at all. It can instead use a slightly broader terminology, possibly including the term "core". > One more consistency question: what's the effect of running out of > max_parallel_workers? > > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? Well, it depends on the planner, of course. But when constrained only by the availability of worker processes, then your example could use 3 cores -- the 2 remaining parallel workers, plus the leader itself. > Presumably the leader isn't counted towards max_parallel_workers? Exactly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 2:22 PM, Josh berkus wrote: > On 05/31/2016 11:17 AM, Peter Eisentraut wrote: > > On 5/31/16 2:02 PM, Josh berkus wrote: > >> I get where you're coming from, but I think Haas's query plan output is > >> going to show us the confusion we're going to get. So we need to either > >> change the parameter, the explain output, or brace ourselves for endless > >> repeated questions. > > > > Changing the explain output doesn't sound so bad to me. > > > > The users' problem is that the parameter setting ought to match the > > EXPLAIN output. > > > > The developers' problem is that the EXPLAIN output actually corresponds > > to leader + (N-1) workers internally. > > > > I think we can hope that developers are going to be less confused about > > that than users. > > Makes sense. > > One more consistency question: what's the effect of running out of > max_parallel_workers? > > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? > > Presumably the leader isn't counted towards max_parallel_workers? > You'd have three O/S processes - the one dedicated to your session and you'd pick up two additional processes from the worker pool to assist. How the O/S assigns those to cores is outside PostgreSQL's jurisdiction. David J.
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > One more consistency question: what's the effect of running out of > max_parallel_workers? ITYM max_worker_processes (ie, the cluster-wide pool size)? > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? One of my reasons for liking max_parallel_workers is that you can sensibly compare it to max_worker_processes to figure out how many workers you're likely to get. If you think in terms of "degree" it takes some additional mental arithmetic to understand what will happen. > Presumably the leader isn't counted towards max_parallel_workers? Not under what I'm proposing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
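Putting the arithmetic from this exchange in one place, a hedged illustration using the names proposed in the thread (max_parallel_workers here is the proposed per-operation name, not an existing GUC at this point):

    -- cluster-wide pool:
    --   max_worker_processes = 10
    -- 8 workers already taken by other parallel queries
    SET max_parallel_workers = 4;   -- per-operation request
    -- only 10 - 8 = 2 workers can actually be launched, so the
    -- plan runs with 2 workers plus the leader: up to 3 cores busy

The appeal of counting workers rather than "degree", as noted above, is that this subtraction is all the mental arithmetic required.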
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:27 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: >>> I think we can hope that developers are going to be less confused about >>> that than users. >> >> Makes sense. > > Maybe EXPLAIN doesn't have to use the term parallel worker at all. It > can instead use a slightly broader terminology, possibly including the > term "core". > >> One more consistency question: what's the effect of running out of >> max_parallel_workers? >> >> That is, say max_parallel_workers is set to 10, and 8 are already >> allocated. If I ask for max_parallel_X = 4, how many cores do I use? > > Well, it depends on the planner, of course. But when constrained only > by the availability of worker processes, then your example could use 3 > cores -- the 2 remaining parallel workers, plus the leader itself. > >> Presumably the leader isn't counted towards max_parallel_workers? (oops, I meant max_worker_processes above) So, there are six things we'd like to make consistent to limit user confusion:

1. max_parallel_X and number of cores used
2. max_parallel_X and EXPLAIN output
3. max_parallel_X and gatekeeping via max_worker_processes
4. max_parallel_X and parallelism of operations (i.e. 2 == 2 parallel scanners)
5. settings in other similar databases (does someone have specific citations for this?)
6. consistency with other GUCs (0 or -1 to disable settings)

We can't actually make all six things consistent, as some of them are contradictory; for example, (1) and (3) cannot be reconciled. So we need to evaluate which things are more likely to cause confusion. My general evaluation would be to make the GUC be the number of additional workers used, not the total number (including the leader). From my perspective, that makes (2), (3) and (6) consistent, while (4) cannot be made consistent because different types of parallel nodes behave in different ways (i.e., some are parallel with only 1 additional worker and some are not). However, I'm resigned to the fact that user confusion is inevitable whichever way we choose, and could be persuaded the other way. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:29 AM, Tom Lane wrote: > Josh berkus writes: >> One more consistency question: what's the effect of running out of >> max_parallel_workers? > > ITYM max_worker_processes (ie, the cluster-wide pool size)? Yes. Sorry for contributing to the confusion. Too many similar-sounding parameter names. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:10 AM, Tom Lane wrote: > Josh berkus writes: >> Is there a thread on how we determined this default of 2? I can't find >> one under likely search terms. > > The 9.6 open-items list cites > > https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de Looks like we didn't decide for the release, just the beta. I can see two ways to go for the final release:

1. Ship with max_parallel_X = 2 (or similar) and a very low max_worker_processes (e.g. 4).
2. Ship with max_parallel_X = 1 (or 0, depending), and with a generous max_worker_processes (e.g. 16).

Argument in favor of (1): we want parallelism to work out of the gate for users running on low-concurrency systems. These settings would let some parallelism happen immediately, without overwhelming a 4-to-8-core system/vm. Tuning for the user would then be fairly easy, as we could just tell them "set max_worker_processes to half the number of cores you have".

Argument in favor of (2): parallelism is potentially risky for .0, and as a result we want it disabled unless users choose to enable it. Also, defaulting to off lets users make more use of the parallel_degree table attribute to just enable parallelism on select tables. Thoughts? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
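A sketch of what the two options would look like as shipped settings, using the placeholder name max_parallel_X from the message above for whatever the per-operation GUC ends up being called (the values are the ones proposed there, not committed defaults):

    -- Option 1: parallelism on out of the box, small pool
    ALTER SYSTEM SET max_parallel_X = 2;
    ALTER SYSTEM SET max_worker_processes = 4;

    -- Option 2: parallelism off by default, generous pool
    ALTER SYSTEM SET max_parallel_X = 0;
    ALTER SYSTEM SET max_worker_processes = 16;

Note that max_worker_processes can only change at server start, so either choice is effectively baked into a default installation until the administrator intervenes.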
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 11:29 AM, Tom Lane wrote: >> Josh berkus writes: >>> One more consistency question: what's the effect of running out of >>> max_parallel_workers? >> ITYM max_worker_processes (ie, the cluster-wide pool size)? > Yes. Sorry for contributing to the confusion. Too many > similar-sounding parameter names. Yeah, I was thinking the same thing while preparing my docs patch. At the risk of opening another can of worms, what about renaming max_worker_processes as well? It would be a good thing if that had "cluster" in it somewhere, or something that indicates it's a system-wide value not a per-session value. "max_workers_per_cluster" would answer, though I'm not in love with it particularly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 2:37 PM, Josh berkus wrote: > On 05/31/2016 11:27 AM, Peter Geoghegan wrote: > > On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: > >>> I think we can hope that developers are going to be less confused about > >>> that than users. > >> > >> Makes sense. > > > > Maybe EXPLAIN doesn't have to use the term parallel worker at all. It > > can instead use a slightly broader terminology, possibly including the > > term "core". > 2. max_parallel_X and EXPLAIN output > This is helped greatly if we can say:

  Leader:   xxx
  Worker 0: yyy
  Worker 1: zzz

David Rowley's response on April 30th provides an example EXPLAIN output, should anyone want an easy source to look at. https://www.postgresql.org/message-id/cakjs1f8_3y6xhp2xp5yovpetxmucmiwp6zwr_eterqy2fcn...@mail.gmail.com Getting across the point that the session process is not a worker process seems doable. The implementation detail that it does indeed perform work can be made evident in the EXPLAIN output but otherwise left as an implementation detail. David J.
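To illustrate, a hypothetical mock-up in the style being proposed here -- not actual server output, and the plan shape and numbers are invented:

    EXPLAIN (ANALYZE) SELECT * FROM t WHERE x > 0;
    -- Gather (actual rows=1000000 loops=1)
    --   Workers Launched: 2
    --   -> Parallel Seq Scan on t
    --        Filter: (x > 0)
    --        Leader:   rows=334000
    --        Worker 0: rows=333000
    --        Worker 1: rows=333000

Breaking the row counts out per process makes it visible that the leader participates in the scan without ever having to call it a "worker".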
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
On 05/31/2016 07:24 PM, Tom Lane wrote: Tomas Vondra writes: On 05/31/2016 06:59 PM, Tom Lane wrote: I'm confused here --- are you speaking of having removed if (msg->cutoff_time > req->request_time) req->request_time = msg->cutoff_time; ? That is not a check for clock skew, it's intending to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. Ah, you're right. I've made the mistake of writing the e-mail before drinking any coffee today, and I got distracted by the comment change. I reformatted the actual check for clock skew, but I do not think I changed its behavior. I'm not sure it does not change the behavior, though. request_time only became unused after you removed the two places that set the value (one of them in the clock skew check). Well, it's unused in the sense that the if-test quoted above is the only place in HEAD that examines the value of request_time. And since that if-test only controls whether we change the value, and not whether we proceed to make the clock skew check, I don't see how it's related to clock skew or indeed anything else at all. I see, in that case it indeed is useless. I've checked how this worked in 9.2 (before the 9.3 patch that split the file per db), and back then last_statsrequest (transformed to request_time) was used to decide whether we need to write something. But now we do that by simply checking whether the list is empty. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 5/31/16 1:47 PM, Jaime Casanova wrote: > >> Are we going to change synchronous_standby_names? Certainly the GUC is >> not *only* a list of names anymore. >> >> synchronous_standby_config? >> synchronous_standbys (adjust to correct english if necesary)? >> > > If the existing values are still going to be accepted, then I would leave > it as is. +1 David J.
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 11:10 AM, Tom Lane wrote: >> The 9.6 open-items list cites >> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de > Looks like we didn't decide for the release, just the beta. Indeed. I think it's premature to have this discussion. The plan was to evaluate near the end of beta, when we (hopefully) have a better feeling for how buggy parallel query is likely to be. > Also, defaulting to off lets users make more use of the parallel_degree > table attribute to just enable parallelism on select tables. Well, that's an interesting point. The current coding is that parallel_degree is an upper limit on per-table workers, and max_parallel_degree also limits it. So if you want parallel scans only on a small set of tables, parallel_degree is not an especially convenient way to get to that. Whether we measure it in workers or cores doesn't change this conclusion. It might be worth reconsidering what per-table knobs we should provide exactly, but that's orthogonal to the main point under discussion. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
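For reference, the per-table knob under discussion is a storage parameter, so enabling parallel scans on just a few tables means touching each one. A sketch (the table names are made up, and the effective value is still capped by the global GUC, per Tom's point above):

    -- allow up to 4 workers for scans of this one table
    ALTER TABLE hot_table SET (parallel_degree = 4);

    -- forbid parallel scans of another table
    ALTER TABLE big_archive SET (parallel_degree = 0);

Because the global setting acts as a cap rather than a default, per-table values cannot opt a table *in* to parallelism when the global limit is zero -- which is why this is not an especially convenient opt-in switch.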
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes: > I've checked how this worked in 9.2 (before the 9.3 patch that split the > file per db), and back then last_statsrequest (transformed to > request_time) was used to decide whether we need to write something. But > now we do that by simply checking whether the list is empty. Right. In effect, 9.3 moved the decision about "do we need to write stats because of this request" from the main loop to pgstat_recv_inquiry() ... but we forgot to incorporate any check for whether the request was already satisfied into pgstat_recv_inquiry(). We can do that, though, as per either of the patches under discussion --- and once we do so, maintaining DBWriteRequest.request_time seems a bit pointless. It's conceivable that we'd try to implement merging of close-together requests in a way that would take account of how far back the oldest unsatisfied request is. But I think that a global oldest-request time would be sufficient for that; we don't need to track it per-database. In any case, it's hard to see exactly how to make that work without putting a gettimeofday() call into the inner message handling loop, which I believe we won't want to do on performance grounds. The previous speculation about doing writes every N messages or when we have no input to process seems more likely to lead to a useful answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
I wrote: > At the risk of opening another can of worms, what about renaming > max_worker_processes as well? It would be a good thing if that > had "cluster" in it somewhere, or something that indicates it's a > system-wide value not a per-session value. "max_workers_per_cluster" > would answer, though I'm not in love with it particularly. Actually, after a bit more thought, maybe "max_background_workers" would be a good name? That matches its existing documentation more closely: Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8. However, that would still leave us with max_background_workers as the cluster-wide limit and max_parallel_workers as the per-query-node limit. That naming isn't doing all that much to clarify the distinction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > I wrote: >> At the risk of opening another can of worms, what about renaming >> max_worker_processes as well? It would be a good thing if that >> had "cluster" in it somewhere, or something that indicates it's a >> system-wide value not a per-session value. "max_workers_per_cluster" >> would answer, though I'm not in love with it particularly. > > Actually, after a bit more thought, maybe "max_background_workers" would > be a good name? That matches its existing documentation more closely: > > Sets the maximum number of background processes that the system > can support. This parameter can only be set at server start. The > default is 8. > > However, that would still leave us with max_background_workers as the > cluster-wide limit and max_parallel_workers as the per-query-node limit. > That naming isn't doing all that much to clarify the distinction. It sure isn't. Also, I think that we might actually want to add an additional GUC to prevent the parallel query system from consuming the entire pool of processes established by max_worker_processes. If you're doing anything else with worker processes on your system, you might well want to say, well, it's OK to have up to 20 worker processes, but at most 10 of those can be used for parallel queries, so that the other 10 are guaranteed to be available for whatever other stuff I'm running that uses the background process facility. It's worth remembering that the background worker stuff was originally invented by Alvaro to allow users to run daemons, not for parallel query. So I think in the long run we should have three limits:

1. Cluster-wide limit on number of worker processes for all purposes (currently, max_worker_processes).
2. Cluster-wide limit on number of worker processes for parallelism (don't have this yet).
3. Per-operation limit on number of worker processes for parallelism (currently, max_parallel_degree).

Whatever we rename, there needs to be enough semantic space between #1 and #3 to allow for the possibility - I think the very likely possibility - that we will eventually also want #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
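A configuration sketch of the three-limit scheme outlined above; #1 and #3 use the names current at the time of this thread, while the name for #2 is purely hypothetical since no such GUC exists yet:

    ALTER SYSTEM SET max_worker_processes = 20;   -- #1: all bgworkers
    ALTER SYSTEM SET max_parallel_workers = 10;   -- #2: hypothetical cap on
                                                  --     parallel-query workers
    ALTER SYSTEM SET max_parallel_degree = 4;     -- #3: per operation

Under such a scheme, the 10 slots that parallel query cannot claim remain available for extensions' background workers (pglogical, BDR, and similar, as mentioned downthread).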
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > I wrote: > > At the risk of opening another can of worms, what about renaming > > max_worker_processes as well? It would be a good thing if that > > had "cluster" in it somewhere, or something that indicates it's a > > system-wide value not a per-session value. "max_workers_per_cluster" > > would answer, though I'm not in love with it particularly. > > Actually, after a bit more thought, maybe "max_background_workers" would > be a good name? That matches its existing documentation more closely: > > Sets the maximum number of background processes that the system > can support. This parameter can only be set at server start. > The > default is 8. > > However, that would still leave us with max_background_workers as the > cluster-wide limit and max_parallel_workers as the per-query-node limit. > That naming isn't doing all that much to clarify the distinction. > > Node execution is serial, right? I know work_mem has a provision about possibly using multiples of the maximum in a single query... The per-node dynamic does seem important to at least document well, so that when users see 3 workers on one explain node and 2 on another they aren't left scratching their heads. We don't reserve a set number of workers per statement; rather, they are retrieved as needed during query execution. I think I see the same amount of added value in tacking on "per_cluster" as you do in adding "additional" to get "max_additional_parallel_workers" - though further refinement of trying to actively NOT consider the leader a worker would support the omission of "addition[al]"... If we left max_worker_processes as defining only the maximum allowed non-parallel workers, and added a new "max_cluster_workers" that would cap the combination of both "max_worker_processes" and "max_parallel_workers"... David J.
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
On Tue, May 31, 2016 at 12:34 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote: >>> 99% of the time, you'd be right. But this is an unusual case, for the >>> reasons I mentioned before. > >> I tend to agree with Nikolay. I can't see much upside in making this >> change. At best, nothing will break. At worst, something will break. >> But how do we actually come out ahead? > > We come out ahead by not having to make the documentation more confusing. > > Basically, we have the opportunity to fix an ancient mistake here at > very low cost. I do not think that doubling down on the mistake is > a better answer. I'm not convinced, but we don't have to agree on everything... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:35 PM, Robert Haas wrote: > On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > > I wrote: > >> At the risk of opening another can of worms, what about renaming > >> max_worker_processes as well? It would be a good thing if that > >> had "cluster" in it somewhere, or something that indicates it's a > >> system-wide value not a per-session value. "max_workers_per_cluster" > >> would answer, though I'm not in love with it particularly. > > > > Actually, after a bit more thought, maybe "max_background_workers" would > > be a good name? That matches its existing documentation more closely: > > > > Sets the maximum number of background processes that the system > > can support. This parameter can only be set at server start. > The > > default is 8. > > > > However, that would still leave us with max_background_workers as the > > cluster-wide limit and max_parallel_workers as the per-query-node limit. > > That naming isn't doing all that much to clarify the distinction. > > It sure isn't. Also, I think that we might actually want to add an > additional GUC to prevent the parallel query system from consuming the > entire pool of processes established by max_worker_processes. My mind started going here too...though the question is whether you need a reserved pool or whether we apply some algorithm if (max_parallel + max_other > max_cluster). That algorithm could be configurable (i.e., target ratio 20:10 where 20+10 == max_cluster). David J.
Re: [HACKERS] COMMENT ON, psql and access methods
On Fri, May 27, 2016 at 7:53 AM, Michael Paquier wrote: > As far as I can see, COMMENT ON has no support for access methods. > Wouldn't we want to add it as it is created by a command? On top of > that, perhaps we could have a backslash command in psql to list the > supported access methods, like \dam[S]? The system methods would be in > this case all the in-core ones. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
In a message of May 31, 2016 15:38:38, Robert Haas wrote: > >>> 99% of the time, you'd be right. But this is an unusual case, for the > >>> reasons I mentioned before. > >> > >> I tend to agree with Nikolay. I can't see much upside in making this > >> change. At best, nothing will break. At worst, something will break. > >> But how do we actually come out ahead? > > > > We come out ahead by not having to make the documentation more confusing. > > > > Basically, we have the opportunity to fix an ancient mistake here at > > very low cost. I do not think that doubling down on the mistake is > > a better answer. > > I'm not convinced, but we don't have to agree on everything... I am not convinced either. But I will not argue hard for the patch, as my main goal was to report the inconsistency. Though I do consider Tom's proposal quite strange... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
Robert Haas wrote: > Also, I think that we might actually want to add an > additional GUC to prevent the parallel query system from consuming the > entire pool of processes established by max_worker_processes. If > you're doing anything else with worker processes on your system, you > might well want to say, well, it's OK to have up to 20 worker > processes, but at most 10 of those can be used for parallel queries, > so that the other 10 are guaranteed to be available for whatever other > stuff I'm running that uses the background process facility. It's > worth remembering that the background worker stuff was originally > invented by Alvaro to allow users to run daemons, not for parallel > query. Agreed -- things like pglogical and BDR rely on background workers to do their jobs. Many other users of bgworkers have popped up, so I think it'd be a bad idea if parallel queries are able to monopolize all the available slots. > So I think in the long run we should have three limits: > > 1. Cluster-wide limit on number of worker processes for all purposes > (currently, max_worker_processes). > > 2. Cluster-wide limit on number of worker processes for parallelism > (don't have this yet). > > 3. Per-operation limit on number of worker processes for parallelism > (currently, max_parallel_degree). > > Whatever we rename, there needs to be enough semantic space between #1 > and #3 to allow for the possibility - I think the very likely > possibility - that we will eventually also want #2. max_background_workers sounds fine to me for #1, and I propose to add #2 in 9.6 rather than wait. max_total_parallel_query_workers? I already presented my proposal for #3 which, as you noted, nobody endorsed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 12:55 PM, Alvaro Herrera wrote: > Agreed -- things like pglogical and BDR rely on background workers to do > their jobs. Many other users of bgworkers have popped up, so I think > it'd be a bad idea if parallel queries are able to monopolize all the > available slots. That never occurred to me. I'd say it could be a serious problem, that should be fixed promptly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perf Benchmarking and regression.
On Fri, May 27, 2016 at 12:37 AM, Andres Freund wrote: > I don't think the situation is quite that simple. By *disabling* backend > flushing it's also easy to see massive performance regressions. In > situations where shared buffers was configured appropriately for the workload > (not the case here IIRC). On what kind of workload does setting backend_flush_after=0 represent a large regression vs. the default settings? I think we have to consider that pgbench and parallel copy are pretty common things to want to do, and a non-zero default setting hurts those workloads a LOT. I have a really hard time believing that the benefits on other workloads are large enough to compensate for the slowdowns we're seeing here. We have nobody writing in to say that backend_flush_after>0 is making things way better for them, and Ashutosh and I have independently hit massive slowdowns on unrelated workloads. We weren't looking for slowdowns in this patch. We were trying to measure other stuff, and ended up tracing the behavior back to this patch. That really, really suggests that other people will have similar experiences. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
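For anyone wanting to reproduce the comparison being discussed, the two cases differ by a single setting (backend_flush_after is the 9.6 GUC in question; 0 disables backend flushing):

ALTER SYSTEM SET backend_flush_after = 0;  -- the "disabled" case in this comparison
SELECT pg_reload_conf();                   -- apply without a restart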
Re: [HACKERS] Rename max_parallel_degree?
Alvaro Herrera writes: > Robert Haas wrote: >> So I think in the long run we should have three limits: >> >> 1. Cluster-wide limit on number of worker processes for all purposes >> (currently, max_worker_processes). >> >> 2. Cluster-wide limit on number of worker processes for parallelism >> (don't have this yet). >> >> 3. Per-operation limit on number of worker processes for parallelism >> (currently, max_parallel_degree). >> >> Whatever we rename, there needs to be enough semantic space between #1 >> and #3 to allow for the possibility - I think the very likely >> possibility - that we will eventually also want #2. > max_background_workers sounds fine to me for #1, and I propose to add #2 > in 9.6 rather than wait. +1 to both points. > max_total_parallel_query_workers ? The name should be closely related to what we use for #3. I could go for max_total_parallel_workers for #2 and max_parallel_workers for #3. Or maybe max_parallel_workers_total? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 01:04 PM, Tom Lane wrote: > The name should be closely related to what we use for #3. I could go for > max_total_parallel_workers for #2 and max_parallel_workers for #3. > Or maybe max_parallel_workers_total? How about parallel_worker_pool? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 4:04 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Robert Haas wrote: >>> So I think in the long run we should have three limits: >>> >>> 1. Cluster-wide limit on number of worker processes for all purposes >>> (currently, max_worker_processes). >>> >>> 2. Cluster-wide limit on number of worker processes for parallelism >>> (don't have this yet). >>> >>> 3. Per-operation limit on number of worker processes for parallelism >>> (currently, max_parallel_degree). >>> >>> Whatever we rename, there needs to be enough semantic space between #1 >>> and #3 to allow for the possibility - I think the very likely >>> possibility - that we will eventually also want #2. > >> max_background_workers sounds fine to me for #1, and I propose to add #2 >> in 9.6 rather than wait. > > +1 to both points. > >> max_total_parallel_query_workers ? > > The name should be closely related to what we use for #3. I could go for > max_total_parallel_workers for #2 and max_parallel_workers for #3. > Or maybe max_parallel_workers_total? I just want to point out that if we change #1, we're breaking postgresql.conf compatibility for, IMHO, not a whole lot of benefit. I'd just leave it alone. I would propose to call #2 max_parallel_workers and #3 max_parallel_degree, which I think is clear as daylight, but since I invented all of these names, I guess I'm biased. In terms of forward-compatibility, one thing to note is that we currently have only parallel QUERY, but in the future we will no doubt also have parallel operations that are not queries, like VACUUM and CLUSTER and ALTER TABLE. If we put the word "query" into the name of a GUC, we're committing to the idea that it only applies to queries, and that there will be some other limit for non-queries. If we don't put the word query in there, we are not necessarily committing to the reverse, but we're at least leaning in that direction. So far I've largely dodged having to make a decision about this, because talking about the degree of parallelism for CLUSTER makes just as much sense as talking about the degree of parallelism for a query, but we could also decide to have e.g. maintenance_parallel_degree instead of max_parallel_degree for such operations. But as we commit to names it's something to be aware of. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH][Documentation] Add optional USING keyword before opclass name in INSERT statement
On Tue, May 31, 2016 at 3:55 PM, Nikolay Shaplov wrote:
> In a message of 31 May 2016 15:38:38, Robert Haas wrote:
> > >>> 99% of the time, you'd be right. But this is an unusual case, for the
> > >>> reasons I mentioned before.
> > >>
> > >> I tend to agree with Nikolay. I can't see much upside in making this
> > >> change. At best, nothing will break. At worst, something will break.
> > >> But how do we actually come out ahead?
> > >
> > > We come out ahead by not having to make the documentation more confusing.
> > >
> > > Basically, we have the opportunity to fix an ancient mistake here at
> > > very low cost. I do not think that doubling down on the mistake is
> > > a better answer.
> >
> > I'm not convinced, but we don't have to agree on everything...
>
> I am not convinced either. But I will not argue hard for the patch, as my
> main goal was to report the inconsistency. That said, I consider Tom's
> proposal quite strange...

We've recently chosen to not document the "ANALYZE -> ANALYSE" syntax, and I'm sure there are other examples, so I don't see why the status quo (pre-Tom's patch) is unacceptable... if adding USING to the synopsis is prone to cause confusion then don't; but let's not break existing uses that in no way harm the project. Otherwise I presume Tom is correct that the true fix is more than a single word in one file of our documentation. If you want to see it stay and be documented, there needs to be a complete proposal that at least gets, even if grudging, approval from a couple of people and a committer. David J.
Re: [HACKERS] Rename max_parallel_degree?
Robert Haas wrote:
> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

We can add the old name as a synonym in guc.c to maintain compatibility.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

You are biased :-)

> In terms of forward-compatibility, one thing to note is that we
> currently have only parallel QUERY, but in the future we will no doubt
> also have parallel operations that are not queries, like VACUUM and
> CLUSTER and ALTER TABLE. If we put the word "query" into the name of
> a GUC, we're committing to the idea that it only applies to queries,
> and that there will be some other limit for non-queries. If we don't
> put the word query in there, we are not necessarily committing to the
> reverse, but we're at least leaning in that direction. So far I've
> largely dodged having to make a decision about this, because talking
> about the degree of parallelism for CLUSTER makes just as much sense
> as talking about the degree of parallelism for a query, but we could
> also decide to have e.g. maintenance_parallel_degree instead of
> max_parallel_degree for such operations. But as we commit to names
> it's something to be aware of.

This is a very good point. I think parallel maintenance commands are going to require different tuning than queries, and I'd rather have separate parameters for those things, rather than force the same parameter to be changed over and over depending on what the session is going to execute next. Also, I think your proposed change from "max" to "maintenance" does not make much sense; I'd rather have max_maintenance_parallel_degree. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file
Hi Folks! The permissions on the RECOVERYXLOG file at the time of the error are 0400:

-r--------. 1 postgres postgres 16777216 May 31 09:51 RECOVERYXLOG

I sent that info to Tom earlier this afternoon (still learning the posting protocols - sorry) - his response is below:

From Tom: Ah, that confirms the diagnosis --- the new fsync code is assuming that all files it might need to fsync are writable by the postgres user. What would be useful to know is how it got to be that way rather than 0600 (-rw-------). Seems like it must be a property of your archive restore script, but we don't have the details. Maybe you could show the script? Also, I don't recall whether you said exactly what platform you're on, but that's probably important here as well. regards, tom lane

OPERATING SYSTEM INFORMATION: The database server is running on a Redhat Linux host (Red Hat Enterprise Linux Server release 6.8 (Santiago)):

[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The user account that creates the backup files is the same account that is used for performing the restores - and at no time are any elevated privileges used. The script we use for restoring our databases is not overly complex - a quick summary is below:

Save config files
Remove tablespace directories
Remove cluster directory
untar tablespace backup files
untar cluster backup file
copy back the config files

At this point in the script the pg_xlog directory is cleaned out - there are no files in the pg_xlog directory when the restore starts. The database server is then started. Below are the contents of the pg_xlog directory from the unsuccessful restore attempts:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
pg_ctl: another server might be running; trying to start server anyway
server starting
2016-05-31 14:59:18 EDT 574ddf06.7d10 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Even if the permissions on the RECOVERYXLOG file are changed (0777 in this case) and the file is left in the pg_xlog directory, the postgres process will recreate the file as 0400:

[postgres@postZZ postXX]$ chmod 777 RECOVERYXLOG
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-rwxrwxrwx 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 14:59:46 EDT 574ddf21.7d25 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Interestingly enough, in the 9.2.15 version (successful restore), the file is also created 0400:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 15:36:27 EDT 574de7bb.7eff [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 65544
-r-------- 1 postgres postgres 16777216 May 31 14:46 000000010000000500000022
-rw------- 1 postgres postgres       56 May 31 14:46 00000002.history
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000022
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000023
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000024
drwx------ 2 postgres postgres 4096 May 31 14:46 archive_status

./archive_status:
total 0
-rw------- 1 postgres postgres 0 May 31 14:46 000000010000000500000022.done
-rw------- 1 postgres postgres 0 May 31 14:46 00000002.history.done
-rw------- 1 postgres postgres 0 May 31 14:46 000000020000000500000023.done

Some additional info. We are in the initial stages of upgrading from 9.2 to 9.5. I upgraded one of our database servers to 9.5.1, then backed up the upgraded database with our same backup process. Using both the 9.5.2 executables (where the fsync/rename patch was introduced) and the 9.5.3 executables I was able to successfully restore the 9.5.1 backup (same host/user account which produces an error under
[HACKERS] JSON[B] arrays are second-class citizens
Folks, While querying some JSONB blobs at work in preparation for a massive rework of the data infrastructure, I ran into things that really puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR: function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)

Similar things happen with the other functions matching jsonb?_array_elements(_text)? These functions correctly identify JSON[B] things which are not, at their top level, arrays, and error out appropriately. What this hints to me is that json_array_elements() and friends have access to things that at least in theory UNNEST could have access to. Is making those things accessible to UNNEST, etc., a reasonable direction to go? Another option I came up with is to make functions that match jsonb?_array_elements(_text)?(_with_ordinality), but that seems somewhat tedious and error-prone on the maintenance side. What say? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 1:27 PM, Alvaro Herrera wrote: > This is a very good point. > > I think parallel maintenance commands are going to require different > tuning than different queries, and I'd rather have separate parameters > for those things rather than force the same parameter being changed over > and over depending on what is the session going to execute next. FWIW, as things stand my parallel CREATE INDEX patch does have a cost model which is pretty close to the one for parallel sequential scan. The cost model cares about max_parallel_degree. However, it also introduces a parallel_degree index storage parameter, which overrides the cost model to make it indicate the number of parallel workers to use (presumably, somebody will change master to make parallel_degree in terms of "participant processes" rather than worker processes soon, at which point I'll follow their lead). The storage parameter sticks around, of course, so a REINDEX will reuse it without asking. (I think CLUSTER should do the same with the index storage parameter.) What's novel about this is that for utility statements, you can override the cost model, and so disregard max_parallel_degree if you so choose. My guess is that DBAs will frequently want to do so. I'm not attached to any of this, but that's what I've come up with to date. Possibly the index storage parameter concept has baggage elsewhere, which comes up when we later go to parallelize index scans. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
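As a sketch of the override Peter describes, per the patch as characterized above (not committed syntax; the storage parameter name may yet change):

CREATE INDEX idx_foo ON foo (bar) WITH (parallel_degree = 4);  -- overrides the cost model
REINDEX INDEX idx_foo;  -- reuses the stored parallel_degree without asking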
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 4:12 PM, Robert Haas wrote:
> On Tue, May 31, 2016 at 4:04 PM, Tom Lane wrote:
> > Alvaro Herrera writes:
> >> Robert Haas wrote:
> >>> So I think in the long run we should have three limits:
> >>>
> >>> 1. Cluster-wide limit on number of worker processes for all purposes
> >>> (currently, max_worker_processes).
> >>>
> >>> 2. Cluster-wide limit on number of worker processes for parallelism
> >>> (don't have this yet).
> >>>
> >>> 3. Per-operation limit on number of worker processes for parallelism
> >>> (currently, max_parallel_degree).
> >>>
> >>> Whatever we rename, there needs to be enough semantic space between #1
> >>> and #3 to allow for the possibility - I think the very likely
> >>> possibility - that we will eventually also want #2.
> >
> >> max_background_workers sounds fine to me for #1, and I propose to add #2
> >> in 9.6 rather than wait.
> >
> > +1 to both points.
> >
> >> max_total_parallel_query_workers ?
> >
> > The name should be closely related to what we use for #3. I could go for
> > max_total_parallel_workers for #2 and max_parallel_workers for #3.
> > Or maybe max_parallel_workers_total?
>
> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

+1. We shall assume that the DBA has set the value of max_worker_processes according to hardware specifications without regard to what exactly could be parallelized.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

So we have operation, session, and cluster scope yet no way to know which is which without memorizing the documentation. While we cannot enforce this, I'd say any of these that are intended to be changed by the user should have functions published as their official API. The name of the function can be made to be user friendly. For all the others, pick a name with something closer to the 64 character limit that is as expressive as possible, so that seeing the name in postgresql.conf is all a person really needs to see to know that they are changing the right thing. I say your expectations are off if you want to encode these differences by using workers and degree at the same time. Let's go for a part-time DBA vocabulary here. I get the merit of degree, and by itself it might even be OK, but in our usage degree is theory while workers are actual, and I'd say people are likely to relate better to the latter. David J.
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
> Folks,
>
> While querying some JSONB blobs at work in preparation for a massive
> rework of the data infrastructure, I ran into things that really
> puzzled me, to wit:
>
> SELECT * FROM unnest('["a","b","c"]'::jsonb);
> ERROR: function unnest(jsonb) does not exist
>
> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>  value
> ───────
>  "a"
>  "b"
>  "c"
> (3 rows)

I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that we endeavor to emulate. Having an overloaded form is unappealing. While likely not that common, the introduction of an ambiguity raises the bar considerably. That said, we do seem to be lacking any easy way to take a json array and attempt to convert it directly into a PostgreSQL array. Such a conversion is not always going to succeed, though the capability seems worthwhile even if as yet unasked for. The each->convert->array_agg pattern works but is likely inefficient for homogeneous json array cases. David J.
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 5:06 PM, David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR: function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───────
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form is unappealing. While likely not that common,
> the introduction of an ambiguity raises the bar considerably.
>
> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.
>
> David J.

If there is no list of people asking for that function, let me be the first. In the meantime, I've resigned myself to carting this around from db to db...

create function jsonb_array_to_text_array(jsonb_arr jsonb)
returns text[]
language sql
as $$
  select array_agg(r) from jsonb_array_elements_text(jsonb_arr) r;
$$;
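Assuming the definition above, usage then collapses to a single call:

SELECT jsonb_array_to_text_array('["a","b","c"]'::jsonb);
-- {a,b,c}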
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
> > Folks,
> >
> > While querying some JSONB blobs at work in preparation for a massive
> > rework of the data infrastructure, I ran into things that really
> > puzzled me, to wit:
> >
> > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > ERROR: function unnest(jsonb) does not exist
> >
> > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> >  value
> > ───────
> >  "a"
> >  "b"
> >  "c"
> > (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.

You cut out the part where I introduced the part that's not equivalent, so "there is more than one way to do it" isn't on point here. More on that specific issue below.

UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY, than the json_array_elements-like functions do. Is it really more efficient to build and maintain those capabilities separately in the JSON[B] set-returning functions, or to force our end users to use atrocious hacks like putting generate_series(1,jsonb_array_length(foo)) in the target list, than just to make UNNEST and ROWS FROM do the right thing?

> Having an overloaded form is unappealing.

On what grounds?

> While likely not that common, the introduction of an ambiguity
> raises the bar considerably.

What ambiguity?

SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
ERROR: cannot extract elements from an object

The json_array_elements family manages to do the right thing. Why would it be harder to make sure UNNEST and ROWS FROM() do so?

> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.

To your earlier point about "there is more than one way to do it," we have made no attempt to make some sort of language composed of orthonormal vectors, and the SQL standard actually requires that we not do so. For a trivial example, there's

SELECT * FROM foo
and
TABLE foo

which are equivalent and both spelled out in the standard. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
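For readers following along, the generate_series() workaround mentioned above looks roughly like this, assuming a hypothetical table t with a jsonb array column j:

SELECT i AS ord, t.j -> (i - 1) AS value  -- -> with an int subscript extracts one element
FROM t, generate_series(1, jsonb_array_length(t.j)) AS i;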
Re: [HACKERS] Rename max_parallel_degree?
Alvaro Herrera writes: > Robert Haas wrote: >> I just want to point out that if we change #1, we're breaking >> postgresql.conf compatibility for, IMHO, not a whole lot of benefit. >> I'd just leave it alone. > We can add the old name as a synonym in guc.c to maintain compatibility. I doubt this is much of an issue at this point; max_worker_processes has only been there a release or so, and surely there are very few people explicitly setting it, given its limited use-case up to now. It will be really hard to change it after 9.6, but I think we could still get away with that today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 5:46 PM, David Fetter wrote:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> > On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
> > > Folks,
> > >
> > > While querying some JSONB blobs at work in preparation for a massive
> > > rework of the data infrastructure, I ran into things that really
> > > puzzled me, to wit:
> > >
> > > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > > ERROR: function unnest(jsonb) does not exist
> > >
> > > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> > >  value
> > > ───────
> > >  "a"
> > >  "b"
> > >  "c"
> > > (3 rows)
> >
> > I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> > we endeavor to emulate.
>
> You cut out the part where I introduced the part that's not
> equivalent, so "there is more than one way to do it" isn't on point
> here. More on that specific issue below.
>
> UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY,
> than the json_array_elements-like functions do. Is it really more
> efficient to build and maintain those capabilities separately in the
> JSON[B] set-returning functions, or to force our end users to use
> atrocious hacks like putting
> generate_series(1,jsonb_array_length(foo)) in the target list, than
> just to make UNNEST and ROWS FROM do the right thing?

Both of these work in 9.5...

SELECT * FROM ROWS FROM (jsonb_array_elements('["a","b","c"]'::jsonb)) WITH ORDINALITY;
 value | ordinality
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3

> > Having an overloaded form is unappealing.
>
> On what grounds?

Aesthetic. YMAV (your mileage apparently varies).

> > While likely not that common, the introduction of an ambiguity
> > raises the bar considerably.
>
> What ambiguity?
>
> SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
> ERROR: cannot extract elements from an object

I stand corrected. I was thinking you could somehow craft unnest('') but there is no way to auto-convert to "anyarray"...

> The json_array_elements family manages to do the right thing. Why
> would it be harder to make sure UNNEST and ROWS FROM() do so?

I have apparently failed to understand your point. All I saw was that you wanted "unnest(jsonb)" to work in an identical fashion to "jsonb_array_elements(jsonb)". If there is some aspect beyond this being an aliasing situation, then you have failed to communicate it such that I comprehended that fact.

> > That said, we do seem to be lacking any easy way to take a json array and
> > attempt to convert it directly into a PostgreSQL array. Such a conversion
> > is not always going to succeed, though the capability seems worthwhile
> > even if as yet unasked for. The each->convert->array_agg pattern works
> > but is likely inefficient for homogeneous json array cases.
>
> To your earlier point about "there is more than one way to do it," we
> have made no attempt to make some sort of language composed of
> orthonormal vectors, and the SQL standard actually requires that we
> not do so. For a trivial example, there's
>
> SELECT * FROM foo
> and
> TABLE foo
>
> which are equivalent and both spelled out in the standard.

And would we have done so had we not been compelled to by the standard? Maybe it should be enshrined somewhere more obvious, but Alvaro, and indirectly Tom, recently made the same claim[1] against TIMTOWTDI, so I feel justified making such a claim on behalf of the project - as well as my general personal feeling.
Adding user-friendly UI to the system, to correct deficiencies, does have merit, but that is not what I personally see here. David J. [1] https://www.postgresql.org/message-id/20160405164951.GA286879@alvherre.pgsql
Re: [HACKERS] JSON[B] arrays are second-class citizens
David Fetter writes:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
>> While likely not that common, the introduction of an ambiguity
>> raises the bar considerably.
> What ambiguity?

My first thought about it was that select unnest('{1,2,3}'); would start failing. But it turns out it already does fail: ERROR: function unnest(unknown) is not unique. You get that as a result of the recent introduction of unnest(tsvector), which we debated a few weeks ago and seem to have decided to leave as-is. But it failed before 9.6 too, with ERROR: could not determine polymorphic type because input has type "unknown". So at least in this particular case, adding unnest(jsonb) wouldn't be a problem from the standpoint of not being able to resolve calls that we could resolve before.

Nonetheless, there *is* an ambiguity here, which is specific to json(b): what type of array are you expecting to get? The reason we have both json[b]_array_elements() and json[b]_array_elements_text() is that there are plausible use-cases for returning either json or plain text. It's not hard to imagine that somebody will want json[b]_array_elements_numeric() before long, too. If you want to have an unnest(jsonb) then you will need to make an arbitrary decision about which type it will return, and that doesn't seem like an especially great idea to me.

> UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY,
> than the json_array_elements-like functions do.

AFAICT, this is nonsense. We did not tie WITH ORDINALITY to UNNEST; it works for any set-returning function.

regression=# select * from unnest(array[1,2,3]) with ordinality;
 unnest | ordinality
--------+------------
      1 |          1
      2 |          2
      3 |          3
(3 rows)

regression=# select * from jsonb_array_elements('["a","b","c"]'::jsonb) with ordinality;
 value | ordinality
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
(3 rows)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
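Tom's anticipated _numeric variant is easy enough to sketch outside core (a hypothetical function, not in any release; it assumes every element casts cleanly to numeric):

CREATE FUNCTION jsonb_array_elements_numeric(j jsonb)
RETURNS SETOF numeric LANGUAGE sql IMMUTABLE AS $$
  SELECT e::numeric FROM jsonb_array_elements_text(j) AS t(e)
$$;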
Re: [HACKERS] Parallel safety tagging of extension functions
On 05/31/2016 06:47 PM, Tom Lane wrote:
> Given that, your original approach of manually updating proargtypes in the
> existing pg_proc row for the functions may be the best way. Anything else
> is going to be more complicated and ultimately will still require at least
> one direct catalog update.

It is the least ugly of all the ugly solutions I could think of. I have attached a patch which fixes the signatures using this method. I use CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is it too ugly? Andreas gin-gist-signatures-v1.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
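For illustration only, the kind of in-place catalog fix being discussed looks roughly like this (the function name and signature here are hypothetical; hand-editing pg_proc is dangerous and belongs only in an extension upgrade script):

UPDATE pg_catalog.pg_proc
SET proargtypes = '2281 2281'::oidvector      -- e.g. (internal, internal)
WHERE proname = 'my_gist_support_func'        -- hypothetical function name
  AND pronamespace = 'public'::regnamespace;
-- followed by CREATE OR REPLACE FUNCTION so cached definitions are refreshed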
Re: [HACKERS] JSON[B] arrays are second-class citizens
The idea of converting a JSONB array to a PG array is appealing and would potentially be more general-purpose than adding a new unnest. I'm not sure how feasible either suggestion is. I will say that I think the current state of affairs is gratuitously verbose and expects users to memorize a substantial number of long function names to perform simple tasks. -p

On Tue, May 31, 2016 at 2:06 PM, David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR: function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───────
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form is unappealing. While likely not that common,
> the introduction of an ambiguity raises the bar considerably.
>
> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.
>
> David J.

-- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." —Kurt Vonnegut
Re: [HACKERS] COMMENT ON, psql and access methods
On Wed, Jun 1, 2016 at 4:52 AM, Robert Haas wrote: > On Fri, May 27, 2016 at 7:53 AM, Michael Paquier > wrote: >> As far as I can see, COMMENT ON has no support for access methods. >> Wouldn't we want to add it as it is created by a command? On top of >> that, perhaps we could have a backslash command in psql to list the >> supported access methods, like \dam[S]? The system methods would be in >> this case all the in-core ones. > > +1. Are there other opinions? That's not a 9.6 blocker IMO, so I could get patches out for 10.0 if needed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
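A sketch of what the two proposals would enable (hypothetical syntax at the time of this exchange; neither feature existed yet):

COMMENT ON ACCESS METHOD bloom IS 'signature-file based index access method';
-- and in psql, something like \dam[S] to list the supported access methods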
Re: [HACKERS] Statement timeout
> FWIW, I think the existing behavior is just fine. It corresponds to what
> PQexec has always done with multi-statement query strings; that is,
> statement_timeout governs the total time to execute the transaction (the
> whole query string, unless you put transaction control commands in there).
> In extended query mode, since you can only put one textual query per Parse
> message, that maps to statement_timeout governing the total time from
> initial Parse to Sync. Which is what it does.

I've never thought about that. And I cannot imagine anyone using the extended protocol that way to simulate multi-statement queries. Is that documented somewhere?

> What you propose here is not a bug fix but a redefinition of what
> statement_timeout does; and you've made it inconsistent with simple query
> mode. I don't really think it's an improvement.

From the documentation of statement_timeout (config.sgml):

  Abort any statement that takes more than the specified number of
  milliseconds, starting from the time the command arrives at the server
  from the client. If log_min_error_statement is set to ERROR or lower,
  the statement that timed out will also be logged. A value of zero (the
  default) turns this off.

Apparently "starting from the time the command arrives at the server from the client" refers, in my example, to the time when the Execute message arrives at the server. And I think the current behavior of our code differs from what the document advertises. Or at least it is counterintuitive to users.

> BTW, aren't you missing a re-enable of the timeout for statements after
> the first one?

Will check. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
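For reference, the documented wording matches what one observes in simple query mode; a minimal demonstration using the 3-second timeout from the examples in this thread:

SET statement_timeout = '3s';
SELECT pg_sleep(2);  -- completes
SELECT pg_sleep(4);  -- ERROR: canceling statement due to statement timeout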
[HACKERS] User demand, and idea, for C-code conversions from JSON arrays to PostgreSQL arrays
All, Over in the "JSON[B] arrays are second-class citizens" thread [1] I made the observation that the only way to get a PostgreSQL array from a JSON array is via the "elements->cast->array_agg" chain. For JSON arrays that are homogeneous in nature, the capability to go "directly" from JSON to json[], text[], bigint[], etc... seems like it would have some value. A couple of community members have, so far, chimed in. Maybe this ends up on a ToDo somewhere, but for the moment I figured I'd float the idea on its own thread since the other one is really about a different (though somewhat related) topic. In the spirit of the json_populate_record-like functions, something with the following signature seems usable:

as_array(anyarray, jsonb) : anyarray [2]

so that actual calls look like:

SELECT as_array(null::text[], '["a", "b", "c"]'::jsonb)

For better or worse, while every table gets a corresponding composite type, an array type cannot be treated as a row type in this situation:

SELECT jsonb_populate_record(null::text[], '["a", "b", "c"]'::jsonb)
ERROR: first argument of jsonb_populate_record must be a row type

Loosening the restriction and allowing jsonb_populate_record to fulfill the role of the above-described "as_array" function would be something to consider, but is likely not possible or worth any effort in trying to make happen. David J. [1] https://www.postgresql.org/message-id/CADkLM=fSC+otuBmzoJT6Riyksue3HpHgu2=Mofcv=fd0der...@mail.gmail.com [2] ROTQ[3] - whose idea was it to put the type first and the data second? [3] Rhetorical Off-Topic Question
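A non-polymorphic stand-in for the proposed as_array() is easy to sketch for any single element type (the function name is hypothetical; a true anyarray version would need C-level support), e.g. the bigint case:

CREATE FUNCTION jsonb_to_bigint_array(j jsonb) RETURNS bigint[]
LANGUAGE sql IMMUTABLE AS $$
  SELECT array_agg(e::bigint) FROM jsonb_array_elements_text(j) AS t(e)
$$;

SELECT jsonb_to_bigint_array('[1, 2, 3]'::jsonb);  -- {1,2,3}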