Re: [HACKERS] Statement timeout
> Oops. Previous example was not appropriate. Here are revised
> examples. (In any case, the time consumed at parse and bind is small,
> and I omit the CHECK_FOR_INTERRUPTS after these commands.)
>
> [example 1]
>
> statement_timeout = 3s
>
> parse(statement1)    -- time 0. set statement timeout timer
> bind(statement1, portal1)
> execute(portal1)     -- took 2 seconds
> CHECK_FOR_INTERRUPTS -- 2 seconds passed since time 0. statement timeout does not fire
> parse(statement2)
> bind(statement2, portal2)
> execute(portal2)     -- took 2 seconds
> CHECK_FOR_INTERRUPTS -- 4 seconds passed since time 0. the statement timeout fires!
>
> Another example.
>
> [example 2]
>
> statement_timeout = 3s
>
> parse(statement1)    -- time 0. set statement timeout timer
> bind(statement1, portal1)
> execute(portal1)     -- took 1 second
> CHECK_FOR_INTERRUPTS -- 1 second passed since time 0. statement timeout does not fire
> parse(statement2)
> bind(statement2, portal2)
> execute(portal2)     -- took 1 second
> CHECK_FOR_INTERRUPTS -- 2 seconds passed since time 0. statement timeout does not fire
> [client is idle for 2 seconds]
> sync
> CHECK_FOR_INTERRUPTS -- 4 seconds passed since time 0. the statement timeout fires!
>
> I think the current code is good at detecting a statement timeout when a
> command's execution time actually exceeds the specified timeout. However,
> it can report a false statement timeout in cases like the above.

Here is a patch against the master branch to deal with the problem. It creates two new functions in tcop/postgres.c:

static void enable_statement_timeout(void);
static void disable_statement_timeout(void);

Before, this code lived in start_xact_command() and finish_xact_command(), but I wanted a way to disable the timeout in exec_execute_message(), so I separated the code into the new functions. start_xact_command() and finish_xact_command() were also modified to use the new functions, to avoid code duplication.

I tested the patch using a small C program linked with a modified libpq (PQsendQueryParams was changed to issue a "flush" message instead of a "sync" message). The statement timeout was set to 3 seconds. With this test program, the client sends pg_sleep(2) plus a flush message, then sleeps for 10 seconds. With current PostgreSQL, this triggers the statement timeout while the client is sleeping. With patched PostgreSQL, it does not trigger the timeout, as expected. All regression tests passed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b185c1b..564855a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether we have started statement timeout timer.
+ */
+static bool st_timeout = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /*
@@ -2002,6 +2009,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 		 * those that start or end a transaction block.
 		 */
 		CommandCounterIncrement();
+
+		/*
+		 * We need to reset statement timeout if already set.
+		 */
+		disable_statement_timeout();
 	}
 
 	/* Send appropriate CommandComplete to client */
@@ -2433,14 +2445,10 @@ start_xact_command(void)
 		ereport(DEBUG3,
 				(errmsg_internal("StartTransactionCommand")));
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
+
+		/* Set statement timeout running, if any */
+		enable_statement_timeout();
 	}
 }
@@ -2450,7 +2458,7 @@ finish_xact_command(void)
 	if (xact_started)
 	{
 		/* Cancel any active statement timeout before committing */
-		disable_timeout(STATEMENT_TIMEOUT, false);
+		disable_statement_timeout();
 
 		/* Now commit the command */
 		ereport(DEBUG3,
@@ -4510,3 +4518,51 @@ log_disconnections(int code, Datum arg)
 					port->user_name, port->database_name, port->remote_host,
 					port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+/*
+ * Set statement timeout if any.
+ */
+static void enable_statement_timeout(void)
+{
+	if (!st_timeout)
+	{
+		if (StatementTimeout > 0)
+		{
+
+			/*
+			 * Sanity check
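The diff breaks off inside enable_statement_timeout(). For readers following along, here is a minimal sketch of how the two helpers could plausibly be completed, using only the st_timeout flag and the timeout.c calls (enable_timeout_after, disable_timeout) visible elsewhere in the diff; it is an illustration, not the actual remainder of the patch:

static void
enable_statement_timeout(void)
{
	if (!st_timeout)
	{
		if (StatementTimeout > 0)
		{
			/* Sanity check: the timer must only run inside a transaction */
			Assert(xact_started);
			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
			st_timeout = true;
		}
		else
			disable_timeout(STATEMENT_TIMEOUT, false);
	}
}

static void
disable_statement_timeout(void)
{
	if (st_timeout)
	{
		disable_timeout(STATEMENT_TIMEOUT, false);
		st_timeout = false;
	}
}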
Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat
Hello,

At Fri, 27 May 2016 13:20:20 -0400, Tom Lane wrote in <14603.1464369...@sss.pgh.pa.us>:
>> Kyotaro HORIGUCHI writes:
>>> By the way, the reason of the "invalid snapshot identifier" is
>>> that some worker threads try to use it after the connection on
>>> the first worker closed.
>
> ... BTW, I don't quite see what the issue is there. The snapshot is
> exported from the master session, so errors in worker sessions should not
> cause such failures in other workers. And I don't see any such failure
> when setting up a scenario that will cause a worker to fail on Linux.
> The "invalid snapshot identifier" bleats would make sense if you had
> gotten a server-side error (and transaction abort) in the master session,
> but I don't see any evidence that that happened in that example. Might be
> worth seeing if that's reproducible.

The master session died from the lack of libz and the failure of compressLevel's propagation, which is already fixed. Some of the children that started transactions after the master's death will get the error.

Similarly, a sudden close of the master child's session very early in its transaction could cause the same symptom, but that seems unlikely as long as the master reliably reaches the command-waiting, or "safe", state. If we want to prevent it completely, one solution could be for the non-master children to explicitly wait for the master to arrive at the "safe" state before starting their transactions. But I suppose that is not needed here.

Does this make sense?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
> On 27 May 2016, at 19:57, Vladimir Borodin wrote:
>
> -performance, +hackers
>
>> On 25 May 2016, at 17:33, Vladimir Borodin wrote:
>>
>> Hi all.
>>
>> We have found that queries through PgBouncer 1.7.2 (with transaction
>> pooling) to local PostgreSQL are almost two times slower in 9.5.3 than
>> in 9.4.8 on RHEL 6 hosts (all packages are updated to the latest
>> versions). Meanwhile the problem can't be reproduced on e.g. Ubuntu
>> 14.04 (also fully updated).
>>
>> Here is how the results look for 9.4, 9.5 and 9.6. All are built from
>> the latest commits as of yesterday in
>> * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
>> * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
>> * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
>>
>> All of them are built on the host where the testing is done (with stock
>> gcc versions). Sysctls, pgbouncer config and everything else we found
>> are the same, postgres configs are default, PGDATA is in tmpfs. All
>> numbers are reproducible and stable between runs.
>>
>> Shortly:
>>
>> OS            PostgreSQL version  TPS    Avg. latency
>> RHEL 6        9.4                 44898  1.425 ms
>> RHEL 6        9.5                 26199  2.443 ms
>> RHEL 6        9.6                 43027  1.487 ms
>> Ubuntu 14.04  9.4                 67458  0.949 ms
>> Ubuntu 14.04  9.5                 64065  0.999 ms
>> Ubuntu 14.04  9.6                 64350  0.995 ms
>
> The results above are not really fair: pgbouncer.ini was a bit different
> on the Ubuntu host (application_name_add_host was disabled). Here are
> the right results with exactly the same configuration:
>
> OS            PostgreSQL version  TPS    Avg. latency
> RHEL 6        9.4                 44898  1.425 ms
> RHEL 6        9.5                 26199  2.443 ms
> RHEL 6        9.6                 43027  1.487 ms
> Ubuntu 14.04  9.4                 45971  1.392 ms
> Ubuntu 14.04  9.5                 40282  1.589 ms
> Ubuntu 14.04  9.6                 45410  1.409 ms
>
> It can be seen that there is a regression for 9.5 on Ubuntu also, but
> not such a significant one. We first thought that the reason was
> 38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add memory barriers for
> PgBackendStatus.st_changecount protocol), but in that case the
> regression should be seen in 9.6 as well.
>
> There was also a bunch of changes in FE/BE communication (like
> 387da18874afa17156ee3af63766f17efb53c4b9 or
> 98a64d0bd713cb89e61bef6432befc4b7b5da59e), and that may answer the
> question of the regression in 9.5 and the normal results in 9.6.
> Probably the right way to find the answer is to bisect. I'll do it, but
> if some more diagnostic information can help, feel free to ask for it.

Yep, bisect confirms that the first bad commit in REL9_5_STABLE is 387da18874afa17156ee3af63766f17efb53c4b9. Full output is attached.

bisect95.out Description: Binary data

And bisect for the master branch confirms that the situation became much better after 98a64d0bd713cb89e61bef6432befc4b7b5da59e. Output is also attached.

bisect96.out Description: Binary data

On Ubuntu the performance degradation is ~15%, and on RHEL it is ~100%. I don't know what causes the different numbers on RHEL and Ubuntu, but there is certainly a regression when pgbouncer is connected to postgres through localhost. When I try to connect pgbouncer to postgres through a unix socket, performance is consistently bad on all postgres versions.

Both servers are for testing, but I can easily provide you SSH access (to the Ubuntu host only) if necessary. I can also gather more diagnostics if needed.

>> You can see that the difference between major versions on Ubuntu is not
>> significant, but on RHEL, 9.5 is 70% slower than 9.4 and 9.6.
>>
>> Below are more details.
>>
>> RHEL 6:
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2693962
>> latency average: 1.425 ms
>> tps = 44897.461518 (including connections establishing)
>> tps = 44898.763258 (excluding connections establishing)
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 1572014
>> latency average: 2.443 ms
>> tps = 26198.928627 (including connections establishing)
>> tps = 26199.803363 (excluding connections establishing)
>>
>> postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2581645
>> latency average: 1.487 ms
>> tps = 43025.676995 (including connections establishing)
>> tps = 43027.038275 (excluding connections establishing)
>> postgres@pgload05g ~ $
>>
>> Ubuntu 14.04 (the same hardware):
>>
>> postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
[HACKERS] Binary I/O for isn extension
Hi. Attached is a small patch which adds binary input/output for the types added by the isn extension.

Shay

isn-binary.patch Description: Binary data
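Since the isn types are fixed-width values on disk, the send/receive pair for each type would presumably be only a few lines each. A sketch of what that could look like for one type, assuming an 8-byte pass-by-value representation (the function names and width here are assumptions, not taken from the attached patch):

#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"

/* Hypothetical binary input for one isn type (e.g. ean13) */
PG_FUNCTION_INFO_V1(ean13_recv);
Datum
ean13_recv(PG_FUNCTION_ARGS)
{
	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);

	/* read the 8-byte network-order value sent by the client */
	PG_RETURN_INT64(pq_getmsgint64(buf));
}

/* Hypothetical binary output for the same type */
PG_FUNCTION_INFO_V1(ean13_send);
Datum
ean13_send(PG_FUNCTION_ARGS)
{
	int64		val = PG_GETARG_INT64(0);
	StringInfoData buf;

	pq_begintypsend(&buf);
	pq_sendint64(&buf, val);
	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

The same pattern would repeat for the other types the extension provides.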
Re: [HACKERS] Question and suggestion about application binary compatibility policy
On 31/05/2016 08:10, Tsunakawa, Takayuki wrote:
>> From: Michael Meskes [mailto:mes...@postgresql.org]
>>> Yes, but Windows users probably don't understand or know it. So, I
>>> suggested explicitly describing the application binary compatibility
>>> policy in the PostgreSQL manual. What do you think about it?
>>
>> Couldn't you point your customer to the system documentation? Personally
>> I don't think standard system behavior should be documented for each
>> application relying on it, but ymmv.
>
> I couldn't find appropriate system documentation. Regarding Linux, I
> remember I saw some HOWTO on the tldp.org website which explains the
> concept of the shared library soname, but it's not very friendly for
> users who just want to know the application binary compatibility policy
> of PostgreSQL. And I don't think there's suitable documentation on
> Windows. Even if there is any, users will not be sure whether PostgreSQL
> follows those platform-specific conventions. They may have doubts about
> it, because even the product version "PostgreSQL x.y.z" causes the
> misconception that x is the major version and y is the minor one. So, I
> suggested documenting the compatibility policy for clarification and
> user friendliness, as in the Oracle Database documentation below.
>
> http://docs.oracle.com/database/121/UPGRD/app.htm#UPGRD12547
>
> BTW, although this may be a separate topic, it may be better that we add
> the major version to the library names, like libpq5.dll and libecpg6.dll,
> so that an application fails to run with incompatible versions of libpq
> and libecpg. FYI:
>
> https://en.wikipedia.org/wiki/Side-by-side_assembly
> [Excerpt]
> Microsoft Visual C++ 2005 and 2008 employ SxS with all C runtime
> libraries. However, runtime libraries in Visual C++ 2010 no longer use
> this technology; instead, they include the version number of a DLL in
> its file name, which means that different versions of one DLL will
> technically be completely different DLLs now.
>
> Any comments on these? If there's no strong objection, I think I'll
> submit a documentation patch in the future.
>
> Regards
> Takayuki Tsunakawa

Hi,

on cygwin the postgresql binary package already includes the library versions:

/usr/bin/cygecpg-6.dll
/usr/bin/cygecpg_compat-3.dll
/usr/bin/cygpgtypes-3.dll
/usr/bin/cygpq-5.dll

Attached is the patch used for the build.

Regards
Marco

--- origsrc/postgresql-9.4.2/src/Makefile.shlib	2015-05-20 00:33:58.0 +0200
+++ src/Makefile.shlib	2015-05-27 23:01:09.379468300 +0200
@@ -267,7 +267,7 @@ endif
 ifeq ($(PORTNAME), cygwin)
 LINK.shared		= $(CC) -shared
 ifdef SO_MAJOR_VERSION
-shlib			= cyg$(NAME)$(DLSUFFIX)
+shlib			= cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 haslibarule   = yes
 endif
@@ -359,12 +359,9 @@ ifeq ($(PORTNAME), cygwin)
 # Cygwin case
 
 $(shlib): $(OBJS) | $(SHLIB_PREREQS)
-	$(CC) $(CFLAGS) -shared -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
+	$(CC) $(CFLAGS) -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
 
-$(stlib): $(OBJS) | $(SHLIB_PREREQS)
-	rm -f $@
-	$(LINK.static) $@ $^
-	$(RANLIB) $@
+$(stlib): $(shlib) ;
 
 else
 
--- origsrc/postgresql-9.4.2/src/interfaces/libpq/Makefile	2015-05-20 00:33:58.0 +0200
+++ src/interfaces/libpq/Makefile	2015-05-27 22:56:43.193200600 +0200
@@ -45,7 +45,7 @@ OBJS	+= ip.o md5.o
 OBJS	+= encnames.o wchar.o
 
 ifeq ($(PORTNAME), cygwin)
-override shlib = cyg$(NAME)$(DLSUFFIX)
+override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 
 ifeq ($(PORTNAME), win32)
Re: [HACKERS] IPv6 link-local addresses and init data type
On 05/31/2016 04:06 AM, Tom Lane wrote:
> Andreas Karlsson writes:
>> On 05/31/2016 02:37 AM, Haribabu Kommi wrote:
>>> The % delimiter character is not only used at the end of the IPv6
>>> address; from the RFC document, the following is also possible:
>>> fe80::%2/64
>>> We need to handle both scenarios; it may not be straightforward to
>>> store the zone id data.
>>
>> This case needs to be handled by the parser for at least the cidr type,
>> but I do not think it would make parsing any trickier.
>
> Unless there's a semantic difference between fe80::1%2/64 and
> fe80::1/64%2, this doesn't seem like a big deal to me.

As far as I can tell, only fe80::1%2/64 is valid, but I am not 100% sure.

Andreas
Re: [HACKERS] Binary I/O for isn extension
Hi,

Thanks for the patch. You can add it to our commitfest app so it gets reviewed in the next commitfest: https://commitfest.postgresql.org/

A suggestion to make it easier for your patch to be accepted: when adding new functions to an extension you need to bump the version of the extension by renaming the file, updating the .control file, creating an upgrade script, and updating the Makefile to include the new files.

Andreas
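For concreteness, a version bump of that sort usually amounts to something like the following (the version numbers and file names are hypothetical):

# isn.control -- bump the default version
default_version = '1.2'

# Makefile -- rename isn--1.1.sql to isn--1.2.sql and ship an upgrade script
DATA = isn--1.2.sql isn--1.1--1.2.sql

The upgrade script isn--1.1--1.2.sql would then contain only the statements needed to go from 1.1 to 1.2, and existing installations could move to it with ALTER EXTENSION isn UPDATE.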
Re: [HACKERS] Binary I/O for isn extension
Hello,

> Attached is a small patch which adds binary input/output for the types
> added by the isn extension.

I added this patch to the next CF (2016-09) under "Miscellaneous".

Out of curiosity, what is the motivation?

--
Fabien.
Re: [HACKERS] Binary I/O for isn extension
> When adding new functions to an extension you need to bump the version
> of the extension by renaming the file, updating the .control file,
> creating an upgrade script, and updating the Makefile to include the
> new files.

Thanks for the guidance, I'll fix all that and resubmit a patch.
Re: [HACKERS] Binary I/O for isn extension
> Thanks for the patch, you can add it to our commitfest app so it gets
> reviewed in the next commitfest.

I did this.

> A suggestion to make it easier for your patch to be accepted: when
> adding new functions to an extension you need to bump the version of
> the extension by renaming the file, updating the .control file,
> creating an upgrade script, and updating the Makefile to include the
> new files.

Indeed. Moreover, I'm not sure that a type can be updated incrementally to add new send/receive functions.

--
Fabien.
Re: [HACKERS] Binary I/O for isn extension
> I added this patch to the next CF (2016-09) under "Miscellaneous".

Thanks!

> Out of curiosity, what is the motivation?

I'm the owner of Npgsql, the open-source .NET driver for PostgreSQL, which is a binary-first driver. That is, working with types that have no binary I/O is possible but awkward. I hope the general direction is (or will be) to have binary I/O for all supported types, both for compatibility with binary-first consumers such as Npgsql and for general efficiency.
Re: [HACKERS] Question and suggestion about application binary compatibility policy
> I couldn't find appropriate system documentation. Regarding Linux, I
> remember I saw some HOWTO on the tldp.org website which explains the
> concept of the shared library soname, but it's not very friendly for
> users who just want to know the application binary compatibility
> policy of PostgreSQL. And I don't think there's suitable

I would expect Unix sysadmins to understand that howto, but there are others, e.g. a random hit from Google:
https://www.bottomupcs.com/libraries_and_the_linker.xhtml

There is even a Wikipedia page about it:
https://en.wikipedia.org/wiki/Soname

> documentation on Windows. Even if there is any, users will not be
> sure whether PostgreSQL follows those platform-specific
> conventions. They may have doubts about it, because even the product
> version "PostgreSQL x.y.z" causes the misconception that x is the major
> version and y is the minor one.

I don't know anything about the Windows setup in PostgreSQL, but I would find it fair to assume that PostgreSQL follows conventions.

> BTW, although this may be a separate topic, it may be better that we
> add the major version to the library names, like libpq5.dll and
> libecpg6.dll, so that an application fails to run with incompatible
> versions of libpq and libecpg. FYI:

Does this mean you cannot have two versions of libpq installed on the same Windows system at the same time?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
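For what it's worth, on ELF platforms the soname contract under discussion can be inspected directly from an installed library; for example (the path is illustrative):

$ objdump -p /usr/lib/libpq.so.5 | grep SONAME
  SONAME               libpq.so.5

The dynamic linker resolves dependencies by that embedded soname, which is why libpq.so.5 and a hypothetical libpq.so.6 can coexist on the same Unix system.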
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> Attached is a patch that should fix the coalescing, including the
>> clock skew detection. In the end I reorganized the code a bit, moving
>> the check at the end, after the clock skew detection. Otherwise I'd
>> have to do the clock skew detection in multiple places, and that
>> seemed ugly.
>
> I hadn't been paying any attention to this thread, I must confess.
> But I rediscovered this no-coalescing problem while pursuing the poor
> behavior for shared catalogs that Peter complained of in
> https://www.postgresql.org/message-id/56ad41ac.1030...@gmx.net
>
> I posted a patch at
> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
> which I think is functionally equivalent to what you have here, but
> it goes to some lengths to make the code more readable, whereas this
> is just adding another layer of complication to something that's
> already a mess (eg, the request_time field is quite useless as-is).
> So I'd like to propose pushing that in place of this patch ... do you
> care to review it first?

I've reviewed the patch today, and it seems fine to me - correct and achieving the same goal as the patch posted to this thread (plus fixing the issue with shared catalogs and fixing many comments).

FWIW, do you still plan to back-patch this? Minimizing the amount of changes was one of the things I had in mind when writing "my" patch, which is why I ended up with parts that are less readable.

The one change I'm not quite sure about is the removal of clock skew detection in pgstat_recv_inquiry(). You've removed the first check on the inquiry, replacing it with this comment:

    It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks at dbentry/cur_ts. I don't see how those two clock skew checks are redundant - if they are, the comment should explain that, I guess.

Another thing is that if you believe merging requests across databases is a silly idea, maybe we should bite the bullet and replace the list of requests with a single item. I'm not convinced about this, though.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Logical replication & oldest XID.
Hi,

We are using logical replication in multimaster and have run into an interesting problem with a "frozen" procArray->replication_slot_xmin.

This variable is adjusted by ProcArraySetReplicationSlotXmin, which is invoked by ReplicationSlotsComputeRequiredXmin, which in turn is called by LogicalConfirmReceivedLocation. If transactions are executed at all nodes of the multimaster, then everything works fine: replication_slot_xmin is advanced.

But if we send transactions to only one multimaster node and broadcast the changes to the other nodes, then no data is sent through the replication slots at those nodes. No data sent means no confirmations: LogicalConfirmReceivedLocation is never called, and procArray->replication_slot_xmin keeps its original value, 599. As a result the GetOldestXmin function always returns 599, so autovacuum is effectively blocked and our multimaster is not able to perform cleanup of the XID->CSN map, which causes a shared memory overflow.

This situation happens only when write transactions are sent to just one node, or when there are no write transactions at all. Before implementing some workaround (for example, forcing calls of ReplicationSlotsComputeRequiredXmin), I want to understand whether this is a real problem of logical replication or we are doing something wrong. BDR should face the same problem if all updates are performed from one node...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
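A quick way to confirm that a stalled slot is what pins the horizon is to compare the slots' xmin columns with the current XID (illustrative queries only):

-- Which slots are holding back the global xmin horizon?
SELECT slot_name, plugin, active, xmin, catalog_xmin
FROM pg_replication_slots
ORDER BY age(catalog_xmin) DESC;

-- Compare against the current transaction ID
SELECT txid_current();

If xmin/catalog_xmin stay frozen while txid_current() keeps advancing, the slot is never confirming anything, which matches the behavior described above.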
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?

I think that is the danger.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is, but will review closely
>> to see what I might have missed.

This is a good point, though, I think.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which caused the tuple to be dead? Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem? The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.

Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that is newer than the oldest registered snapshot in the system (based on some snapshots being ignored) might get a return value of HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD. It seems necessary to carefully audit all calls to HeapTupleSatisfiesVacuum() to see whether that difference matters. I took a quick look and here's what I see:

statapprox_heap(): Statistical information for the DBA. The difference is non-critical.

heap_prune_chain(): Seeing the tuple as dead might cause it to be removed early. This should be OK. Removing the tuple early will cause the page LSN to be bumped unless RelationNeedsWAL() returns false, and TransactionIdLimitedForOldSnapshots() includes that as a condition for disabling early pruning.

IndexBuildHeapRangeScan(): We might end up with indexIt = false instead of indexIt = true. That should be OK because anyone using the old snapshot will see a new page LSN and error out. We might also fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a problem, but I'm not certain of it.

acquire_sample_rows(): Both return values are treated in the same way. No problem.

copy_heap_data(): We'll end up setting isdead = true instead of tups_recently_dead += 1. That means that the new heap won't include the tuple, which is OK because old snapshots can't read the new heap without erroring out, assuming that the new heap has LSNs. The xmin used here comes from vacuum_set_xid_limits(), which goes through TransactionIdLimitedForOldSnapshots(), so this should be OK for the same reasons as heap_prune_chain(). Another effect of seeing the tuple as prematurely dead is that we'll call rewrite_heap_dead_tuple() on it; rewrite_heap_dead_tuple() will presume that if this tuple is dead, its predecessor in the ctid chain is also dead. I don't see any obvious problem with that.

lazy_scan_heap(): Basically, the same thing as heap_prune_chain().

CheckForSerializableConflictOut(): Maybe a problem? If the tuple is dead, there's no issue, but if it's recently-dead, there might be.

We might want to add comments to some of these places addressing snapshot_too_old specifically.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
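For reference, the "see a new page LSN and error out" mechanism that this audit leans on boils down to a check of roughly the following shape. This is a paraphrased sketch of the 9.6-era TestForOldSnapshot() logic, not the exact source:

static void
test_for_old_snapshot(Snapshot snapshot, Page page)
{
	/* Only MVCC snapshots subject to old_snapshot_threshold are checked */
	if (old_snapshot_threshold >= 0 &&
		snapshot != NULL &&
		snapshot->satisfies == HeapTupleSatisfiesMVCC &&
		!XLogRecPtrIsInvalid(snapshot->lsn) &&
		PageGetLSN(page) > snapshot->lsn)	/* page changed after snapshot? */
		ereport(ERROR,
				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
				 errmsg("snapshot too old")));
}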
Re: [HACKERS] Rename max_parallel_degree?
On Sun, May 29, 2016 at 1:33 AM, Noah Misch wrote:
> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>> OK, my reading of this thread is that there is a consensus to
>> redefine max_parallel_degree=1 as "no parallelism" and
>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>> workers", and along with that, to keep the names unchanged. However,
>> I don't think I can get that done before beta1, at least not without a
>> serious risk of breaking stuff. I can look at this post-beta1.
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1. Consequently, I will appreciate your
> efforts toward speedy resolution. Thanks.

Here is a patch. Note that I still don't agree with this change, but I'm bowing to the will of the group.

I think that some of the people who were in favor of this change should review this patch, including especially the language I wrote for the documentation. If that happens, and the reviews are positive, then I will commit this. If that does not happen, then I will interpret that to mean that there isn't actually all that much interest in changing this after all and will accordingly recommend that this open item be removed without further action.

Here is a test which shows how it works:

rhaas=# set max_parallel_degree = 100;
SET
rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
ALTER TABLE
rhaas=# explain (analyze) select count(*) from pgbench_accounts;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate  (cost=177436.04..177436.05 rows=1 width=8) (actual time=383.244..383.244 rows=1 loops=1)
   ->  Gather  (cost=177435.00..177436.01 rows=10 width=8) (actual time=383.040..383.237 rows=9 loops=1)
         Workers Planned: 9
         Workers Launched: 8
         ->  Partial Aggregate  (cost=176435.00..176435.01 rows=1 width=8) (actual time=375.569..375.569 rows=1 loops=9)
               ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..173935.00 rows=100 width=0) (actual time=0.103..260.352 rows=111 loops=9)
 Planning time: 0.135 ms
 Execution time: 384.387 ms
(8 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-degree-1based.patch Description: binary/octet-stream
Re: [HACKERS] Statement timeout
Tatsuo Ishii writes:
>> Oops. Previous example was not appropriate. Here are revised
>> examples. (In any case, the time consumed at parse and bind is small,
>> and I omit the CHECK_FOR_INTERRUPTS after these commands.)

FWIW, I think the existing behavior is just fine. It corresponds to what PQexec has always done with multi-statement query strings; that is, statement_timeout governs the total time to execute the transaction (the whole query string, unless you put transaction control commands in there). In extended query mode, since you can only put one textual query per Parse message, that maps to statement_timeout governing the total time from initial Parse to Sync. Which is what it does.

What you propose here is not a bug fix but a redefinition of what statement_timeout does; and you've made it inconsistent with simple query mode. I don't really think it's an improvement.

BTW, aren't you missing a re-enable of the timeout for statements after the first one?

			regards, tom lane
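The simple-query behavior Tom describes is easy to demonstrate from psql, where \; combines statements into a single query string sent in one PQexec() call (the sleep durations here are arbitrary):

SET statement_timeout = '3s';

-- One multi-statement string: the two sleeps share a single timeout,
-- so this errors out even though each statement alone finishes in 2s.
SELECT pg_sleep(2) \; SELECT pg_sleep(2);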
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Tue, May 31, 2016 at 10:03 AM, Robert Haas wrote:
> On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?
>
> I think that is the danger.

Well, the *perceived* danger -- since every page in the new index would be new enough to be recognized as too recently modified to be safe for a snapshot that could still see the omitted rows, the only question I'm sorting out on this is how safe it might be to cause the index to be ignored in planning when using such a snapshot. That, and demonstrating the safe behavior to those not following closely enough to see what will happen without a demonstration.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is,

> This is a good point, though, I think.

The whole perception of risk in this area seems to be based on getting that wrong; although the review of this area may yield some benefit in terms of minimizing false positives.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which caused the tuple to be dead? Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem? The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.
>
> Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
> is newer than the oldest registered snapshot in the system (based on
> some snapshots being ignored) might get a return value of
> HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD.

Since the override xmin cannot advance past the earliest transaction which is still active, HEAPTUPLE_DEAD indicates that the transaction causing the tuple to be dead has completed and the tuple is irrevocably dead -- even if there are still snapshots registered which can see it. Seeing HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD is strictly limited to tuples which are certainly and permanently dead and for which the only possible references are non-MVCC snapshots or existing snapshots subject to "snapshot too old" monitoring.

> IndexBuildHeapRangeScan(): We might end up with indexIt = false
> instead of indexIt = true. That should be OK because anyone using the
> old snapshot will see a new page LSN and error out. We might also
> fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a
> problem, but I'm not certain of it.

The latter flag is what I'm currently digging at; my hope is that whenever old_snapshot_threshold >= 0 we can set indexInfo->ii_BrokenHotChain = true to cause the planner to skip consideration of the index if the snapshot is an "old" one. That will avoid some false positives (seeing the error when not strictly necessary). If that works out the way I hope, the only downside is that a scan using a snapshot from an old transaction or cursor would use some other index or a heap scan; but we already have that possibility in some cases -- that seems to be the point of the flag.

> CheckForSerializableConflictOut: Maybe a problem? If the tuple is
> dead, there's no issue, but if it's recently-dead, there might be.

If the tuple is not visible to the scan, the behavior is unchanged (a simple return from the function on either HEAPTUPLE_DEAD or HEAPTUPLE_RECENTLY_DEAD with !visible) and thus clearly correct. If the tuple is visible to us, it is currently subject to early pruning or vacuum (since those operations would get the same modified xmin) but has not yet had any such treatment, since we made it to this function in the first place. The processing for SSI purposes would be unaffected by the possibility that there could later be early pruning/vacuuming.

Thanks for the review and feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Suggestion for --truncate-tables to pg_restore
Hi there,

Referring to https://www.postgresql.org/message-id/1352742344.21373.4@mofo

I'm running into situations where I need to bulk-transfer data tables across servers, but dropping and recreating the schema isn't feasible, as we are running different permissions etc. on the two databases.

Thus my question: is anybody working on getting this into pg_restore proper, and is there any advice on getting this feature incorporated?

Hendrik
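Until such an option exists, the usual workaround is to truncate the target tables by hand and then do a data-only restore; roughly as follows (the table names and dump file are illustrative; --data-only and --disable-triggers are existing pg_restore options):

$ psql -d target -c 'TRUNCATE TABLE accounts, branches CASCADE'
$ pg_restore --data-only --disable-triggers -d target dump.custom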
Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat
Kyotaro HORIGUCHI writes:
> At Fri, 27 May 2016 13:20:20 -0400, Tom Lane wrote in <14603.1464369...@sss.pgh.pa.us>
>> Kyotaro HORIGUCHI writes:
>>> By the way, the reason of the "invalid snapshot identifier" is
>>> that some worker threads try to use it after the connection on
>>> the first worker closed.
>
>> ... BTW, I don't quite see what the issue is there.
>
> The master session died from lack of libz and the failure of
> compressLevel's propagation already fixed. Some of the children
> that started transactions after the master's death will get the
> error.

I don't think I believe that theory, because it would require the master to not notice the lack of libz before it launches worker processes, but instead while the workers are working. But AFAICS, while there are worker processes open, the master does nothing except wait for workers and dispatch new jobs to them; it does no database work of its own. So the libz-isn't-there error has to have occurred in one of the workers.

> If we want prevent it perfectly, one solution could be that
> non-master children explicitly wait the master to arrive at the
> "safe" state before starting their transactions. But I suppose it
> is not needed here.

Actually, I believe the problem is in archive_close_connection, around line 295 in HEAD: once the master realizes that one child has failed, it first closes its own database connection and only second tries to kill the remaining children. So there's a race condition wherein remaining children have time to see the missing-snapshot error. In the patch I posted yesterday, I reversed the order of those two steps, which should fix this problem in most scenarios:

https://www.postgresql.org/message-id/7005.1464657...@sss.pgh.pa.us

			regards, tom lane
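In outline, the reordering described above is just a swap of two steps; the function names here are approximations of the pg_dump parallel machinery, not the exact patch:

/* before: close own connection first, then kill children -- workers can
 * still hit "invalid snapshot identifier" in the window between the two */
DisconnectDatabase(&AH->connection);
ShutdownWorkersHard(pstate);

/* after: stop the children before invalidating the exported snapshot
 * by closing the master connection */
ShutdownWorkersHard(pstate);
DisconnectDatabase(&AH->connection);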
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote:
> Nikolay Shaplov writes:
>> Actually I did not expect any discussion for this case. The documentation
>> missed an optional keyword; the documentation should be fixed.
>
> 99% of the time, you'd be right. But this is an unusual case, for the
> reasons I mentioned before.

I tend to agree with Nikolay. I can't see much upside in making this change. At best, nothing will break. At worst, something will break. But how do we actually come out ahead? The only thing you've offered is that it might make it easier to make the relevant documentation pages 100% clear, but I feel like a man of your elocution can probably surmount that impediment. I guess we might save a few parser states, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
Robert Haas writes:
> On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote:
>> 99% of the time, you'd be right. But this is an unusual case, for the
>> reasons I mentioned before.
>
> I tend to agree with Nikolay. I can't see much upside in making this
> change. At best, nothing will break. At worst, something will break.
> But how do we actually come out ahead?

We come out ahead by not having to make the documentation more confusing. Basically, we have the opportunity to fix an ancient mistake here at very low cost. I do not think that doubling down on the mistake is a better answer.

			regards, tom lane
Re: [HACKERS] parallel.c is not marked as test covered
On 5/9/16 10:50 AM, Andres Freund wrote:
> I think it's a good idea to run a force-parallel run on some buildfarm
> members. But I'm rather convinced that the core tests run by all
> animals need some minimal coverage of parallel queries. Both because
> otherwise it'll be hard to get some coverage of unusual platforms, and
> because it's imo something rather relevant to test during development.

I can confirm that with force_parallel_mode = on, parallel.c does get good test coverage. But I agree that it is a problem that this does not happen in the default test setup.

I think we might want to have a new test file that sets force_parallel_mode = on and just runs a few queries to give parallel.c a small workout. For example, I find that the following test file gives almost the same coverage as running the full test suite with f_p_m=on:

"""
SET force_parallel_mode = on;
EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1;
SELECT * FROM tenk1 WHERE unique1 = 1;
-- provoke error in worker
SELECT stringu1::int2 FROM tenk1;
"""

While we're at it, one of the things that force_parallel_mode = regress does is remove an error context that is otherwise added to all messages:

    errcontext("parallel worker, PID %d", *(int32 *) arg);

I think we should get rid of that error context altogether, except possibly as a debugging tool, because it's an implementation detail that does not need to be shown to users by default.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] copyParamList
On Fri, May 27, 2016 at 6:07 PM, Andrew Gierth wrote:
> copyParamList does not respect from->paramMask, in what looks to me like
> an obvious oversight:
>
>     retval->paramMask = NULL;
> [...]
>     /* Ignore parameters we don't need, to save cycles and space. */
>     if (retval->paramMask != NULL &&
>         !bms_is_member(i, retval->paramMask))
>
> retval->paramMask is never set to anything not NULL in this function,
> so surely that should either be initializing it to from->paramMask, or
> checking from->paramMask in the conditional?

Oh, dear. I think you are right. I'm kind of surprised this didn't provoke a test failure somewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
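Either of the two fixes Andrew suggests is a one-liner; for instance, testing the source list's mask in the conditional (a sketch of the second option, not necessarily the committed wording):

    /* Ignore parameters we don't need, to save cycles and space. */
    if (from->paramMask != NULL &&
        !bms_is_member(i, from->paramMask))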
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 09:15 AM, Robert Haas wrote:
> On Sun, May 29, 2016 at 1:33 AM, Noah Misch wrote:
>> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>>> OK, my reading of this thread is that there is a consensus to
>>> redefine max_parallel_degree=1 as "no parallelism" and
>>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>>> workers", and along with that, to keep the names unchanged. However,
>>> I don't think I can get that done before beta1, at least not without a
>>> serious risk of breaking stuff. I can look at this post-beta1.
>>
>> [This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
>> since you committed the patch believed to have created it, you own this open
>> item. If some other commit is more relevant or if this does not belong as a
>> 9.6 open item, please let us know. Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within 72 hours of this
>> message. Include a date for your subsequent status update. Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping 9.6rc1. Consequently, I will appreciate your
>> efforts toward speedy resolution. Thanks.
>
> Here is a patch. Note that I still don't agree with this change, but
> I'm bowing to the will of the group.
>
> I think that some of the people who were in favor of this change
> should review this patch, including especially the language I wrote
> for the documentation. If that happens, and the reviews are positive,
> then I will commit this. If that does not happen, then I will
> interpret that to mean that there isn't actually all that much
> interest in changing this after all and will accordingly recommend
> that this open item be removed without further action.
>
> Here is a test which shows how it works:
>
> rhaas=# set max_parallel_degree = 100;
> SET
> rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
> ALTER TABLE
> rhaas=# explain (analyze) select count(*) from pgbench_accounts;
>
> QUERY PLAN
>
> Finalize Aggregate (cost=177436.04..177436.05 rows=1 width=8)
> (actual time=383.244..383.244 rows=1 loops=1)
>    -> Gather (cost=177435.00..177436.01 rows=10 width=8) (actual
> time=383.040..383.237 rows=9 loops=1)
>          Workers Planned: 9
>          Workers Launched: 8

I realize there's a lot of water under the bridge here, but I think we're going to get 1000 questions on -general of the type: "I asked for 8 parallel workers, why did I only get 7?". I believe we will regret this change.

So, one vote from me to revert.

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
Re: [HACKERS] Parallel safety tagging of extension functions
Andreas Karlsson writes:
> So how to best change the function signatures? I do not think it is
> possible without locking indexes by just using the SQL commands. You
> cannot drop a function from the operator family without dropping the
> operator class first.

Ugh. I had been thinking that you could use ALTER OPERATOR FAMILY DROP FUNCTION, but these functions are not "loose" in the opfamily --- they're assumed to be bound to the specific opclass. So there's an opclass dependency preventing them from getting dropped; and we have no SQL commands for changing the contents of an opclass.

After some fooling around, I couldn't find a way to do it without some manual catalog update or other. Given that, your original approach of manually updating proargtypes in the existing pg_proc row for the functions may be the best way. Anything else is going to be more complicated and ultimately will still require at least one direct catalog update.

Sometime we might want to think about making this sort of thing cleaner with more ALTER commands, but post-beta is probably not the time for that; it wouldn't be a small patch.

			regards, tom lane
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> I posted a patch at
>> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
>> which I think is functionally equivalent to what you have here, but
>> it goes to some lengths to make the code more readable, whereas this
>> is just adding another layer of complication to something that's
>> already a mess (eg, the request_time field is quite useless as-is).
>> So I'd like to propose pushing that in place of this patch ... do you
>> care to review it first?

> I've reviewed the patch today, and it seems fine to me - correct and
> achieving the same goal as the patch posted to this thread (plus fixing
> the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

> FWIW do you still plan to back-patch this? Minimizing the amount of
> changes was one of the things I had in mind when writing "my" patch,
> which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched. I'm not in favor of making things more complicated just to reduce the number of lines a patch touches.

> The one change I'm not quite sure about is the removal of clock skew
> detection in pgstat_recv_inquiry(). You've removed the first check on
> the inquiry, replacing it with this comment:
>     It seems sufficient to check for clock skew once per write round.
> But the first check was comparing msg/req, while the second check looks
> at dbentry/cur_ts. I don't see how those two clock skew checks are
> redundant - if they are, the comment should explain that I guess.

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew; it's intended to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. I reformatted the actual check for clock skew, but I do not think I changed its behavior.

> Another thing is that if you believe merging requests across databases
> is a silly idea, maybe we should bite the bullet and replace the list of
> requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either. We're not spending much code by having pending_write_requests be a list rather than a single entry, and we might eventually figure out a reasonable way to time the flushes so that we can merge requests.

			regards, tom lane
Re: [HACKERS] Rename max_parallel_degree?
Josh Berkus writes:
> I realize there's a lot of water under the bridge here, but I think
> we're going to get 1000 questions on -general of the type: "I asked for
> 8 parallel workers, why did I only get 7?". I believe we will regret
> this change.
>
> So, one vote from me to revert.

Well, that gets back to the question of whether average users will understand the "degree" terminology. For the record, while I do not like the current behavior either, this was not the solution I favored. I thought we should rename the GUC and keep it as meaning the number of additional worker processes.

			regards, tom lane
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:03 AM, Tom Lane wrote:
> Josh Berkus writes:
>> I realize there's a lot of water under the bridge here, but I think
>> we're going to get 1000 questions on -general of the type: "I asked for
>> 8 parallel workers, why did I only get 7?". I believe we will regret
>> this change.
>> So, one vote from me to revert.
>
> Well, that gets back to the question of whether average users will
> understand the "degree" terminology. For the record, while I do not
> like the current behavior either, this was not the solution I favored.
> I thought we should rename the GUC and keep it as meaning the number
> of additional worker processes.

I will happily bet anyone a nice dinner in Ottawa that most users will not understand it.

Compare this:

"max_parallel is the maximum number of parallel workers which will work on each stage of the query which is parallelizable. If you set it to 4, you get up to 4 workers."

with this:

"max_parallel_degree is the amount of parallelism in the query, with the understanding that the original parent process counts as 1, which means that if you set it to 1 you get no parallelism, and if you want 4 parallel workers you need to set it to 5."

Which one of those is going to require more explanations on -general and -novice? Bets?

Let's not be complicated for the sake of being complicated.

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:10 AM, Josh Berkus wrote:
> Compare this:
>
> "max_parallel is the maximum number of parallel workers which will work
> on each stage of the query which is parallelizable. If you set it to 4,
> you get up to 4 workers."
>
> with this:
>
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general
> and -novice? Bets?
>
> Let's not be complicated for the sake of being complicated.

+1

--
Command Prompt, Inc.                  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:10 AM, Josh Berkus wrote:
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice? Bets?
>
> Let's not be complicated for the sake of being complicated.

But the distinction between parallel workers and backends that can participate in parallel query does need to be user-visible. Worker processes are a commodity (i.e. the user must consider max_worker_processes).

--
Peter Geoghegan
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
On 05/31/2016 06:59 PM, Tom Lane wrote: Tomas Vondra writes: On 05/26/2016 10:10 PM, Tom Lane wrote: I posted a patch at https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us which I think is functionally equivalent to what you have here, but it goes to some lengths to make the code more readable, whereas this is just adding another layer of complication to something that's already a mess (eg, the request_time field is quite useless as-is). So I'd like to propose pushing that in place of this patch ... do you care to review it first? I've reviewed the patch today, and it seems fine to me - correct and achieving the same goal as the patch posted to this thread (plus fixing the issue with shared catalogs and fixing many comments). Thanks for reviewing! FWIW do you still plan to back-patch this? Minimizing the amount of changes was one of the things I had in mind when writing "my" patch, which is why I ended up with parts that are less readable. Yeah, I think it's a bug fix and should be back-patched. I'm not in favor of making things more complicated just to reduce the number of lines a patch touches. The one change I'm not quite sure about is the removal of clock skew detection in pgstat_recv_inquiry(). You've removed the first check on the inquiry, replacing it with this comment: It seems sufficient to check for clock skew once per write round. But the first check was comparing msg/req, while the second check looks at dbentry/cur_ts. I don't see how those two clock skew check are redundant - if they are, the comment should explain that I guess. I'm confused here --- are you speaking of having removed if (msg->cutoff_time > req->request_time) req->request_time = msg->cutoff_time; ? That is not a check for clock skew, it's intending to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. Ah, you're right. I've made the mistake of writing the e-mail before drinking any coffee today, and I got distracted by the comment change. I reformatted the actual check for clock skew, but I do not think I changed its behavior. I'm not sure it does not change the behavior, though. request_time only became unused after you removed the two places that set the value (one of them in the clock skew check). I'm not sure this is a bad change, though. But there was a dependency between the new request and the preceding one. Another thing is that if you believe merging requests across databases is a silly idea, maybe we should bite the bullet and replace the list of requests with a single item. I'm not convinced about this, though. No, I don't want to do that either. We're not spending much code by having pending_write_requests be a list rather than a single entry, and we might eventually figure out a reasonable way to time the flushes so that we can merge requests. +1 regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:16 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:10 AM, Josh berkus wrote: >> "max_parallel_degree is the amount of parallelism in the query, with the >> understanding that the original parent process counts as 1, which means >> that if you set it to 1 you get no parallelism, and if you want 4 >> parallel workers you need to set it to 5." >> >> Which one of those is going to require more explanations on -general and >> -novice? Bets? >> >> Let's not be complicated for the sake of being complicated. > > But the distinction between parallel workers and backends that can > participate in parallel query does need to be user-visible. Worker > processes are a commodity (i.e. the user must consider > max_worker_processes). It's still WAY simpler to understand "max_parallel is the number of parallel workers I requested". Any system where you set it to 2 and get only 1 worker on an idle system is going to cause endless queries on the mailing lists. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes: > On 05/31/2016 06:59 PM, Tom Lane wrote: >> I'm confused here --- are you speaking of having removed >> if (msg->cutoff_time > req->request_time) >> req->request_time = msg->cutoff_time; >> ? That is not a check for clock skew, it's intending to be sure that >> req->request_time reflects the latest request for this DB when we've >> seen more than one request. But since req->request_time isn't >> actually being used anywhere, it's useless code. > Ah, you're right. I've made the mistake of writing the e-mail before > drinking any coffee today, and I got distracted by the comment change. >> I reformatted the actual check for clock skew, but I do not think I >> changed its behavior. > I'm not sure it does not change the behavior, though. request_time only > became unused after you removed the two places that set the value (one > of them in the clock skew check). Well, it's unused in the sense that the if-test quoted above is the only place in HEAD that examines the value of request_time. And since that if-test only controls whether we change the value, and not whether we proceed to make the clock skew check, I don't see how it's related to clock skew or indeed anything else at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 10:16 AM, Peter Geoghegan wrote: >> But the distinction between parallel workers and backends that can >> participate in parallel query does need to be user-visible. Worker >> processes are a commodity (i.e. the user must consider >> max_worker_processes). > It's still WAY simpler to understand "max_parallel is the number of > parallel workers I requested". > Any system where you set it to 2 and get only 1 worker on an idle system > is going to cause endless queries on the mailing lists. I really think that a GUC named "max_parallel_workers", which in fact limits the number of workers and not something else, is the way to go. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:23 AM, Josh berkus wrote: > It's still WAY simpler to understand "max_parallel is the number of > parallel workers I requested". (Sorry Josh, somehow hit reply, not reply-all) Yes, it is. But as long as parallel workers are not really that distinct from the leader-as-worker when executing a parallel query, you have another consideration. Which is that you need to care about how many cores your query uses first and foremost, and not the number of parallel workers used. I don't think that having only one worker will cause too much confusion, because users will trust that we won't allow something that simply makes no sense to happen. In my parallel create index patch, the leader participates as a worker to scan and sort runs. It's identical to a worker, practically speaking, at least until the time comes to merge those runs. Similarly, parallel aggregate does not really have much for the leader process to do other than act as a worker. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:30 AM, Tom Lane wrote: Josh berkus writes: On 05/31/2016 10:16 AM, Peter Geoghegan wrote: But the distinction between parallel workers and backends that can participate in parallel query does need to be user-visible. Worker processes are a commodity (i.e. the user must consider max_worker_processes). It's still WAY simpler to understand "max_parallel is the number of parallel workers I requested". Any system where you set it to 2 and get only 1 worker on an idle system is going to cause endless queries on the mailing lists. I really think that a GUC named "max_parallel_workers", which in fact limits the number of workers and not something else, is the way to go. I agree with Tom here. If we are being pedantic for the sake of being pedantic we are just causing frustration. The term max_parallel_workers is simple, easy to understand and accurate enough. Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:38 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:23 AM, Josh berkus wrote: >> It's still WAY simpler to understand "max_parallel is the number of >> parallel workers I requested". > > (Sorry Josh, somehow hit reply, not reply-all) > > Yes, it is. But as long as parallel workers are not really that > distinct from the leader-as-worker when executing a parallel query, > you have another consideration. Which is that you need to care about > how many cores your query uses first and foremost, and not the number > of parallel workers used. I don't think that having only one worker > will cause too much confusion, because users will trust that we won't > allow something that simply makes no sense to happen. > > In my parallel create index patch, the leader participates as a worker > to scan and sort runs. It's identical to a worker, practically > speaking, at least until the time comes to merge those runs. Similarly, > parallel aggregate does not really have much for the leader process to > do other than act as a worker. In parallel seq scan and join, do the "masters" behave as workers as well? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Rename synchronous_standby_names?
Hi, Are we going to change synchronous_standby_names? Certainly the GUC is not *only* a list of names anymore. synchronous_standby_config? synchronous_standbys (adjust to correct English if necessary)? -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 10:46 AM, Josh berkus wrote: > In parallel seq scan and join, do the "masters" behave as workers as well? It depends. They will if they can. If the parallel seq scan leader isn't getting enough work to do from workers (enough tuples to process from the shared memory queue), it will start acting as a worker fairly quickly. With parallel aggregate, and some other cases, that will always happen. Even when the leader is consuming input from workers, that's still perhaps pegging one CPU core. So, it doesn't really invalidate what I said about the number of cores being the primary consideration. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
I wrote: > I really think that a GUC named "max_parallel_workers", which in fact > limits the number of workers and not something else, is the way to go. To be concrete, I suggest comparing the attached documentation patch with Robert's. Which one is more understandable? (I have not bothered preparing code changes to go with this, but am willing to do so.) regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 72b5920..fccde28 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 1998,2019 ****
! max_parallel_degree (integer)
! max_parallel_degree configuration parameter
!
! Sets the maximum number of workers that can be started for an
! individual parallel operation. Parallel workers are taken from the
! pool of processes established by
! <xref linkend="guc-max-worker-processes">. Note that the requested
! number of workers may not actually be available at runtime. If this
! occurs, the plan will run with fewer workers than expected, which may
! be inefficient. The default value is 2. Setting this value to 0
! disables parallel query execution.
--- 1998,2020 ----
! max_parallel_workers (integer)
! max_parallel_workers configuration parameter
!
! Sets the maximum number of worker processes that can be used to
! assist an individual parallelizable operation. Parallel workers are
! taken from the cluster-wide pool of processes established by
! <xref linkend="guc-max-worker-processes">. Depending on the size of
! the pool and the number of other parallel queries active, the
! requested number of workers may not actually be available at runtime.
! If this occurs, the plan will run with fewer workers than expected,
! which may be inefficient. The default value is 2. Setting this
! value to 0 disables parallel query execution.

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d1807ed..d73cf39 100644
*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
*** CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY
*** 909,922 ****
! parallel_degree (integer)
!
! The parallel degree for a table is the number of workers that should
! be used to assist a parallel scan of that table. If not set, the
! system will determine a value based on the relation size. The actual
! number of workers chosen by the planner may be less, for example due to
! the setting of <xref linkend="guc-max-parallel-degree">.
--- 909,923 ----
! parallel_workers (integer)
!
! This setting limits the number of parallel worker processes that should
! be used to assist a parallel scan of this table. If not set, the
! planner will select a value based on the relation size. The actual
! number of workers chosen by the planner may be less, for example
! because it is also constrained by
! <xref linkend="guc-max-parallel-workers">; it will not be more.

diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index 38e1967..3d78b00 100644
*** a/doc/src/sgml/release-9.6.sgml
--- b/doc/src/sgml/release-9.6.sgml
***************
*** 153,159 ****
  Use of parallel query execution can be controlled through the new
  configuration parameters
! , , , and .
--- 153,159 ----
  Use of parallel query execution can be controlled through the new
  configuration parameters
! , , , and .

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 10:51 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 10:46 AM, Josh berkus wrote: >> In parallel seq scan and join, do the "masters" behave as workers as well? > > It depends. They will if they can. If the parallel seq scan leader > isn't getting enough work to do from workers (enough tuples to process > from the shared memory queue), it will start acting as a worker fairly > quickly. > With parallel aggregate, and some other cases, that will always happen. > > Even when the leader is consuming input from workers, that's still perhaps > pegging one CPU core. So, it doesn't really invalidate what I said about > the number of cores being the primary consideration. > I get where you're coming from, but I think Haas's query plan output is going to show us the confusion we're going to get. So we need to either change the parameter, the explain output, or brace ourselves for endless repeated questions. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
Jaime Casanova writes: > Are we going to change synchronous_standby_names? Certainly the GUC is > not *only* a list of names anymore. > synchronous_standby_config? > synchronous_standbys (adjust to correct english if necesary)? I could get behind renaming it to synchronous_standby_config ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:00 AM, Tom Lane wrote: > ! If this occurs, the plan will run with fewer workers than expected, > ! which may be inefficient. The default value is 2. Setting this > ! value to 0 disables parallel query execution. Is there a thread on how we determined this default of 2? I can't find one under likely search terms. I'm concerned about the effect of overallocating parallel workers on systems which are already running out of cores (e.g. AWS instances), and run with default settings. Possibly max_parallel_workers takes care of this, which is why I want to understand the logic here. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Tom Lane wrote: > Josh berkus > writes: > > On 05/31/2016 10:16 AM, Peter Geoghegan wrote: > >> But the distinction between parallel workers and backends that can > >> participate in parallel query does need to be user-visible. Worker > >> processes are a commodity (i.e. the user must consider > >> max_worker_processes). > > > It's still WAY simpler to understand "max_parallel is the number of > > parallel workers I requested". > > > Any system where you set it to 2 and get only 1 worker on an idle system > > is going to cause endless queries on the mailing lists. > > I really think that a GUC named "max_parallel_workers", which in fact > limits the number of workers and not something else, is the way to go. > > What is your opinion on the internal name for this? Leave it as "degree"? Skimming the patch, the number of times that "parallel_degree - 1" appears concerns me. I would also probably reword "Parallel workers are taken from the pool...", replacing "parallel" with the word "additional": "Additional workers are taken from the pool...". I'm tending to think we are using the word "parallel" too often and that selective removal would be better. "Sets the maximum number of workers (i.e., processes) that can be used to perform an operation." An introductory section on parallelism can lay out the framework and the concept of parallelism. Sentences like the one above can then assume that the reader understands that workers operate in parallel if the operation allows and there are workers available. The same observation applies to "leader". When more than one process is involved, a leader is chosen. This is general background to cover upfront. Elsewhere, just talk of workers and let the leader be implied. No need to keep repeating "including the leader". In many ways the leader is an implementation detail the user doesn't care about. The only impact is that 2 is not twice as good as 1 due to overhead - whatever form that overhead takes. The change from two to three for the default sneaked in there... David J.
Re: [HACKERS] Rename max_parallel_degree?
Peter Geoghegan writes: > Even when the leader is consuming input from workers, that's still perhaps > pegging one CPU core. So, it doesn't really invalidate what I said about > the number of cores being the primary consideration. Agreed, but if we think that people need to be thinking in those terms, maybe the parameter should be "max_parallel_cores". The alternate docs patch I just posted tries to deal with this by describing max_parallel_workers as being the max number of worker processes used to "assist" a parallel query. That was terminology already being used in one place, but not consistently. If we use it consistently, I think it would be sufficient to remind people that they need to figure on one more core for the leader. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:02 AM, Josh berkus wrote: > I get where you're coming from, but I think Haas's query plan output is > going to show us the confusion we're going to get. So we need to either > change the parameter, the explain output, or brace ourselves for endless > repeated questions. I get where you're coming from, too -- I think our positions are very close. The only reason I favor defining parallel_degree = 1, rather than doing what Tom proposes to do with that patch, is that we might as well use the prevailing terminology used by SQL Server and Oracle (as long as we match those semantics). Also, I think that number of cores used is a more important consideration for users than the number of workers used. Users will definitely be confused about workers used vs. cores used, but I don't think that any proposal fixes that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
Josh berkus writes: > Is there a thread on how we determined this default of 2? I can't find > one under likely search terms. The 9.6 open-items list cites https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:05 AM, Josh berkus wrote: On 05/31/2016 11:00 AM, Tom Lane wrote: ! If this occurs, the plan will run with fewer workers than expected, ! which may be inefficient. The default value is 2. Setting this ! value to 0 disables parallel query execution. Is there a thread on how we determined this default of 2? I can't find one under likely search terms. I'm concerned about the effect of overallocating parallel workers on systems which are already running out of cores (e.g. AWS instances), and run with default settings. Possibly max_parallel_workers takes care of this, which is why I want to understand the logic here. I don't remember the thread, but I seem to recall that the default of 2 was chosen explicitly for beta, RC, etc., so that we can see what happens. Robert could speak better to that. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:09 AM, Peter Geoghegan wrote: > The only reason I favor defining parallel_degree = 1 I meant "redefining max_parallel_degree =1 to effectively disable parallel query", of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Tom Lane wrote: > I wrote: > > I really think that a GUC named "max_parallel_workers", which in fact > > limits the number of workers and not something else, is the way to go. > > To be concrete, I suggest comparing the attached documentation patch > with Robert's. Which one is more understandable? > > (I have not bothered preparing code changes to go with this, but am > willing to do so.) > > If going this route I'd still rather add the word "assisting" or "additional" directly into the GUC name, so that the need to read the docs to determine whether the leader is included is alleviated. Robert said that it would be confusing, but it cannot be worse than this... David J.
Re: [HACKERS] Rename max_parallel_degree?
On 5/31/16 2:02 PM, Josh berkus wrote: I get where you're coming from, but I think Haas's query plan output is going to show us the confusion we're going to get. So we need to either change the parameter, the explain output, or brace ourselves for endless repeated questions. Changing the explain output doesn't sound so bad to me. The users' problem is that the parameter setting ought to match the EXPLAIN output. The developers' problem is that the EXPLAIN output actually corresponds to leader + (N-1) workers internally. I think we can hope that developers are going to be less confused about that than users. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tuesday, May 31, 2016, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 11:02 AM, Josh berkus > wrote: > > I get where you're coming from, but I think Haas's query plan output is > > going to show us the confusion we're going to get. So we need to either > > change the parameter, the explain output, or brace ourselves for endless > > repeated questions. > > I get where you're coming from, too -- I think our positions are very > close. > > The only reason I favor defining parallel_degree = 1, rather than > doing what Tom proposes to do with that patch, is that we might as > well use the prevailing terminology used by SQL Server and Oracle (as > long as we match those semantics). Also, I think that number of cores > used is a more important consideration for users than the number of > workers used. > > Users will definitely be confused about workers used vs. cores used, > but I don't think that any proposal fixes that. > > Ideally each worker gets its own core - though obviously physical limitations apply. If that's not the case then, yes, at least one user is confused... David J.
Re: [HACKERS] Rename max_parallel_degree?
"David G. Johnston" writes: > On Tuesday, May 31, 2016, Tom Lane wrote: >>> I really think that a GUC named "max_parallel_workers", which in fact >>> limits the number of workers and not something else, is the way to go. > If going this route I'd still rather add the word "assisting" > or "additional" directly into the guc name so the need to read the docs to > determine inclusive or exclusive of the leader is alleviated. Dunno, "max_assisting_parallel_workers" seems awfully wordy and not remarkably clearer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:17 AM, Peter Eisentraut wrote: > Changing the explain output doesn't sound so bad to me. > > The users' problem is that the parameter setting ought to match the EXPLAIN > output. > > The developers' problem is that the EXPLAIN output actually corresponds to > leader + (N-1) workers internally. > > I think we can hope that developers are going to be less confused about that > than users. +1. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
On 5/31/16 1:47 PM, Jaime Casanova wrote: Are we going to change synchronous_standby_names? Certainly the GUC is not *only* a list of names anymore. synchronous_standby_config? synchronous_standbys (adjust to correct english if necesary)? If the existing values are still going to be accepted, then I would leave it as is. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
"David G. Johnston" writes: > On Tuesday, May 31, 2016, Tom Lane wrote: >> I really think that a GUC named "max_parallel_workers", which in fact >> limits the number of workers and not something else, is the way to go. > What is your opinion on the internal name for this? Leave it as "degree"? If that proposal is accepted, I am willing to go through the code and rename everything internally to match (and, in fact, would insist on that happening). But I think the current debate is primarily around how we present this to users, so talking about the documentation seems sufficient for the moment. > The change from two to three for the default sneaked in there... Well, that would be the same effective value under Robert's patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:17 AM, Peter Eisentraut wrote: > On 5/31/16 2:02 PM, Josh berkus wrote: >> I get where you're coming from, but I think Haas's query plan output is >> going to show us the confusion we're going to get. So we need to either >> change the parameter, the explain output, or brace ourselves for endless >> repeated questions. > > Changing the explain output doesn't sound so bad to me. > > The users' problem is that the parameter setting ought to match the > EXPLAIN output. > > The developers' problem is that the EXPLAIN output actually corresponds > to leader + (N-1) workers internally. > > I think we can hope that developers are going to be less confused about > that than users. Makes sense. One more consistency question: what's the effect of running out of max_parallel_workers? That is, say max_parallel_workers is set to 10, and 8 are already allocated. If I ask for max_parallel_X = 4, how many cores do I use? Presumably the leader isn't counted towards max_parallel_workers? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: >> I think we can hope that developers are going to be less confused about >> that than users. > > Makes sense. Maybe EXPLAIN doesn't have to use the term parallel worker at all. It can instead use a slightly broader terminology, possibly including the term "core". > One more consistency question: what's the effect of running out of > max_parallel_workers? > > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? Well, it depends on the planner, of course. But when constrained only by the availability of worker processes, then your example could use 3 cores -- the 2 remaining parallel workers, plus the leader itself. > Presumably the leader isn't counted towards max_parallel_workers? Exactly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 2:22 PM, Josh berkus wrote: > On 05/31/2016 11:17 AM, Peter Eisentraut wrote: > > On 5/31/16 2:02 PM, Josh berkus wrote: > >> I get where you're coming from, but I think Haas's query plan output is > >> going to show us the confusion we're going to get. So we need to either > >> change the parameter, the explain output, or brace ourselves for endless > >> repeated questions. > > > > Changing the explain output doesn't sound so bad to me. > > > > The users' problem is that the parameter setting ought to match the > > EXPLAIN output. > > > > The developers' problem is that the EXPLAIN output actually corresponds > > to leader + (N-1) workers internally. > > > > I think we can hope that developers are going to be less confused about > > that than users. > > Makes sense. > > One more consistency question: what's the effect of running out of > max_parallel_workers? > > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? > > Presumably the leader isn't counted towards max_parallel_workers? > You'd have three O/S processes - the one dedicated to your session and you'd pick up two additional processes from the worker pool to assist. How the O/S assigns those to cores is outside PostgreSQL's jurisdiction. David J.
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > One more consistency question: what's the effect of running out of > max_parallel_workers? ITYM max_worker_processes (ie, the cluster-wide pool size)? > That is, say max_parallel_workers is set to 10, and 8 are already > allocated. If I ask for max_parallel_X = 4, how many cores do I use? One of my reasons for liking max_parallel_workers is that you can sensibly compare it to max_worker_processes to figure out how many workers you're likely to get. If you think in terms of "degree" it takes some additional mental arithmetic to understand what will happen. > Presumably the leader isn't counted towards max_parallel_workers? Not under what I'm proposing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
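Putting the arithmetic from this exchange in one place, a hedged illustration using the names proposed in the thread (max_parallel_workers here is the proposed per-operation name, not an existing GUC at this point):

    -- cluster-wide pool:
    --   max_worker_processes = 10
    -- 8 workers already taken by other parallel queries
    SET max_parallel_workers = 4;   -- per-operation request
    -- only 10 - 8 = 2 workers can actually be launched, so the
    -- plan runs with 2 workers plus the leader: up to 3 cores busy

The appeal of counting workers rather than "degree", as noted above, is that this subtraction is all the mental arithmetic required.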
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:27 AM, Peter Geoghegan wrote: > On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: >>> I think we can hope that developers are going to be less confused about >>> that than users. >> >> Makes sense. > > Maybe EXPLAIN doesn't have to use the term parallel worker at all. It > can instead use a slightly broader terminology, possibly including the > term "core". > >> One more consistency question: what's the effect of running out of >> max_parallel_workers? >> >> That is, say max_parallel_workers is set to 10, and 8 are already >> allocated. If I ask for max_parallel_X = 4, how many cores do I use? > > Well, it depends on the planner, of course. But when constrained only > by the availability of worker processes, then your example could use 3 > cores -- the 2 remaining parallel workers, plus the leader itself. > >> Presumably the leader isn't counted towards max_parallel_workers? (oops, I meant max_worker_processes above) So, there are six things we'd like to make consistent to limit user confusion:

1. max_parallel_X and number of cores used
2. max_parallel_X and EXPLAIN output
3. max_parallel_X and gatekeeping via max_worker_processes
4. max_parallel_X and parallelism of operations (i.e. 2 == 2 parallel scanners)
5. settings in other similar databases (does someone have specific citations for this?)
6. consistency with other GUCs (0 or -1 to disable settings)

We can't actually make all six things consistent, as some of them are contradictory; for example, (1) and (3) cannot be reconciled. So we need to evaluate which things are more likely to cause confusion. My general evaluation would be to make the GUC be the number of additional workers used, not the total number (including the leader). From my perspective, that makes (2), (3) and (6) consistent, while (4) cannot be made consistent because different types of parallel nodes behave in different ways (i.e., some are parallel with only 1 additional worker and some are not). However, I'm resigned to the fact that user confusion is inevitable whichever way we choose, and could be persuaded the other way. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 11:29 AM, Tom Lane wrote: > Josh berkus writes: >> One more consistency question: what's the effect of running out of >> max_parallel_workers? > > ITYM max_worker_processes (ie, the cluster-wide pool size)? Yes. Sorry for contributing to the confusion. Too many similar-sounding parameter names. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
On 05/31/2016 11:10 AM, Tom Lane wrote: > Josh berkus writes: >> Is there a thread on how we determined this default of 2? I can't find >> one under likely search terms. > > The 9.6 open-items list cites > > https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de Looks like we didn't decide for the release, just the beta. I can see two ways to go for the final release:

1. Ship with max_parallel_X = 2 (or similar) and a very low max_worker_processes (e.g. 4).
2. Ship with max_parallel_X = 1 (or 0, depending), and with a generous max_worker_processes (e.g. 16).

Argument in favor of (1): we want parallelism to work out of the gate for users running on low-concurrency systems. These settings would let some parallelism happen immediately, without overwhelming a 4-to-8-core system/vm. Tuning for the user would then be fairly easy, as we could just tell them "set max_worker_processes to half the number of cores you have".

Argument in favor of (2): parallelism is potentially risky for .0, and as a result we want it disabled unless users choose to enable it. Also, defaulting to off lets users make more use of the parallel_degree table attribute to just enable parallelism on select tables. Thoughts? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
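A sketch of what the two options would look like as shipped settings, using the placeholder name max_parallel_X from the message above for whatever the per-operation GUC ends up being called (the values are the ones proposed there, not committed defaults):

    -- Option 1: parallelism on out of the box, small pool
    ALTER SYSTEM SET max_parallel_X = 2;
    ALTER SYSTEM SET max_worker_processes = 4;

    -- Option 2: parallelism off by default, generous pool
    ALTER SYSTEM SET max_parallel_X = 0;
    ALTER SYSTEM SET max_worker_processes = 16;

Note that max_worker_processes can only change at server start, so either choice is effectively baked into a default installation until the administrator intervenes.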
Re: [HACKERS] Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 11:29 AM, Tom Lane wrote: >> Josh berkus writes: >>> One more consistency question: what's the effect of running out of >>> max_parallel_workers? >> ITYM max_worker_processes (ie, the cluster-wide pool size)? > Yes. Sorry for contributing to the confusion. Too many > similar-sounding parameter names. Yeah, I was thinking the same thing while preparing my docs patch. At the risk of opening another can of worms, what about renaming max_worker_processes as well? It would be a good thing if that had "cluster" in it somewhere, or something that indicates it's a system-wide value not a per-session value. "max_workers_per_cluster" would answer, though I'm not in love with it particularly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 2:37 PM, Josh berkus wrote: > On 05/31/2016 11:27 AM, Peter Geoghegan wrote: > > On Tue, May 31, 2016 at 11:22 AM, Josh berkus wrote: > >>> I think we can hope that developers are going to be less confused about > >>> that than users. > >> > >> Makes sense. > > > > Maybe EXPLAIN doesn't have to use the term parallel worker at all. It > > can instead use a slightly broader terminology, possibly including the > > term "core". > 2. max_parallel_X and EXPLAIN output > This is helped greatly if we can say:

  Leader:   xxx
  Worker 0: yyy
  Worker 1: zzz

David Rowley's response on April 30th provides an example EXPLAIN output, should anyone want an easy source to look at. https://www.postgresql.org/message-id/cakjs1f8_3y6xhp2xp5yovpetxmucmiwp6zwr_eterqy2fcn...@mail.gmail.com Getting across the point that the session process is not a worker process seems doable. The implementation detail that it does indeed perform work can be made evident in the EXPLAIN output but otherwise left as an implementation detail. David J.
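To illustrate, a hypothetical mock-up in the style being proposed here -- not actual server output, and the plan shape and numbers are invented:

    EXPLAIN (ANALYZE) SELECT * FROM t WHERE x > 0;
    -- Gather (actual rows=1000000 loops=1)
    --   Workers Launched: 2
    --   -> Parallel Seq Scan on t
    --        Filter: (x > 0)
    --        Leader:   rows=334000
    --        Worker 0: rows=333000
    --        Worker 1: rows=333000

Breaking the row counts out per process makes it visible that the leader participates in the scan without ever having to call it a "worker".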
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
On 05/31/2016 07:24 PM, Tom Lane wrote: Tomas Vondra writes: On 05/31/2016 06:59 PM, Tom Lane wrote: I'm confused here --- are you speaking of having removed if (msg->cutoff_time > req->request_time) req->request_time = msg->cutoff_time; ? That is not a check for clock skew, it's intending to be sure that req->request_time reflects the latest request for this DB when we've seen more than one request. But since req->request_time isn't actually being used anywhere, it's useless code. Ah, you're right. I've made the mistake of writing the e-mail before drinking any coffee today, and I got distracted by the comment change. I reformatted the actual check for clock skew, but I do not think I changed its behavior. I'm not sure it does not change the behavior, though. request_time only became unused after you removed the two places that set the value (one of them in the clock skew check). Well, it's unused in the sense that the if-test quoted above is the only place in HEAD that examines the value of request_time. And since that if-test only controls whether we change the value, and not whether we proceed to make the clock skew check, I don't see how it's related to clock skew or indeed anything else at all. I see, in that case it indeed is useless. I've checked how this worked in 9.2 (before the 9.3 patch that split the file per db), and back then last_statsrequest (transformed to request_time) was used to decide whether we need to write something. But now we do that by simply checking whether the list is empty. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename synchronous_standby_names?
On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 5/31/16 1:47 PM, Jaime Casanova wrote: > >> Are we going to change synchronous_standby_names? Certainly the GUC is >> not *only* a list of names anymore. >> >> synchronous_standby_config? >> synchronous_standbys (adjust to correct english if necesary)? >> > > If the existing values are still going to be accepted, then I would leave > it as is. +1 David J.
Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?
Josh berkus writes: > On 05/31/2016 11:10 AM, Tom Lane wrote: >> The 9.6 open-items list cites >> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de > Looks like we didn't decide for the release, just the beta. Indeed. I think it's premature to have this discussion. The plan was to evaluate near the end of beta, when we (hopefully) have a better feeling for how buggy parallel query is likely to be. > Also, defaulting to off lets users make more use of the parallel_degree > table attribute to just enable parallelism on select tables. Well, that's an interesting point. The current coding is that parallel_degree is an upper limit on per-table workers, and max_parallel_degree also limits it. So if you want parallel scans only on a small set of tables, parallel_degree is not an especially convenient way to get to that. Whether we measure it in workers or cores doesn't change this conclusion. It might be worth reconsidering what per-table knobs we should provide exactly, but that's orthogonal to the main point under discussion. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
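For reference, the per-table knob under discussion is a storage parameter, so enabling parallel scans on just a few tables means touching each one. A sketch (the table names are made up, and the effective value is still capped by the global GUC, per Tom's point above):

    -- allow up to 4 workers for scans of this one table
    ALTER TABLE hot_table SET (parallel_degree = 4);

    -- forbid parallel scans of another table
    ALTER TABLE big_archive SET (parallel_degree = 0);

Because the global setting acts as a cap rather than a default, per-table values cannot opt a table *in* to parallelism when the global limit is zero -- which is why this is not an especially convenient opt-in switch.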
Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Tomas Vondra writes: > I've checked how this worked in 9.2 (before the 9.3 patch that split the > file per db), and back then last_statsrequest (transformed to > request_time) was used to decide whether we need to write something. But > now we do that by simply checking whether the list is empty. Right. In effect, 9.3 moved the decision about "do we need to write stats because of this request" from the main loop to pgstat_recv_inquiry() ... but we forgot to incorporate any check for whether the request was already satisfied into pgstat_recv_inquiry(). We can do that, though, as per either of the patches under discussion --- and once we do so, maintaining DBWriteRequest.request_time seems a bit pointless. It's conceivable that we'd try to implement merging of close-together requests in a way that would take account of how far back the oldest unsatisfied request is. But I think that a global oldest-request time would be sufficient for that; we don't need to track it per-database. In any case, it's hard to see exactly how to make that work without putting a gettimeofday() call into the inner message handling loop, which I believe we won't want to do on performance grounds. The previous speculation about doing writes every N messages or when we have no input to process seems more likely to lead to a useful answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
I wrote: > At the risk of opening another can of worms, what about renaming > max_worker_processes as well? It would be a good thing if that > had "cluster" in it somewhere, or something that indicates it's a > system-wide value not a per-session value. "max_workers_per_cluster" > would answer, though I'm not in love with it particularly. Actually, after a bit more thought, maybe "max_background_workers" would be a good name? That matches its existing documentation more closely: Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8. However, that would still leave us with max_background_workers as the cluster-wide limit and max_parallel_workers as the per-query-node limit. That naming isn't doing all that much to clarify the distinction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > I wrote: >> At the risk of opening another can of worms, what about renaming >> max_worker_processes as well? It would be a good thing if that >> had "cluster" in it somewhere, or something that indicates it's a >> system-wide value not a per-session value. "max_workers_per_cluster" >> would answer, though I'm not in love with it particularly. > > Actually, after a bit more thought, maybe "max_background_workers" would > be a good name? That matches its existing documentation more closely: > > Sets the maximum number of background processes that the system > can support. This parameter can only be set at server start. The > default is 8. > > However, that would still leave us with max_background_workers as the > cluster-wide limit and max_parallel_workers as the per-query-node limit. > That naming isn't doing all that much to clarify the distinction. It sure isn't. Also, I think that we might actually want to add an additional GUC to prevent the parallel query system from consuming the entire pool of processes established by max_worker_processes. If you're doing anything else with worker processes on your system, you might well want to say, well, it's OK to have up to 20 worker processes, but at most 10 of those can be used for parallel queries, so that the other 10 are guaranteed to be available for whatever other stuff I'm running that uses the background process facility. It's worth remembering that the background worker stuff was originally invented by Alvaro to allow users to run daemons, not for parallel query. So I think in the long run we should have three limits:

1. Cluster-wide limit on number of worker processes for all purposes (currently, max_worker_processes).
2. Cluster-wide limit on number of worker processes for parallelism (don't have this yet).
3. Per-operation limit on number of worker processes for parallelism (currently, max_parallel_degree).

Whatever we rename, there needs to be enough semantic space between #1 and #3 to allow for the possibility - I think the very likely possibility - that we will eventually also want #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
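A configuration sketch of the three-limit scheme outlined above; #1 and #3 use the names current at the time of this thread, while the name for #2 is purely hypothetical since no such GUC exists yet:

    ALTER SYSTEM SET max_worker_processes = 20;   -- #1: all bgworkers
    ALTER SYSTEM SET max_parallel_workers = 10;   -- #2: hypothetical cap on
                                                  --     parallel-query workers
    ALTER SYSTEM SET max_parallel_degree = 4;     -- #3: per operation

Under such a scheme, the 10 slots that parallel query cannot claim remain available for extensions' background workers (pglogical, BDR, and similar, as mentioned downthread).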
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > I wrote: > > At the risk of opening another can of worms, what about renaming > > max_worker_processes as well? It would be a good thing if that > > had "cluster" in it somewhere, or something that indicates it's a > > system-wide value not a per-session value. "max_workers_per_cluster" > > would answer, though I'm not in love with it particularly. > > Actually, after a bit more thought, maybe "max_background_workers" would > be a good name? That matches its existing documentation more closely: > > Sets the maximum number of background processes that the system > can support. This parameter can only be set at server start. > The > default is 8. > > However, that would still leave us with max_background_workers as the > cluster-wide limit and max_parallel_workers as the per-query-node limit. > That naming isn't doing all that much to clarify the distinction. > > Node execution is serial, right? I know work_mem has a provision about possibly using multiples of the maximum in a single query... The per-node dynamic does seem important to at least document well, so that when users see 3 workers on one explain node and 2 on another they aren't left scratching their heads. We don't reserve a set number of workers per statement; rather, they are retrieved as needed during query execution. I think I see the same amount of added value in tacking on "per_cluster" as you do in adding "additional" to get "max_additional_parallel_workers" - though further refinement of trying to actively NOT consider the leader a worker would support the omission of "addition[al]"... If we left max_worker_processes as defining only the maximum allowed non-parallel workers, and added a new "max_cluster_workers" that would cap the combination of both "max_worker_processes" and "max_parallel_workers"... David J.
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
On Tue, May 31, 2016 at 12:34 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 26, 2016 at 3:28 PM, Tom Lane wrote: >>> 99% of the time, you'd be right. But this is an unusual case, for the >>> reasons I mentioned before. > >> I tend to agree with Nikolay. I can't see much upside in making this >> change. At best, nothing will break. At worst, something will break. >> But how do we actually come out ahead? > > We come out ahead by not having to make the documentation more confusing. > > Basically, we have the opportunity to fix an ancient mistake here at > very low cost. I do not think that doubling down on the mistake is > a better answer. I'm not convinced, but we don't have to agree on everything... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 3:35 PM, Robert Haas wrote: > On Tue, May 31, 2016 at 3:19 PM, Tom Lane wrote: > > I wrote: > >> At the risk of opening another can of worms, what about renaming > >> max_worker_processes as well? It would be a good thing if that > >> had "cluster" in it somewhere, or something that indicates it's a > >> system-wide value not a per-session value. "max_workers_per_cluster" > >> would answer, though I'm not in love with it particularly. > > > > Actually, after a bit more thought, maybe "max_background_workers" would > > be a good name? That matches its existing documentation more closely: > > > > Sets the maximum number of background processes that the system > > can support. This parameter can only be set at server start. > The > > default is 8. > > > > However, that would still leave us with max_background_workers as the > > cluster-wide limit and max_parallel_workers as the per-query-node limit. > > That naming isn't doing all that much to clarify the distinction. > > It sure isn't. Also, I think that we might actually want to add an > additional GUC to prevent the parallel query system from consuming the > entire pool of processes established by max_worker_processes. My mind started going here too...though the question is whether you need a reserved pool or whether we apply some algorithm if (max_parallel + max_other > max_cluster). That algorithm could be configurable (i.e., target ratio 20:10 where 20+10 == max_cluster). David J.
Re: [HACKERS] COMMENT ON, psql and access methods
On Fri, May 27, 2016 at 7:53 AM, Michael Paquier wrote: > As far as I can see, COMMENT ON has no support for access methods. > Wouldn't we want to add it as it is created by a command? On top of > that, perhaps we could have a backslash command in psql to list the > supported access methods, like \dam[S]? The system methods would be in > this case all the in-core ones. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet
In a message of May 31, 2016 15:38:38, Robert Haas wrote: > >>> 99% of the time, you'd be right. But this is an unusual case, for the > >>> reasons I mentioned before. > >> > >> I tend to agree with Nikolay. I can't see much upside in making this > >> change. At best, nothing will break. At worst, something will break. > >> But how do we actually come out ahead? > > > > We come out ahead by not having to make the documentation more confusing. > > > > Basically, we have the opportunity to fix an ancient mistake here at > > very low cost. I do not think that doubling down on the mistake is > > a better answer. > > I'm not convinced, but we don't have to agree on everything... I am not convinced either. But I will not argue hard for the patch, as my main goal was to report the inconsistency. Though I do consider Tom's proposal quite strange... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
Robert Haas wrote: > Also, I think that we might actually want to add an > additional GUC to prevent the parallel query system from consuming the > entire pool of processes established by max_worker_processes. If > you're doing anything else with worker processes on your system, you > might well want to say, well, it's OK to have up to 20 worker > processes, but at most 10 of those can be used for parallel queries, > so that the other 10 are guaranteed to be available for whatever other > stuff I'm running that uses the background process facility. It's > worth remembering that the background worker stuff was originally > invented by Alvaro to allow users to run daemons, not for parallel > query. Agreed -- things like pglogical and BDR rely on background workers to do their jobs. Many other users of bgworkers have popped up, so I think it'd be a bad idea if parallel queries are able to monopolize all the available slots. > So I think in the long run we should have three limits: > > 1. Cluster-wide limit on number of worker processes for all purposes > (currently, max_worker_processes). > > 2. Cluster-wide limit on number of worker processes for parallelism > (don't have this yet). > > 3. Per-operation limit on number of worker processes for parallelism > (currently, max_parallel_degree). > > Whatever we rename, there needs to be enough semantic space between #1 > and #3 to allow for the possibility - I think the very likely > possibility - that we will eventually also want #2. max_background_workers sounds fine to me for #1, and I propose to add #2 in 9.6 rather than wait. max_total_parallel_query_workers? I already presented my proposal for #3 which, as you noted, nobody endorsed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 12:55 PM, Alvaro Herrera wrote: > Agreed -- things like pglogical and BDR rely on background workers to do > their jobs. Many other users of bgworkers have popped up, so I think > it'd be a bad idea if parallel queries are able to monopolize all the > available slots. That never occurred to me. I'd say it could be a serious problem, that should be fixed promptly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perf Benchmarking and regression.
On Fri, May 27, 2016 at 12:37 AM, Andres Freund wrote: > I don't think the situation is quite that simple. By *disabling* backend > flushing it's also easy to see massive performance regressions. In > situations where shared buffers was configured appropriately for the workload > (not the case here IIRC). On what kind of workload does setting backend_flush_after=0 represent a large regression vs. the default settings? I think we have to consider that pgbench and parallel copy are pretty common things to want to do, and a non-zero default setting hurts those workloads a LOT. I have a really hard time believing that the benefits on other workloads are large enough to compensate for the slowdowns we're seeing here. We have nobody writing in to say that backend_flush_after>0 is making things way better for them, and Ashutosh and I have independently hit massive slowdowns on unrelated workloads. We weren't looking for slowdowns in this patch. We were trying to measure other stuff, and ended up tracing the behavior back to this patch. That really, really suggests that other people will have similar experiences. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
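For anyone wanting to reproduce the comparison being discussed, the two cases differ by a single setting (backend_flush_after is the 9.6 GUC in question; 0 disables backend flushing):

ALTER SYSTEM SET backend_flush_after = 0;  -- the "disabled" case in this comparison
SELECT pg_reload_conf();                   -- apply without a restart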
Re: [HACKERS] Rename max_parallel_degree?
Alvaro Herrera writes: > Robert Haas wrote: >> So I think in the long run we should have three limits: >> >> 1. Cluster-wide limit on number of worker processes for all purposes >> (currently, max_worker_processes). >> >> 2. Cluster-wide limit on number of worker processes for parallelism >> (don't have this yet). >> >> 3. Per-operation limit on number of worker processes for parallelism >> (currently, max_parallel_degree). >> >> Whatever we rename, there needs to be enough semantic space between #1 >> and #3 to allow for the possibility - I think the very likely >> possibility - that we will eventually also want #2. > max_background_workers sounds fine to me for #1, and I propose to add #2 > in 9.6 rather than wait. +1 to both points. > max_total_parallel_query_workers ? The name should be closely related to what we use for #3. I could go for max_total_parallel_workers for #2 and max_parallel_workers for #3. Or maybe max_parallel_workers_total? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 05/31/2016 01:04 PM, Tom Lane wrote: > The name should be closely related to what we use for #3. I could go for > max_total_parallel_workers for #2 and max_parallel_workers for #3. > Or maybe max_parallel_workers_total? How about parallel_worker_pool? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 4:04 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Robert Haas wrote: >>> So I think in the long run we should have three limits: >>> >>> 1. Cluster-wide limit on number of worker processes for all purposes >>> (currently, max_worker_processes). >>> >>> 2. Cluster-wide limit on number of worker processes for parallelism >>> (don't have this yet). >>> >>> 3. Per-operation limit on number of worker processes for parallelism >>> (currently, max_parallel_degree). >>> >>> Whatever we rename, there needs to be enough semantic space between #1 >>> and #3 to allow for the possibility - I think the very likely >>> possibility - that we will eventually also want #2. > >> max_background_workers sounds fine to me for #1, and I propose to add #2 >> in 9.6 rather than wait. > > +1 to both points. > >> max_total_parallel_query_workers ? > > The name should be closely related to what we use for #3. I could go for > max_total_parallel_workers for #2 and max_parallel_workers for #3. > Or maybe max_parallel_workers_total? I just want to point out that if we change #1, we're breaking postgresql.conf compatibility for, IMHO, not a whole lot of benefit. I'd just leave it alone. I would propose to call #2 max_parallel_workers and #3 max_parallel_degree, which I think is clear as daylight, but since I invented all of these names, I guess I'm biased. In terms of forward-compatibility, one thing to note is that we currently have only parallel QUERY, but in the future we will no doubt also have parallel operations that are not queries, like VACUUM and CLUSTER and ALTER TABLE. If we put the word "query" into the name of a GUC, we're committing to the idea that it only applies to queries, and that there will be some other limit for non-queries. If we don't put the word query in there, we are not necessarily committing to the reverse, but we're at least leaning in that direction. So far I've largely dodged having to make a decision about this, because talking about the degree of parallelism for CLUSTER makes just as much sense as talking about the degree of parallelism for a query, but we could also decide to have e.g. maintenance_parallel_degree instead of max_parallel_degree for such operations. But as we commit to names it's something to be aware of. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH][Documentation] Add optional USING keyword before opclass name in INSERT statement
On Tue, May 31, 2016 at 3:55 PM, Nikolay Shaplov wrote:
> In a message of 31 May 2016 15:38:38, Robert Haas wrote:
> > >>> 99% of the time, you'd be right. But this is an unusual case, for the
> > >>> reasons I mentioned before.
> > >>
> > >> I tend to agree with Nikolay. I can't see much upside in making this
> > >> change. At best, nothing will break. At worst, something will break.
> > >> But how do we actually come out ahead?
> > >
> > > We come out ahead by not having to make the documentation more confusing.
> > >
> > > Basically, we have the opportunity to fix an ancient mistake here at
> > > very low cost. I do not think that doubling down on the mistake is
> > > a better answer.
> >
> > I'm not convinced, but we don't have to agree on everything...
>
> I am not convinced either. But I will not argue hard for the patch, as my
> main goal was to report the inconsistency. That said, I consider Tom's
> proposal quite strange...

We've recently chosen to not document the "ANALYZE -> ANALYSE" syntax, and I'm sure there are other examples, so I don't see why the status quo (pre-Tom's patch) is unacceptable... if adding USING to the synopsis is prone to cause confusion then don't; but let's not break existing uses that in no way harm the project. Otherwise I presume Tom is correct that the true fix is more than a single word in one file of our documentation. If you want to see it stay and be documented, there needs to be a complete proposal that at least gets, even if grudging, approval from a couple of people and a committer. David J.
Re: [HACKERS] Rename max_parallel_degree?
Robert Haas wrote:
> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

We can add the old name as a synonym in guc.c to maintain compatibility.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

You are biased :-)

> In terms of forward-compatibility, one thing to note is that we
> currently have only parallel QUERY, but in the future we will no doubt
> also have parallel operations that are not queries, like VACUUM and
> CLUSTER and ALTER TABLE. If we put the word "query" into the name of
> a GUC, we're committing to the idea that it only applies to queries,
> and that there will be some other limit for non-queries. If we don't
> put the word query in there, we are not necessarily committing to the
> reverse, but we're at least leaning in that direction. So far I've
> largely dodged having to make a decision about this, because talking
> about the degree of parallelism for CLUSTER makes just as much sense
> as talking about the degree of parallelism for a query, but we could
> also decide to have e.g. maintenance_parallel_degree instead of
> max_parallel_degree for such operations. But as we commit to names
> it's something to be aware of.

This is a very good point. I think parallel maintenance commands are going to require different tuning than queries, and I'd rather have separate parameters for those things, rather than force the same parameter to be changed over and over depending on what the session is going to execute next. Also, I think your proposed change from "max" to "maintenance" does not make much sense; I'd rather have max_maintenance_parallel_degree. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file
Hi Folks! The permissions on the RECOVERYXLOG file at the time of the error are 0400:

-r--------. 1 postgres postgres 16777216 May 31 09:51 RECOVERYXLOG

I sent that info to Tom earlier this afternoon (still learning the posting protocols - sorry) - his response is below:

From Tom: Ah, that confirms the diagnosis --- the new fsync code is assuming that all files it might need to fsync are writable by the postgres user. What would be useful to know is how it got to be that way rather than 0600 (-rw-------). Seems like it must be a property of your archive restore script, but we don't have the details. Maybe you could show the script? Also, I don't recall whether you said exactly what platform you're on, but that's probably important here as well. regards, tom lane

OPERATING SYSTEM INFORMATION: The database server is running on a Redhat Linux host (Red Hat Enterprise Linux Server release 6.8 (Santiago)):

[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The user account that creates the backup files is the same account that is used for performing the restores - and at no time are any elevated privileges used. The script we use for restoring our databases is not overly complex - a quick summary is below:

Save config files
Remove tablespace directories
Remove cluster directory
untar tablespace backup files
untar cluster backup file
copy back the config files

At this point in the script the pg_xlog directory is cleaned out - there are no files in the pg_xlog directory when the restore starts. The database server is then started. Below are the contents of the pg_xlog directory from the unsuccessful restore attempts:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
pg_ctl: another server might be running; trying to start server anyway
server starting
2016-05-31 14:59:18 EDT 574ddf06.7d10 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Even if the permissions on the RECOVERYXLOG file are changed (0777 in this case) and the file is left in the pg_xlog directory, the postgres process will recreate the file as 0400:

[postgres@postZZ postXX]$ chmod 777 RECOVERYXLOG
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-rwxrwxrwx 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 14:59:46 EDT 574ddf21.7d25 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Interestingly enough, in the 9.2.15 version (successful restore), the file is also created 0400:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 15:36:27 EDT 574de7bb.7eff [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 65544
-r-------- 1 postgres postgres 16777216 May 31 14:46 000000010000000500000022
-rw------- 1 postgres postgres       56 May 31 14:46 00000002.history
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000022
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000023
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000024
drwx------ 2 postgres postgres 4096 May 31 14:46 archive_status

./archive_status:
total 0
-rw------- 1 postgres postgres 0 May 31 14:46 000000010000000500000022.done
-rw------- 1 postgres postgres 0 May 31 14:46 00000002.history.done
-rw------- 1 postgres postgres 0 May 31 14:46 000000020000000500000023.done

Some additional info. We are in the initial stages of upgrading from 9.2 to 9.5. I upgraded one of our database servers to 9.5.1, then backed up the upgraded database with our same backup process. Using both the 9.5.2 executables (where the fsync/rename patch was introduced) and the 9.5.3 executables I was able to successfully restore the 9.5.1 backup (same host/user account which produces an error under
[HACKERS] JSON[B] arrays are second-class citizens
Folks, While querying some JSONB blobs at work in preparation for a massive rework of the data infrastructure, I ran into things that really puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR: function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)

Similar things happen with the other functions matching jsonb?_array_elements(_text)? These functions correctly identify JSON[B] things which are not, at their top level, arrays, and error out appropriately. What this hints to me is that json_array_elements() and friends have access to things that at least in theory UNNEST could have access to. Is making those things accessible to UNNEST, etc., a reasonable direction to go? Another option I came up with is to make functions that match jsonb?_array_elements(_text)?(_with_ordinality), but that seems somewhat tedious and error-prone on the maintenance side. What say? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 1:27 PM, Alvaro Herrera wrote: > This is a very good point. > > I think parallel maintenance commands are going to require different > tuning than different queries, and I'd rather have separate parameters > for those things rather than force the same parameter being changed over > and over depending on what is the session going to execute next. FWIW, as things stand my parallel CREATE INDEX patch does have a cost model which is pretty close to the one for parallel sequential scan. The cost model cares about max_parallel_degree. However, it also introduces a parallel_degree index storage parameter, which overrides the cost model to make it indicate the number of parallel workers to use (presumably, somebody will change master to make parallel_degree in terms of "participant processes" rather than worker processes soon, at which point I'll follow their lead). The storage parameter sticks around, of course, so a REINDEX will reuse it without asking. (I think CLUSTER should do the same with the index storage parameter.) What's novel about this is that for utility statements, you can override the cost model, and so disregard max_parallel_degree if you so choose. My guess is that DBAs will frequently want to do so. I'm not attached to any of this, but that's what I've come up with to date. Possibly the index storage parameter concept has baggage elsewhere, which comes up when we later go to parallelize index scans. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
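As a sketch of the override Peter describes, per the patch as characterized above (not committed syntax; the storage parameter name may yet change):

CREATE INDEX idx_foo ON foo (bar) WITH (parallel_degree = 4);  -- overrides the cost model
REINDEX INDEX idx_foo;  -- reuses the stored parallel_degree without asking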
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 4:12 PM, Robert Haas wrote:
> On Tue, May 31, 2016 at 4:04 PM, Tom Lane wrote:
> > Alvaro Herrera writes:
> >> Robert Haas wrote:
> >>> So I think in the long run we should have three limits:
> >>>
> >>> 1. Cluster-wide limit on number of worker processes for all purposes
> >>> (currently, max_worker_processes).
> >>>
> >>> 2. Cluster-wide limit on number of worker processes for parallelism
> >>> (don't have this yet).
> >>>
> >>> 3. Per-operation limit on number of worker processes for parallelism
> >>> (currently, max_parallel_degree).
> >>>
> >>> Whatever we rename, there needs to be enough semantic space between #1
> >>> and #3 to allow for the possibility - I think the very likely
> >>> possibility - that we will eventually also want #2.
> >
> >> max_background_workers sounds fine to me for #1, and I propose to add #2
> >> in 9.6 rather than wait.
> >
> > +1 to both points.
> >
> >> max_total_parallel_query_workers ?
> >
> > The name should be closely related to what we use for #3. I could go for
> > max_total_parallel_workers for #2 and max_parallel_workers for #3.
> > Or maybe max_parallel_workers_total?
>
> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

+1. We shall assume that the DBA has set the value of max_worker_processes according to hardware specifications without regard to what exactly could be parallelized.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

So we have operation, session, and cluster scope yet no way to know which is which without memorizing the documentation. While we cannot enforce this, I'd say any of these that are intended to be changed by the user should have functions published as their official API. The name of the function can be made to be user friendly. For all the others, pick a name with something closer to the 64 character limit that is as expressive as possible, so that seeing the name in postgresql.conf is all a person really needs to see to know that they are changing the right thing. I say your expectations are off if you want to encode these differences by using workers and degree at the same time. Let's go for a part-time DBA vocabulary here. I get the merit of degree, and by itself it might even be OK, but in our usage degree is theory while workers are actual, and I'd say people are likely to relate better to the latter. David J.
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
> Folks,
>
> While querying some JSONB blobs at work in preparation for a massive
> rework of the data infrastructure, I ran into things that really
> puzzled me, to wit:
>
> SELECT * FROM unnest('["a","b","c"]'::jsonb);
> ERROR: function unnest(jsonb) does not exist
>
> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>  value
> ───────
>  "a"
>  "b"
>  "c"
> (3 rows)

I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that we endeavor to emulate. Having an overloaded form is unappealing. While likely not that common, the introduction of an ambiguity raises the bar considerably. That said, we do seem to be lacking any easy way to take a json array and attempt to convert it directly into a PostgreSQL array. Such a conversion is not always going to succeed, though the capability seems worthwhile even if as yet unasked for. The each->convert->array_agg pattern works but is likely inefficient for homogeneous json array cases. David J.
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 5:06 PM, David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR: function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───────
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form is unappealing. While likely not that common,
> the introduction of an ambiguity raises the bar considerably.
>
> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.
>
> David J.

If there is no list of people asking for that function, let me be the first. In the meantime, I've resigned myself to carting this around from db to db...

create function jsonb_array_to_text_array(jsonb_arr jsonb)
returns text[]
language sql
as $$
  select array_agg(r) from jsonb_array_elements_text(jsonb_arr) r;
$$;
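Assuming the definition above, usage then collapses to a single call:

SELECT jsonb_array_to_text_array('["a","b","c"]'::jsonb);
-- {a,b,c}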
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
> > Folks,
> >
> > While querying some JSONB blobs at work in preparation for a massive
> > rework of the data infrastructure, I ran into things that really
> > puzzled me, to wit:
> >
> > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > ERROR: function unnest(jsonb) does not exist
> >
> > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> >  value
> > ───────
> >  "a"
> >  "b"
> >  "c"
> > (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.

You cut out the part where I introduced the part that's not equivalent, so "there is more than one way to do it" isn't on point here. More on that specific issue below.

UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY, than the json_array_elements-like functions do. Is it really more efficient to build and maintain those capabilities separately in the JSON[B] set-returning functions, or to force our end users to use atrocious hacks like putting generate_series(1,jsonb_array_length(foo)) in the target list, than just to make UNNEST and ROWS FROM do the right thing?

> Having an overloaded form is unappealing.

On what grounds?

> While likely not that common, the introduction of an ambiguity
> raises the bar considerably.

What ambiguity?

SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
ERROR: cannot extract elements from an object

The json_array_elements family manages to do the right thing. Why would it be harder to make sure UNNEST and ROWS FROM() do so?

> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.

To your earlier point about "there is more than one way to do it," we have made no attempt to make some sort of language composed of orthonormal vectors, and the SQL standard actually requires that we not do so. For a trivial example, there's

SELECT * FROM foo
and
TABLE foo

which are equivalent and both spelled out in the standard. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
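For readers following along, the generate_series() workaround mentioned above looks roughly like this, assuming a hypothetical table t with a jsonb array column j:

SELECT i AS ord, t.j -> (i - 1) AS value  -- -> with an int subscript extracts one element
FROM t, generate_series(1, jsonb_array_length(t.j)) AS i;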
Re: [HACKERS] Rename max_parallel_degree?
Alvaro Herrera writes: > Robert Haas wrote: >> I just want to point out that if we change #1, we're breaking >> postgresql.conf compatibility for, IMHO, not a whole lot of benefit. >> I'd just leave it alone. > We can add the old name as a synonym in guc.c to maintain compatibility. I doubt this is much of an issue at this point; max_worker_processes has only been there a release or so, and surely there are very few people explicitly setting it, given its limited use-case up to now. It will be really hard to change it after 9.6, but I think we could still get away with that today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 5:46 PM, David Fetter wrote:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> > On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
> > > Folks,
> > >
> > > While querying some JSONB blobs at work in preparation for a massive
> > > rework of the data infrastructure, I ran into things that really
> > > puzzled me, to wit:
> > >
> > > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > > ERROR: function unnest(jsonb) does not exist
> > >
> > > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> > >  value
> > > ───────
> > >  "a"
> > >  "b"
> > >  "c"
> > > (3 rows)
> >
> > I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> > we endeavor to emulate.
>
> You cut out the part where I introduced the part that's not
> equivalent, so "there is more than one way to do it" isn't on point
> here. More on that specific issue below.
>
> UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY,
> than the json_array_elements-like functions do. Is it really more
> efficient to build and maintain those capabilities separately in the
> JSON[B] set-returning functions, or to force our end users to use
> atrocious hacks like putting
> generate_series(1,jsonb_array_length(foo)) in the target list, than
> just to make UNNEST and ROWS FROM do the right thing?

Both of these work in 9.5...

SELECT * FROM ROWS FROM (jsonb_array_elements('["a","b","c"]'::jsonb)) WITH ORDINALITY;
 value | ordinality
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3

> > Having an overloaded form is unappealing.
>
> On what grounds?

Aesthetic. YMAV (your mileage apparently varies).

> > While likely not that common, the introduction of an ambiguity
> > raises the bar considerably.
>
> What ambiguity?
>
> SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
> ERROR: cannot extract elements from an object

I stand corrected. I was thinking you could somehow craft unnest('') but there is no way to auto-convert to "anyarray"...

> The json_array_elements family manages to do the right thing. Why
> would it be harder to make sure UNNEST and ROWS FROM() do so?

I have apparently failed to understand your point. All I saw was that you wanted "unnest(jsonb)" to work in an identical fashion to "jsonb_array_elements(jsonb)". If there is some aspect beyond this being an aliasing situation, then you have failed to communicate it such that I comprehended that fact.

> > That said, we do seem to be lacking any easy way to take a json array and
> > attempt to convert it directly into a PostgreSQL array. Such a conversion
> > is not always going to succeed, though the capability seems worthwhile
> > even if as yet unasked for. The each->convert->array_agg pattern works
> > but is likely inefficient for homogeneous json array cases.
>
> To your earlier point about "there is more than one way to do it," we
> have made no attempt to make some sort of language composed of
> orthonormal vectors, and the SQL standard actually requires that we
> not do so. For a trivial example, there's
>
> SELECT * FROM foo
> and
> TABLE foo
>
> which are equivalent and both spelled out in the standard.

And would we have done so had we not been compelled to by the standard? Maybe it should be enshrined somewhere more obvious, but Alvaro, and indirectly Tom, recently made the same claim[1] against TIMTOWTDI, so I feel justified making such a claim on behalf of the project - as well as my general personal feeling.
Adding user-friendly UI to the system, to correct deficiencies, does have merit, but that is not what I personally see here. David J. [1] https://www.postgresql.org/message-id/20160405164951.GA286879@alvherre.pgsql
Re: [HACKERS] JSON[B] arrays are second-class citizens
David Fetter writes:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
>> While likely not that common, the introduction of an ambiguity
>> raises the bar considerably.
> What ambiguity?

My first thought about it was that select unnest('{1,2,3}'); would start failing. But it turns out it already does fail: ERROR: function unnest(unknown) is not unique. You get that as a result of the recent introduction of unnest(tsvector), which we debated a few weeks ago and seem to have decided to leave as-is. But it failed before 9.6 too, with ERROR: could not determine polymorphic type because input has type "unknown". So at least in this particular case, adding unnest(jsonb) wouldn't be a problem from the standpoint of not being able to resolve calls that we could resolve before.

Nonetheless, there *is* an ambiguity here, which is specific to json(b): what type of array are you expecting to get? The reason we have both json[b]_array_elements() and json[b]_array_elements_text() is that there are plausible use-cases for returning either json or plain text. It's not hard to imagine that somebody will want json[b]_array_elements_numeric() before long, too. If you want to have an unnest(jsonb) then you will need to make an arbitrary decision about which type it will return, and that doesn't seem like an especially great idea to me.

> UNNEST and ROWS FROM have more capabilities, including WITH ORDINALITY,
> than the json_array_elements-like functions do.

AFAICT, this is nonsense. We did not tie WITH ORDINALITY to UNNEST; it works for any set-returning function.

regression=# select * from unnest(array[1,2,3]) with ordinality;
 unnest | ordinality
--------+------------
      1 |          1
      2 |          2
      3 |          3
(3 rows)

regression=# select * from jsonb_array_elements('["a","b","c"]'::jsonb) with ordinality;
 value | ordinality
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
(3 rows)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
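Tom's anticipated _numeric variant is easy enough to sketch outside core (a hypothetical function, not in any release; it assumes every element casts cleanly to numeric):

CREATE FUNCTION jsonb_array_elements_numeric(j jsonb)
RETURNS SETOF numeric LANGUAGE sql IMMUTABLE AS $$
  SELECT e::numeric FROM jsonb_array_elements_text(j) AS t(e)
$$;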
Re: [HACKERS] Parallel safety tagging of extension functions
On 05/31/2016 06:47 PM, Tom Lane wrote:
> Given that, your original approach of manually updating proargtypes in the
> existing pg_proc row for the functions may be the best way. Anything else
> is going to be more complicated and ultimately will still require at least
> one direct catalog update.

It is the least ugly of all the ugly solutions I could think of. I have attached a patch which fixes the signatures using this method. I use CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is it too ugly? Andreas gin-gist-signatures-v1.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
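For illustration only, the kind of in-place catalog fix being discussed looks roughly like this (the function name and signature here are hypothetical; hand-editing pg_proc is dangerous and belongs only in an extension upgrade script):

UPDATE pg_catalog.pg_proc
SET proargtypes = '2281 2281'::oidvector      -- e.g. (internal, internal)
WHERE proname = 'my_gist_support_func'        -- hypothetical function name
  AND pronamespace = 'public'::regnamespace;
-- followed by CREATE OR REPLACE FUNCTION so cached definitions are refreshed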
Re: [HACKERS] JSON[B] arrays are second-class citizens
The idea of converting a JSONB array to a PG array is appealing and would potentially be more general-purpose than adding a new unnest. I'm not sure how feasible either suggestion is. I will say that I think the current state of affairs is gratuitously verbose and expects users to memorize a substantial number of long function names to perform simple tasks. -p

On Tue, May 31, 2016 at 2:06 PM, David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR: function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───────
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>
> I'd be inclined to -1 such a proposal. TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form is unappealing. While likely not that common,
> the introduction of an ambiguity raises the bar considerably.
>
> That said, we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array. Such a conversion
> is not always going to succeed, though the capability seems worthwhile
> even if as yet unasked for. The each->convert->array_agg pattern works
> but is likely inefficient for homogeneous json array cases.
>
> David J.

-- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." —Kurt Vonnegut
Re: [HACKERS] COMMENT ON, psql and access methods
On Wed, Jun 1, 2016 at 4:52 AM, Robert Haas wrote: > On Fri, May 27, 2016 at 7:53 AM, Michael Paquier > wrote: >> As far as I can see, COMMENT ON has no support for access methods. >> Wouldn't we want to add it as it is created by a command? On top of >> that, perhaps we could have a backslash command in psql to list the >> supported access methods, like \dam[S]? The system methods would be in >> this case all the in-core ones. > > +1. Are there other opinions? That's not a 9.6 blocker IMO, so I could get patches out for 10.0 if needed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
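A sketch of what the two proposals would enable (hypothetical syntax at the time of this exchange; neither feature existed yet):

COMMENT ON ACCESS METHOD bloom IS 'signature-file based index access method';
-- and in psql, something like \dam[S] to list the supported access methods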
Re: [HACKERS] Statement timeout
> FWIW, I think the existing behavior is just fine. It corresponds to what
> PQexec has always done with multi-statement query strings; that is,
> statement_timeout governs the total time to execute the transaction (the
> whole query string, unless you put transaction control commands in there).
> In extended query mode, since you can only put one textual query per Parse
> message, that maps to statement_timeout governing the total time from
> initial Parse to Sync. Which is what it does.

I've never thought about that. And I cannot imagine anyone using the extended protocol that way to simulate multi-statement queries. Is that documented somewhere?

> What you propose here is not a bug fix but a redefinition of what
> statement_timeout does; and you've made it inconsistent with simple query
> mode. I don't really think it's an improvement.

From the documentation of statement_timeout (config.sgml):

  Abort any statement that takes more than the specified number of
  milliseconds, starting from the time the command arrives at the server
  from the client. If log_min_error_statement is set to ERROR or lower,
  the statement that timed out will also be logged. A value of zero (the
  default) turns this off.

Apparently "starting from the time the command arrives at the server from the client" refers, in my example, to the time when the Execute message arrives at the server. And I think the current behavior of our code differs from what the document advertises. Or at least it is counterintuitive to users.

> BTW, aren't you missing a re-enable of the timeout for statements after
> the first one?

Will check. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
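For reference, the documented wording matches what one observes in simple query mode; a minimal demonstration using the 3-second timeout from the examples in this thread:

SET statement_timeout = '3s';
SELECT pg_sleep(2);  -- completes
SELECT pg_sleep(4);  -- ERROR: canceling statement due to statement timeout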
[HACKERS] User demand, and idea, for C-code conversions from JSON arrays to PostgreSQL arrays
All, Over in the "JSON[B] arrays are second-class citizens" thread [1] I made the observation that the only way to get a PostgreSQL array from a JSON array is via the "elements->cast->array_agg" chain. For JSON arrays that are homogeneous in nature, the capability to go "directly" from JSON to json[], text[], bigint[], etc... seems like it would have some value. A couple of community members have, so far, chimed in. Maybe this ends up on a ToDo somewhere, but for the moment I figured I'd float the idea on its own thread since the other one is really about a different (though somewhat related) topic. In the spirit of the json_populate_record-like functions, something with the following signature seems usable:

as_array(anyarray, jsonb) : anyarray [2]

so that actual calls look like:

SELECT as_array(null::text[], '["a", "b", "c"]'::jsonb)

For better or worse, while every table gets a corresponding composite type, an array type cannot be treated as a row type in this situation:

SELECT jsonb_populate_record(null::text[], '["a", "b", "c"]'::jsonb)
ERROR: first argument of jsonb_populate_record must be a row type

Loosening the restriction and allowing jsonb_populate_record to fulfill the role of the above-described "as_array" function would be something to consider, but is likely not possible or worth any effort in trying to make happen. David J. [1] https://www.postgresql.org/message-id/CADkLM=fSC+otuBmzoJT6Riyksue3HpHgu2=Mofcv=fd0der...@mail.gmail.com [2] ROTQ[3] - whose idea was it to put the type first and the data second? [3] Rhetorical Off-Topic Question
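A non-polymorphic stand-in for the proposed as_array() is easy to sketch for any single element type (the function name is hypothetical; a true anyarray version would need C-level support), e.g. the bigint case:

CREATE FUNCTION jsonb_to_bigint_array(j jsonb) RETURNS bigint[]
LANGUAGE sql IMMUTABLE AS $$
  SELECT array_agg(e::bigint) FROM jsonb_array_elements_text(j) AS t(e)
$$;

SELECT jsonb_to_bigint_array('[1, 2, 3]'::jsonb);  -- {1,2,3}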