Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On 10/20/2014 11:02 PM, jes...@krogh.cc wrote:
>>> I do suspect the majority is from 30 concurrent processes updating a 506GB GIN index, but it would be nice to confirm that. There is also a message-queue in the DB with a fairly high turnaround.
>> A 506GB GIN index? Uh, interesting :). What's it used for? Trigrams?
> It is for full-text-search, but it is being updated regularly, ~100M records. A dump/restore cycle typically reduces the size to 30-40% of the current size.

Try 9.4 beta. The on-disk format of GIN indexes was rewritten in 9.4, making them a lot smaller. That might help with WAL volume too. Or not, but I'd love to hear what the impact is, in a real-life database :-).

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
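To confirm where the WAL volume is coming from, one rough approach is to sample the WAL insert position before and after running the suspected workload. This is only a sketch: it measures total WAL generated by the cluster, not per-index volume, and it uses the 9.x function names (renamed to `pg_current_wal_lsn` / `pg_wal_lsn_diff` in PostgreSQL 10); `\gset` needs psql 9.3 or later.

```sql
-- Record the current WAL insert location into a psql variable.
SELECT pg_current_xlog_location() AS start_lsn \gset

-- ... run the suspected workload here, e.g. the 30 GIN-updating processes ...

-- Report how much WAL was generated in between.
SELECT pg_size_pretty(
           pg_xlog_location_diff(pg_current_xlog_location(), :'start_lsn')
       ) AS wal_generated;
```

Comparing the figure with and without the GIN-updating processes running would give a first approximation of their share of the WAL traffic.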
[HACKERS] Reducing lock strength of adding foreign keys
Hi, I have been thinking about why we need to grab an AccessExclusiveLock on the table with the PK when we add a foreign key. Adding new tables with foreign keys to old ones is common, so it would be really nice if the lock strength could be reduced. A comment in the code claims that we need the lock so no rows are deleted out from under us, and that adding a trigger will lock in AccessExclusive anyway. But with MVCC catalog access and the removal of SnapshotNow I do not see why adding a trigger would require an exclusive lock; just locking against data changes should be enough. Looking at the code, I only see the duplicate-name check relying on the fact that we have grabbed an exclusive lock, and that should work anyway due to the unique constraint; but since I am pretty new to the code base I could be missing something obvious.

I have attached a proof-of-concept patch which reduces the lock strength to ShareLock. What do you say? Am I missing something? My gut feel also says that if we know it is an RI trigger we are adding, then AccessShare should be enough for the PK table, since we could rely on row locks to prevent rows from being deleted. It would be really nice if this were possible, since it would make it possible to add a new table with foreign keys and data without locking anything more than the referred rows.

Andreas

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index ecdff1e..49ad323
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ATAddForeignKeyConstraint(AlteredTableIn
*** 6016,6031 ****
  	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
  	/*
! 	 * Grab an exclusive lock on the pk table, so that someone doesn't delete
! 	 * rows out from under us. (Although a lesser lock would do for that
! 	 * purpose, we'll need exclusive lock anyway to add triggers to the pk
! 	 * table; trying to start with a lesser lock will just create a risk of
! 	 * deadlock.)
  	 */
  	if (OidIsValid(fkconstraint->old_pktable_oid))
! 		pkrel = heap_open(fkconstraint->old_pktable_oid, AccessExclusiveLock);
  	else
! 		pkrel = heap_openrv(fkconstraint->pktable, AccessExclusiveLock);
  
  	/*
  	 * Validity checks (permission checks wait till we have the column
--- 6016,6028 ----
  	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
  	/*
! 	 * Grab a share lock on the pk table, so that someone doesn't delete
! 	 * rows out from under us.
  	 */
  	if (OidIsValid(fkconstraint->old_pktable_oid))
! 		pkrel = heap_open(fkconstraint->old_pktable_oid, ShareLock);
  	else
! 		pkrel = heap_openrv(fkconstraint->pktable, ShareLock);
  
  	/*
  	 * Validity checks (permission checks wait till we have the column
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
new file mode 100644
index f4c0ffa..214755d
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*** CreateTrigger(CreateTrigStmt *stmt, cons
*** 157,165 ****
  				referenced;
  
  	if (OidIsValid(relOid))
! 		rel = heap_open(relOid, AccessExclusiveLock);
  	else
! 		rel = heap_openrv(stmt->relation, AccessExclusiveLock);
  
  	/*
  	 * Triggers must be on tables or views, and there are additional
--- 157,165 ----
  				referenced;
  
  	if (OidIsValid(relOid))
! 		rel = heap_open(relOid, ShareLock);
  	else
! 		rel = heap_openrv(stmt->relation, ShareLock);
  
  	/*
  	 * Triggers must be on tables or views, and there are additional
*** CreateTrigger(CreateTrigStmt *stmt, cons
*** 525,532 ****
  	 * can skip this for internally generated triggers, since the name
  	 * modification above should be sufficient.
  	 *
! 	 * NOTE that this is cool only because we have AccessExclusiveLock on the
! 	 * relation, so the trigger set won't be changing underneath us.
  	 */
  	if (!isInternal)
  	{
--- 525,531 ----
  	 * can skip this for internally generated triggers, since the name
  	 * modification above should be sufficient.
  	 *
! 	 * NOTE that this is cool only because of the unique constraint.
  	 */
  	if (!isInternal)
  	{
*** RemoveTriggerById(Oid trigOid)
*** 1102,1108 ****
  	 */
  	relid = ((Form_pg_trigger) GETSTRUCT(tup))->tgrelid;
  
! 	rel = heap_open(relid, AccessExclusiveLock);
  
  	if (rel->rd_rel->relkind != RELKIND_RELATION &&
  		rel->rd_rel->relkind != RELKIND_VIEW
--- 1101,1107 ----
  	 */
  	relid = ((Form_pg_trigger) GETSTRUCT(tup))->tgrelid;
  
! 	rel = heap_open(relid, ShareLock);
  
  	if (rel->rd_rel->relkind != RELKIND_RELATION &&
  		rel->rd_rel->relkind != RELKIND_VIEW
*** renametrig(RenameStmt *stmt)
*** 1257,1263 ****
  	 * Look up name, check permissions, and acquire lock (which we will NOT
  	 * release until end of transaction).
  	 */
! 	relid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock,
  									 false, false,
  									 RangeVarCallbackForRenameTrigger,
  									 NULL);
--- 1256,1262
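To make the impact concrete, the scenario the patch would improve can be sketched as follows (the table names are hypothetical illustrations, not from the patch):

```sql
-- Session 1: a long-running report holding only AccessShareLock on the PK table.
BEGIN;
SELECT count(*) FROM invoices;   -- hypothetical referenced (PK) table

-- Session 2: creating a new referencing table currently blocks behind session 1,
-- because ADD FOREIGN KEY takes AccessExclusiveLock on "invoices".
-- With the proposed ShareLock it would only conflict with data modifications.
CREATE TABLE invoice_lines (
    id         int PRIMARY KEY,
    invoice_id int NOT NULL REFERENCES invoices (id)
);
```

ShareLock conflicts with row modification (INSERT/UPDATE/DELETE) but not with concurrent reads, which is exactly the "no rows deleted out from under us" guarantee the comment asks for.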
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
>>> Today, I have verified all previous comments raised by me and looked at the new code, and below are my findings:
>>>
>>> 4. LWLockAcquireCommon()
>>> {
>>> ..
>>> if (!LWLockDequeueSelf(l))
>>> {
>>>     for (;;)
>>>     {
>>>         PGSemaphoreLock(&proc->sem, false);
>>>         if (!proc->lwWaiting)
>>>             break;
>>>         extraWaits++;
>>>     }
>>>     lock->releaseOK = true;
>>> ..
>>> }
>>>
>>> Setting releaseOK in the above context might not be required, because if control comes to this part of the code, it will not retry to acquire another time.
>> Hm. You're probably right.

You have agreed to fix this comment, but it seems you have forgotten to change it.

>>> 11. LWLockRelease()
>>> {
>>> ..
>>> PRINT_LWDEBUG(LWLockRelease, lock, mode);
>>> }
>>>
>>> Shouldn't this be at the beginning of the LWLockRelease function rather than after processing the held_lwlocks array?
>> Ok.

You have agreed to fix this comment, but it seems you have forgotten to change it.

The below comment doesn't seem to be addressed?

> LWLockAcquireOrWait()
> {
> ..
> LOG_LWDEBUG(LWLockAcquireOrWait, lockid, mode, succeeded);
> ..
> }
> a. Such a log is not there in any other LWLock.. variants; if we want to introduce it, then shouldn't it be done at other places as well?

The below point is yet to be resolved.

>>> 12.
>>> #ifdef LWLOCK_DEBUG
>>> lock->owner = MyProc;
>>> #endif
>>>
>>> Shouldn't it be reset in LWLockRelease?
>> That's actually intentional. It's quite useful to know the last owner when debugging lwlock code.

Won't it cause any problem if the last-owner process exits?

Can you explain how pg_read_barrier() in the below code makes this access safe?

LWLockWakeup()
{
..
+ pg_read_barrier(); /* pairs with nwaiters-- */
+ if (!BOOL_ACCESS_ONCE(lock->releaseOK))
..
}

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On Fri, Oct 10, 2014 at 10:08 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 09/17/2014 08:27 PM, Craig Ringer wrote:
>> Hi all
>> Attached is a patch to switch 9.5 over to using the GetSystemTimeAsFileTime call instead of separate GetSystemTime and SystemTimeToFileTime calls.
> Following on from my prior patch that switches to using GetSystemTimeAsFileTime, I now attach a two-patch series that also adds support for GetSystemTimePreciseAsFileTime where it is available.

Hi Craig,

I was just having a quick look at this with the view of testing it on a Windows 8 machine. Here's a couple of things I've noticed:

+ pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+     GetModuleHandle(TEXT("kernel32.dll")),
+     "GetSystemRimePreciseAsFileTime");

"Rime", no doubt, is meant to be "Time". Same here:

+ elog(DEBUG1, "GetProcAddress(\"GetSystemRimePreciseAsFileTime\") on kernel32.dll failed with error code %d not expected ERROR_PROC_NOT_FOUND(127)", errcode);

But I don't think you'll be able to elog quite that early. I tested by getting rid of the if (errcode != ERROR_PROC_NOT_FOUND) test and I get:

D:\Postgres\install\bin>postgres -D ..\data
error occurred at src\port\gettimeofday.c:87 before error message processing is available

Perhaps we needn't bother with this debug message? Either that or you'll probably need to cache the error code and do something when the logger is initialised.

Also, just for testing the resolution of the 2 functions, I added some code into PostgreSQL's gettimeofday():

do {
    (*pg_get_system_time)(&ft2);
} while ((file_time.dwLowDateTime - ft2.dwLowDateTime) == 0);

#ifndef FRONTEND
elog(NOTICE, "%d", ft2.dwLowDateTime - file_time.dwLowDateTime);
if (pg_get_system_time == GetSystemTimeAsFileTime)
    elog(NOTICE, "GetSystemTimeAsFileTime");
else
    elog(NOTICE, "GetSystemTimePreciseAsFileTime");
#endif
return 0;

I see:

test=# select current_timestamp;
NOTICE: 4
NOTICE: GetSystemTimePreciseAsFileTime

Which indicates that this is quite a precise timer.

Whereas if I add a quick hack into init_win32_gettimeofday() so that it always chooses GetSystemTimeAsFileTime() I see:

test=# select current_timestamp;
NOTICE: 9953
NOTICE: GetSystemTimeAsFileTime

Also, I've attached gettimeofday.c, which is a small test program which just makes 10 million calls to each function, in order to find out which one of the 3 methods performs best. Here I just wanted to ensure that we'd not get any performance regression in gettimeofday() calls. Here's the output from running this on this Windows 8.1 laptop:

D:\>cl /Ox gettimeofday.c
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.61030 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
gettimeofday.c
Microsoft (R) Incremental Linker Version 11.00.61030.0
Copyright (C) Microsoft Corporation. All rights reserved.
/out:gettimeofday.exe
gettimeofday.obj

D:\>gettimeofday.exe
GetSystemTimePreciseAsFileTime() in 0.157480 seconds
GetSystemTimeAsFileTime() in 0.060075 seconds
Current Method in 0.742677 seconds

Regards

David Rowley

#include <stdio.h>
#include <windows.h>

#define LOOPS 10000000

int main(void)
{
	LARGE_INTEGER start, end, frequency;
	FILETIME ft;
	SYSTEMTIME system_time;
	int x;

	if (!QueryPerformanceFrequency(&frequency))
		exit(-1);

	QueryPerformanceCounter(&start);
	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTimePreciseAsFileTime(&ft);
	}
	QueryPerformanceCounter(&end);
	printf("GetSystemTimePreciseAsFileTime() in %f seconds\n", (end.QuadPart - start.QuadPart) / (double) frequency.QuadPart);

	QueryPerformanceCounter(&start);
	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTimeAsFileTime(&ft);
	}
	QueryPerformanceCounter(&end);
	printf("GetSystemTimeAsFileTime() in %f seconds\n", (end.QuadPart - start.QuadPart) / (double) frequency.QuadPart);

	QueryPerformanceCounter(&start);
	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTime(&system_time);
		SystemTimeToFileTime(&system_time, &ft);
	}
	QueryPerformanceCounter(&end);
	printf("Current Method in %f seconds\n", (end.QuadPart - start.QuadPart) / (double) frequency.QuadPart);

	return 0;
}
--
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On 10/22/2014 04:12 PM, David Rowley wrote:
> I was just having a quick look at this with the view of testing it on a Windows 8 machine.

Thank you. I really appreciate your taking the time to do this, as one of the barriers to getting Windows-specific patches accepted is that usually people just don't want to review Windows patches. I see you've already linked it on the commitfest too, so thanks again. I'll follow up with a fixed patch and my test code shortly. I'm at PGEU in Madrid so things are a bit chaotic, but I can make some time.

> + pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
> +     GetModuleHandle(TEXT("kernel32.dll")),
> +     "GetSystemRimePreciseAsFileTime");
>
> "Rime", no doubt, is meant to be "Time".

Hm. I must've somehow managed to attach and post the earlier version of the patch before I fixed a few issues and tested it, because I've compiled and tested this feature on Win2k12. Apparently just not the version I published. Thanks for catching that. I'll fix it up.

> + elog(DEBUG1, "GetProcAddress(\"GetSystemRimePreciseAsFileTime\") on kernel32.dll failed with error code %d not expected ERROR_PROC_NOT_FOUND(127)", errcode);
>
> But I don't think you'll be able to elog quite that early. I tested by getting rid of the if (errcode != ERROR_PROC_NOT_FOUND) test and I get:
>
> D:\Postgres\install\bin>postgres -D ..\data
> error occurred at src\port\gettimeofday.c:87 before error message processing is available

Thank you. I didn't consider that logging wouldn't be available there. This case shouldn't really happen.

> Perhaps we needn't bother with this debug message? Either that or you'll probably need to cache the error code and do something when the logger is initialised.

In a shouldn't-happen case like this I think it'll be OK to just print to stderr.

> I see:
>
> test=# select current_timestamp;
> NOTICE: 4
> NOTICE: GetSystemTimePreciseAsFileTime
>
> Which indicates that this is quite a precise timer.

Great.
Because I was testing on AWS I wasn't getting results that fine, but the kind of granularity you're seeing is consistent with what I get on my Linux laptop.

> Whereas if I add a quick hack into init_win32_gettimeofday() so that it always chooses GetSystemTimeAsFileTime() I see:
>
> test=# select current_timestamp;
> NOTICE: 9953
> NOTICE: GetSystemTimeAsFileTime

I'll publish the test code I was using too. I was doing it from the SQL level, with no code changes other than the ones required for timestamp precision.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Question about RI checks
Florian Pflug f...@phlo.org wrote:
> So in conclusion, the lock avoids raising constraint violation errors in a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some constraint violation errors into serialization failures. Or at least that's how it looks to me.

I got the REPEATABLE READ case wrong there, and that case is actually broken! I initially assumed that trying to lock a row whose latest version is invisible to a transaction in mode REPEATABLE READ or above would result in a serialization error. But for the queries run by ri_triggers.c, this is not the case. AFAICS, the only executor node which takes the crosscheck snapshot into account is nodeModifyTable. So both a plain SELECT and a SELECT FOR [KEY] SHARE | UPDATE will simply run with the snapshot passed to the SPI, which for the query in question is always a *current* snapshot (we pass the original snapshot as the crosscheck snapshot in REPEATABLE READ).

Thus, if there's a child row that is visible to the transaction deleting the parent row, but that child was meanwhile removed, it seems that we currently *won't* complain. The same holds, of course, for mode SERIALIZABLE -- we don't distinguish the two in ri_triggers.c. But that's wrong: the transaction's snapshot *would* see that row, so we ought to raise an error. Note that this applies also to mode SERIALIZABLE, and breaks true serializability in some cases, since we don't do conflict detection for RI enforcement queries.

Here's a test case, involving two transactions, A and B. I tried this on REL9_4_STABLE.
Setup:

CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
                    parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case:

A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
->   id | parent_id
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;  -> DELETE 1
B:: commit;
A:: delete from parent; -> DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE should fail then), nor after it (since it shouldn't see the row in the child table then). Yet we don't complain, even though both transactions run in SERIALIZABLE mode.

On Oct 21, 2014, at 19:27, Nick Barnes nickbarne...@gmail.com wrote:
> On Wed, Oct 22, 2014 at 3:19 AM, Kevin Grittner kgri...@ymail.com wrote:
>> It doesn't seem like this analysis considers all of the available ON DELETE and ON UPDATE behaviors available. Besides RESTRICT there is CASCADE, SET NULL, SET DEFAULT, and NO ACTION. Some of those require updating the referencing rows.
> I think the logic in question is specific to RESTRICT and NO ACTION. The other cases don't look like they need to explicitly lock anything; the UPDATE / DELETE itself should take care of that.

Exactly. We run the dubious SELECT FROM fkrel ... FOR KEY SHARE query *only* for RESTRICT and NO ACTION.

I've traced through the revisions of ri_triggers.c a bit. Locking the conflicting child rows in the parent table's DELETE and UPDATE triggers was added in commit 0882951b0cdb4c6686e57121d56216cb2044f7eb, which is dated 1999-12-08. (This was before we even had row-level SHARE locks, so the code did a SELECT .. FOR UPDATE back then.) This is also the commit which added FOR UPDATE locking to RI_FKey_check, i.e. which ensured that parent rows are locked when adding new child rows.
Its commit message unfortunately doesn't offer much in terms of explanation; it just says "Fixed concurrent visibility bug." When SHARE locks were implemented in commit bedb78d386a47fd66b6cda2040e0a5fb545ee371, dating from 2005-04-28, it seems that FOR UPDATE was simply replaced by FOR SHARE throughout ri_triggers.c. The same seems to have happened when KEY SHARE locks were introduced on 2013-01-13 with commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Simplify EXISTS subqueries containing LIMIT
On Tue, Oct 21, 2014 at 11:00 PM, Marti Raudsepp ma...@juffo.org wrote:
> On Sun, Oct 19, 2014 at 1:22 PM, David Rowley dgrowle...@gmail.com wrote:
>> the argument for this would have been much stronger if anti-join support had just been added last week. It's been quite a few years now and the argument for this must be getting weaker with every release.
> I see your point, but I would put it another way: we've had this for a few years, but people haven't learned and are *still* using LIMIT 1.

I've had a bit of a look at this and here's a couple of things:

+ /*
+  * LIMIT clause can be removed if it's a positive constant or ALL, to
+  * prevent it from being an optimization barrier. It's a common meme to put
+  * LIMIT 1 within EXISTS subqueries.
+  */

I think this comment may be better explained along the lines of:

"A subquery which has a LIMIT clause with a positive value is effectively a no-op in this scenario. With EXISTS we only care about the first row anyway, so any positive limit value will have no behavioral change to the query, so we'll simply remove the LIMIT clause. If we're unable to prove that the LIMIT value is a positive number then we'd better not touch it."

+ /* Checking for negative values is done later; 0 is just silly */
+ if (!limit->constisnull && DatumGetInt64(limit->constvalue) <= 0)

I'm a bit confused by the comment here. You're saying that we'll check for negatives later... but you're checking for <= 0 on the next line... Did I miss something or is this a mistake?

This test:

+ select * from int4_tbl o where exists (select 1 limit 0);
+  f1
+ ----
+ (0 rows)

I guess here you're just testing to ensure that you've not broken this and that the new code does not accidentally remove the LIMIT clause. I think it also might be worth adding some tests to ensure that correlated EXISTS clauses with a positive LIMIT value get properly changed into a SEMI JOIN instead of remaining as a subplan, and also make sure that a LIMIT 0 or LIMIT -1 gets left as a subplan. I'd say such a test would supersede the above test, and I think it's also the case you were talking about optimising anyway? You can use EXPLAIN (COSTS OFF) to get a stable explain output.

Regards

David Rowley
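For reference, the meme being discussed, and the transformation the patch would enable, can be sketched as follows (the table and column names are hypothetical, not from the patch):

```sql
-- The common "LIMIT 1 inside EXISTS" meme. The LIMIT is redundant:
-- EXISTS stops at the first row anyway, but today it can prevent the
-- planner from converting the subquery into a semi-join.
SELECT *
FROM   orders o
WHERE  EXISTS (SELECT 1
               FROM   order_lines l
               WHERE  l.order_id = o.id
               LIMIT  1);

-- Without the LIMIT, the planner is free to plan a semi-join:
EXPLAIN (COSTS OFF)
SELECT *
FROM   orders o
WHERE  EXISTS (SELECT 1
               FROM   order_lines l
               WHERE  l.order_id = o.id);
```

With the patch, both forms should produce the same semi-join plan, since the provably positive LIMIT is stripped before planning.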
[HACKERS] btree_gin and ranges
The suggested patch adds GIN support for the range containment operator (<@) over a scalar column. It allows a more efficient GIN scan. Currently, queries like

SELECT * FROM test_int4 WHERE i <= 1 AND i >= -1

will be executed by GIN with two scans: one from minus infinity to 1 and another from -1 to plus infinity. That's because GIN is generalized and doesn't know the semantics of the operation. With the patch it's possible to rewrite the query with ranges:

SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range

and the GIN index will support this query with a single scan from -1 to 1.

The patch provides index support only for existing range types: int4, int8, numeric, date, and timestamp with and without time zone.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/

btree_gin_range-1.patch.gz
Description: GNU Zip compressed data
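For completeness, a sketch of the kind of setup this applies to (the data-generation details here are illustrative, not from the patch; the single-scan behavior for <@ requires the patch's opclasses):

```sql
CREATE EXTENSION btree_gin;

-- A sample scalar column indexed with a btree_gin opclass.
CREATE TABLE test_int4 AS
SELECT (v % 1000)::int4 AS i FROM generate_series(1, 100000) AS v;

CREATE INDEX test_int4_i_gin ON test_int4 USING gin (i);

-- Today: two GIN scans (minus infinity .. 1, and -1 .. plus infinity).
SELECT count(*) FROM test_int4 WHERE i <= 1 AND i >= -1;

-- With the proposed patch: a single GIN scan from -1 to 1.
SELECT count(*) FROM test_int4 WHERE i <@ '[-1, 1]'::int4range;
```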
Re: [HACKERS] btree_gin and ranges
Hi,

On Wed, Oct 22, 2014 at 1:55 PM, Teodor Sigaev teo...@sigaev.ru wrote:
> With the patch it's possible to rewrite the query with ranges SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range and the GIN index will support this query with a single scan from -1 to 1.

Shouldn't this be implemented in a more generic manner? An ordinary btree index could very well support <@ queries too, but your patch only adds this capability to btree_gin. The documentation describes btree_gin as providing "GIN operator classes that implement B-tree equivalent behavior", but now the behavior diverges.

Regards,
Marti
[HACKERS] speedup tidbitmap patch: cache page
In specific workloads Postgres can spend a lot of time in tbm_add_tuples(), up to 50% of query time. The hash_search() call is expensive, and it is called twice for each ItemPointer to insert. The suggested patch tries to cache the last PagetableEntry pointer in the hope that the next ItemPointer will be on the same relation page. The patch is rather trivial and I don't believe that it could cause a performance degradation, but it can increase performance on some workloads. An example:

# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch:    Execution time: 2058.718 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch:    Execution time: 195.381 ms

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/

tbm_cachepage-1.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/20/2014 09:26 AM, Michael Paquier wrote:
> On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier michael.paqu...@gmail.com wrote:
>> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao masao.fu...@gmail.com wrote:
>>> In this case, the patch seems to make the restartpoint recycle even WAL files which have .ready files and will have to be archived later. Thought?
>> The real problem currently is that it is possible to have a segment file not marked as .done during recovery, when the stream connection is abruptly cut as this segment is switched: the segment is marked as .ready in archive_status and simply left in pg_xlog, because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that, when a segment file is switched and the connection to the master is cut, we guarantee that this file is completed and marked as .done.
> I have spent more time on that, with a bit more underground work... First, the problem can be reproduced most of the time by running this simple command:
>
> psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
>
> This will enforce a segment file switch and restart the master in crash recovery. This has the effect of immediately cutting the WAL stream on the slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0. For example, let's imagine that the stream fails when switching from 000000010000000000000003 to the next segment; then the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for example 0/310, and walrcv->latestWalEnd is 0/400. When the stream restarts it will begin once again from 0/400, ignoring that 000000010000000000000003 should be marked as .done, ultimately marking it .ready when old segment files are recycled or removed. There is nothing that can really be done to enforce the creation of a .done file before the FATAL of libpqrcv_receive, because we cannot predict the stream failure.
> Now, we can do better than what we have now by looking at the WAL start position used when starting streaming in the WAL receiver, and enforce .done if the start position is the last one of the previous segment. Hence, in the case of start position 0/400 found previously, the file that will be enforced to .done is 000000010000000000000003. I have written the attached patch that implements this idea and fixes the problem. Now let's see if you guys see any flaws in this simple logic, which uses a sniper rifle instead of a bazooka as in the previous patches sent.

Hmm. This will still miss the .done file if you don't re-establish the streaming replication connection after the restart. For example, if you shut down the master and promote the standby server.

I think we should take a more wholesale approach to this. We should enforce the rule that the server only ever archives WAL files belonging to the same timeline that the server generates. IOW, the server only archives the WAL that it has generated.

- Heikki
Re: [HACKERS] btree_gin and ranges
> Shouldn't this be implemented in a more generic manner? An ordinary btree index could very well support <@ queries too, but your patch only adds this capability to btree_gin.

Unfortunately I don't see a way to do that. GIN is generalized, and it doesn't know the semantics. Actually, we could fix the first five strategy numbers to match BTree's strategies, and then we could teach the GIN core to use BTree semantics, but what about already existing operator classes?

> The documentation describes btree-gin as providing GIN operator classes that implement B-tree equivalent behavior, but now the behavior diverges.

Anyway, GIN can't be used for ORDER BY clauses.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
Here's an updated patch addressing David's points. I haven't had a chance to test it yet on win2k8 or win2k12, due to pgconf.eu.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From d5cd8c82bdc3caa9d18586eac8e4e083a2729d61 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Thu, 18 Sep 2014 23:02:14 +0800
Subject: [PATCH 2/2] Use GetSystemTimePreciseAsFileTime when available

This will cause PostgreSQL on Windows 8 or Windows Server 2012 to obtain high-resolution timestamps while allowing the same binaries to run without problems on older releases.
---
 src/backend/main/main.c |  6 +
 src/include/port.h      |  2 ++
 src/port/gettimeofday.c | 59 +++--
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c51b391..73c30c5 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -260,6 +260,12 @@ startup_hacks(const char *progname)
 	/* In case of general protection fault, don't show GUI popup box */
 	SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+	/* Figure out which syscall to use to capture timestamp information */
+	init_win32_gettimeofday();
+#endif
+
 }
 #endif   /* WIN32 */

diff --git a/src/include/port.h b/src/include/port.h
index 9f8465e..4f8af0a 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On windows we need to call some backend start setup for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else							/* !WIN32 */

diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 73ec406..b8871d9 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -30,14 +30,69 @@
 #include <sys/time.h>
 
+#ifndef FRONTEND
+#include "utils/elog.h"
+#endif
+
 /* FILETIME of Jan 1 1970 00:00:00. */
 static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
 /*
+ * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
+ * signature, so we can just store a pointer to whichever we find. This
+ * is the pointer's type.
+ */
+typedef VOID (WINAPI *PgGetSystemTimeFn) (LPFILETIME);
+
+/* Storage for the function we pick at runtime */
+static PgGetSystemTimeFn pg_get_system_time = NULL;
+
+/*
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not, fall back to GetSystemTimeAsFileTime.
+ */
+void
+init_win32_gettimeofday(void)
+{
+	/*
+	 * Because it's guaranteed that kernel32.dll will be linked into our
+	 * address space already, we don't need to LoadLibrary it and worry about
+	 * closing it afterwards, so we're not using Pg's dlopen/dlsym() wrapper.
+	 *
+	 * We'll just look up the address of GetSystemTimePreciseAsFileTime if
+	 * present.
+	 *
+	 * While we could look up the Windows version and skip this on Windows
+	 * versions below Windows 8 / Windows Server 2012 there isn't much point,
+	 * and determining the windows version is itself somewhat Windows version
+	 * and development SDK specific...
+	 */
+	pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+		GetModuleHandle(TEXT("kernel32.dll")),
+		"GetSystemTimePreciseAsFileTime");
+
+	if (pg_get_system_time == NULL)
+	{
+		/*
+		 * The expected error is ERROR_PROC_NOT_FOUND, if the function isn't
+		 * present. No other error should occur.
+		 *
+		 * It's too early in startup to elog(...) if we get some unexpected
+		 * error, and not serious enough to warrant a fprintf to stderr about
+		 * it or save the error and report it later. So silently fall back
+		 * to GetSystemTimeAsFileTime unless we're on a cassert build.
+		 */
+		DWORD		errcode = GetLastError();
+
+		Assert(errcode == ERROR_PROC_NOT_FOUND);
+
+		pg_get_system_time = GetSystemTimeAsFileTime;
+	}
+}
+
+/*
  * timezone information is stored outside the kernel so tzp isn't used anymore.
  *
- * Note: this function is not for Win32 high precision timing purpose. See
+ * Note: this function is not for Win32 high precision timing purposes. See
  * elapsed_time().
  */
 int
@@ -46,7 +101,7 @@ gettimeofday(struct timeval * tp, struct timezone * tzp)
 	FILETIME	file_time;
 	ULARGE_INTEGER ularge;
 
-	GetSystemTimeAsFileTime(&file_time);
+	(*pg_get_system_time) (&file_time);
 
 	ularge.LowPart = file_time.dwLowDateTime;
 	ularge.HighPart = file_time.dwHighDateTime;
-- 
1.9.3

From 810abe48ca9bc6dfa0c4422052ea7817e0fb256a Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 12 Sep 2014 12:41:35 +0800
Subject: [PATCH 1/2] Use GetSystemTimeAsFileTime directly in windows gettimeofday
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
On 10/19/2014 11:29 AM, Peter Geoghegan wrote: On Fri, Oct 10, 2014 at 12:33 AM, Peter Geoghegan p...@heroku.com wrote: Attached Have you looked at this? Committed now, thanks. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Tom Lane t...@sss.pgh.pa.us writes: This looks to me like re-fighting the last war. Such a GUC has zero value *unless* some situation exactly like the POODLE bug comes up again, and the odds of that are not high. Many people would have said the exact same thing before POODLE, and BEAST, and CRIME, and Heartbleed. You never know what sort of bugs or weaknesses will show up or when; all you know is that there are a lot of people working very hard to find these things and exploit them, and that they *will* succeed, again and again and again. You can gamble that PostgreSQL will not be vulnerable due to specific details of its protocol or how it uses TLS, but that's a gamble which you will eventually lose. Moreover, the GUC could easily be misused to decrease rather than increase one's security, if it's carelessly set. That's the user's responsibility. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Magnus Hagander mag...@hagander.net writes: If anything, I think the default should be default, and then we have that map out to something. Because once you've initdb'ed, the config file will be stuck with a default and we can't change that in a minor release *if* something like POODLE shows up again. Actually, I had that in an earlier version of the patch, but I thought it was too obscure / circular. I can easily re-add it. In a case like POODLE we probably wouldn't have done it anyway, as it doesn't affect our connections (we're not http) If I understand correctly, imaps has been shown to be vulnerable as well, so I wouldn't be so sure. Having the guc could certainly be useful in some cases. If we do, we should of course *also* have a corresponding configuration option in libpq, so I'd say this patch is incomplete if we do want to do it. I can update the patch to include the client side. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] speedup tidbitmap patch: hash BlockNumber
Just replacing tag_hash in tidbitmap (which uses hash_any) with a direct call of hash_uint32 saves ~5% of execution time. An example: # create extension btree_gin; # select (v / 10)::int4 as i into t from generate_series(1, 500) as v; # create index idx on t using gin (i); # set enable_seqscan = off; # explain analyze select * from t where i = 0; without patch: Execution time: 2427.692 ms with patch: Execution time: 2319.376 ms # explain analyze select * from t where i >= 100 and i <= 100; without patch: Execution time: 524.441 ms with patch: Execution time: 504.478 ms -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ tbm_blocknumber-2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Tom Lane t...@sss.pgh.pa.us writes: And in the end, if we set values like this from PG --- whether hard-wired or via a GUC --- the SSL library people will have exactly the same perspective with regards to *our* values. And not without reason; we were forcing very obsolete settings up till recently, because nobody had looked at the issue for a decade. I see no reason to expect that that history won't repeat itself. I'm not sure what you're saying here, but - I'm not sure how familiar you are with the OpenSSL API, but it's insecure by default. There is *no other choice* for an application than to explicitly select which protocols it wants to use (or at least which protocols it wants to avoid). And you can't change OpenSSL, because a ton of old crappy software is going to break. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Oct 17, 2014 at 2:23 PM, Fujii Masao masao.fu...@gmail.com wrote: I found one problem in the 0002 patch. The patch changes the recovery so that it creates .done files for every WAL file which exists in the pg_xlog directory at the end of recovery. But even WAL files which will have to be archived later can exist in pg_xlog at that moment. For example, the latest, recycled and fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files. The patch wrongly prevents them from being archived at all. Re-looking at patch 2, yes you are right. Even though it was mentioned that we should do that for a node that had promotion triggered, it was not done this way, as a check on CheckForStandbyTrigger() is actually missing. Attached is an updated patch. ISTM that the 0001 patch has a similar problem. Please imagine the following scenario. In this case, the patch seems to make the restartpoint recycle even WAL files which have .ready files and will have to be archived later. Thought? Right, that's really backward. This was the approach taken before c9cc7e0, and this commit actually prevents removal of unarchived WAL files during recovery. -- Michael From fdf1f4a46a2f870ee2aeda11f909c465ed12450a Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Wed, 22 Oct 2014 15:18:33 +0200 Subject: [PATCH] Enforce all WAL segment files to be marked as .done at node promotion This is a safety mechanism to ensure that there are no files that are not considered as .done, as some segments may have been missed, particularly in the case of partially written files or files being written when a disconnection occurred between a streaming standby and its root node. This makes a node reaching promotion have a state consistent with what is expected, using the assumption that all the WAL segment files that are done being streamed should always be considered as archived by the node.
--- src/backend/access/transam/xlog.c| 10 src/backend/access/transam/xlogarchive.c | 39 +++- src/include/access/xlog_internal.h | 1 + 3 files changed, 49 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 235b442..7179003 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6862,6 +6862,16 @@ StartupXLOG(void) } /* + * Create a .done entry for each WAL file present in pg_xlog that has + * not been yet marked as such for a node that has been promoted. In some + * cases where for example a streaming replica has had a connection to a + * remote node cut abruptly, such WAL files may have been only partially + * written or even not flagged correctly with .done. + */ + if (InRecovery && CheckForStandbyTrigger()) + XLogArchiveForceDoneAll(); + + /* * Kill WAL receiver, if it's still running, before we continue to write * the startup checkpoint record. It will trump over the checkpoint and * subsequent records if it's still alive when we start writing WAL. diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 047efa2..931106f 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -554,6 +554,40 @@ XLogArchiveNotifySeg(XLogSegNo segno) } /* + * XLogArchiveForceDoneAll + * + * Wrapper of XLogArchiveForceDone scanning all the XLOG segment files in + * XLOGDIR, switching them forcibly to .done. + */ +void +XLogArchiveForceDoneAll(void) +{ + DIR *xldir; + struct dirent *xlde; + + + xldir = AllocateDir(XLOGDIR); + if (xldir == NULL) + ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not open transaction log directory \"%s\": %m", + XLOGDIR))); + + /* + * Scan all the WAL segments present in the archives and switch them to + * .done.
+ */ + while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL) + { + if (strlen(xlde->d_name) == 24 && + strspn(xlde->d_name, "0123456789ABCDEF") == 24) + XLogArchiveForceDone(xlde->d_name); + } + + FreeDir(xldir); +} + +/* * XLogArchiveForceDone * * Emit notification forcibly that an XLOG segment file has been successfully @@ -582,7 +616,6 @@ XLogArchiveForceDone(const char *xlog) (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", archiveReady, archiveDone))); - return; } @@ -604,6 +637,10 @@ archiveDone))); return; } + + ereport(DEBUG2, + (errmsg("archive status file of \"%s\" has been forcibly switched from \"%s\" to \"%s\"", + xlog, archiveReady, archiveDone))); } /* diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index 27b9899..04a0de3 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -286,6 +286,7 @@ extern void ExecuteRecoveryCommand(char *command, char *commandName, extern void
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Tom Lane t...@sss.pgh.pa.us writes: As far as protocol version goes, I think our existing coding basically says prefer newest available version, but at least TLS 1.0. I think that's probably a reasonable approach. The client side forces TLS 1.0: SSL_context = SSL_CTX_new(TLSv1_method()); In typical OpenSSL fashion, this does *not* mean 1.0 or higher. It means 1.0 exactly. If the patch exposed a GUC that set a minimum version, rather than calling out specific acceptable protocols, it might be less risky. Not necessarily. Someone might find a weakness in TLS 1.1 which is not present in 1.0 because it involves a specific algorithm or mode that 1.0 does not support. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Magnus Hagander mag...@hagander.net writes: Yes, it does that. Though it only does it on 9.4,but with the facts we know now, what 9.4+ does is perfectly safe. Agreed. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/17/2014 01:59 PM, Simon Riggs wrote: On 17 October 2014 09:55, furu...@pm.nttdata.co.jp wrote: A new parameter to send feedback should be called --feedback A second parameter to decide whether to fsync should be called --fsync I think keep using --reply-fsync and --fsync-interval is better than make new options. Thought? We already have hot_standby_feedback, so using the name feedback is best idea. I am suggesting that we send feedback even if we do not fsync, to allow the master to track our progress. Hence the name of the second parameter was just fsync. So both names were suggested because of links to those terms already being used for similar reasons elsewhere in Postgres. We seem to be going in circles. You suggested having two options, --feedback, and --fsync, which is almost exactly what Furuya posted originally. I objected to that, because I think that user interface is too complicated. Instead, I suggested having just a single option called --synchronous, or even better, have no option at all and have the server tell the client if it's participating in synchronous replication, and have pg_receivexlog automatically fsync when it is, and not otherwise [1]. That way you don't need to expose any new options to the user. What did you think of that idea? [1] http://www.postgresql.org/message-id/5434e0ef.9050...@vmware.com - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Wed, Oct 22, 2014 at 3:12 PM, Dag-Erling Smørgrav d...@des.no wrote: Tom Lane t...@sss.pgh.pa.us writes: This looks to me like re-fighting the last war. Such a GUC has zero value *unless* some situation exactly like the POODLE bug comes up again, and the odds of that are not high. Many people would have said the exact same thing before POODLE, and BEAST, and CRIME, and Heartbleed. You never know what sort of bugs or weaknesses will show up or when; all you know is that there are a lot of people working very hard to find these things and exploit them, and that they *will* succeed, again and again and again. You can gamble that PostgreSQL will not be vulnerable due to specific details of its protocol or how it uses TLS, but that's a gamble which you will eventually lose. There are some companies, without naming them, that have increased the resources dedicated to analyzing existing security protocols and libraries, so even if the chances are low, IMO the odds that similar problems show up will increase with time. Moreover, the GUC could easily be misused to decrease rather than increase one's security, if it's carelessly set. That's the user's responsibility. I actually just had a user asking about having a way to control the protocols used by the server... So, changing my opinion on the matter, it would be nice to have such a parameter even on back-branches, with 'default' to let the server decide which one is better. -- Michael
[HACKERS] compress method for spgist
When we developed SP-GiST we missed an analogue of GiST's compress method. There were two reasons for that: a lack of imagination to imagine a case with different types of indexed value and column, and we didn't want to call some method while the index fits in one page. Some discussion on that http://www.postgresql.org/message-id/27542.1323534...@sss.pgh.pa.us http://www.postgresql.org/message-id/4ee77ea1.6030...@sigaev.ru But we were wrong: the PostGIS guys found an example: polygon indexing with storing just a bounding box. Actually, I don't know an index structure for boxes suitable for SP-GiST, but I'm not a geometer. They are. The attached patch provides support for an optional compress method for SP-GiST. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ spgist_compress_method-1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-22 13:32:07 +0530, Amit Kapila wrote: On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: Today, I have verified all previous comments raised by me and looked at the new code and below are my findings: 4. LWLockAcquireCommon() { .. if (!LWLockDequeueSelf(l)) { for (;;) { PGSemaphoreLock(&proc->sem, false); if (!proc->lwWaiting) break; extraWaits++; } lock->releaseOK = true; .. } Setting releaseOK in the above context might not be required because if control comes to this part of the code, it will not retry to acquire another time. Hm. You're probably right. You have agreed to fix this comment, but it seems you have forgotten to change it. After I've thought more about it, it is actually required. This essentially *is* a retry. Somebody woke us up, which is where releaseOK is supposed to be set. 11. LWLockRelease() { .. PRINT_LWDEBUG("LWLockRelease", lock, mode); } Shouldn't this be at the beginning of the LWLockRelease function rather than after processing the held_lwlocks array? Ok. You have agreed to fix this comment, but it seems you have forgotten to change it. Below comment doesn't seem to be addressed? LWLockAcquireOrWait() { .. LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded"); .. } a. such a log is not there in any other LWLock.. variants, if we want to introduce it, then shouldn't it be done at other places as well. I think you're placing unnecessarily high consistency constraints on a debugging feature here. Below point is yet to be resolved. 12. #ifdef LWLOCK_DEBUG lock->owner = MyProc; #endif Shouldn't it be reset in LWLockRelease? That's actually intentional. It's quite useful to know the last owner when debugging lwlock code. Won't it cause any problem if the last owner process exits? No. PGPROCs aren't deallocated or anything. And it's a debugging-only variable.
Can you explain how pg_read_barrier() in the below code makes this access safe? LWLockWakeup() { .. + pg_read_barrier(); /* pairs with nwaiters-- */ + if (!BOOL_ACCESS_ONCE(lock->releaseOK)) .. } What's the concern you have? Full memory barriers (the atomic nwaiters--) pair with read memory barriers. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 22 October 2014 14:26, Heikki Linnakangas hlinnakan...@vmware.com wrote: We seem to be going in circles. You suggested having two options, --feedback, and --fsync, which is almost exactly what Furuya posted originally. I objected to that, because I think that user interface is too complicated. Instead, I suggested having just a single option called --synchronous, or even better, have no option at all and have the server tell the client if it's participating in synchronous replication, and have pg_receivexlog automatically fsync when it is, and not otherwise [1]. That way you don't need to expose any new options to the user. What did you think of that idea? Sorry, if we're going in circles. Yes, I like the idea. The master setting of synchronous_standby_names defines which standbys/rep clients will have their feedback used to release waiting COMMITs. That uses application_name, which is set at the replication client connection. (That has a default value, but let's assume the user sets this). So when a replication client connects, the WALSender knows its priority, as defined in sync_standby_names. I guess we can have WALSender send a protocol message to the client every time the sync priority changes (as a result of a changed value of sync_standby_names); that is what SyncRepInitConfig() handles. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Function array_agg(array)
Hi 2014-10-19 8:02 GMT+02:00 Ali Akbar the.ap...@gmail.com: So, is there any idea how we will handle NULL and empty arrays in array_agg(anyarray)? I propose we just reject those inputs because the output will make no sense: - array_agg(NULL::int[]) -- the result will be indistinguishable from array_agg of NULL ints. - array_agg('{}'::int[]) -- how do we determine the dimension of the result? is it 0? Or will the result be just an empty array {}? This updated patch rejects NULL and {} arrays as noted above. I agree with your proposal. I have a few comments on the design: 1. the patch doesn't include documentation and regression tests, please append them. 2. this functionality (multidimensional aggregation) can be interesting in more cases, so maybe some interface like an array builder should be preferred. 3. array_agg was consistent with array(subselect), so it should be fixed too postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] 4. why do you use a magic constant (64) there? + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems; + astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8); Regards Pavel Regards, -- Ali Akbar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing lock strength of adding foreign keys
Andreas Karlsson andr...@proxel.se writes: I have attached a proof of concept patch which reduces the lock strength to ShareLock. You're kidding, right? ShareLock isn't even self-exclusive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-21 19:56:05 +0530, Amit Kapila wrote: On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-25 19:06:32 +0530, Amit Kapila wrote: 2. LWLockWakeup() { .. #ifdef LWLOCK_STATS lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex); #else SpinLockAcquire(&lock->mutex); #endif .. } Earlier while releasing the lock, we don't count it towards the LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop. I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquisitions are counted. The code has mainly 4 stats (sh_acquire_count, ex_acquire_count, block_count, spin_delay_count) to track; if I try to see all stats together to understand the contention situation, the unpatched code makes sense. I don't think it does. It completely disregards that the contention may actually be in LWLockRelease(). That contributes to spinlock contention just as much as LWLockAcquire(). spin_delay_count gives how much delay has happened to acquire the spinlock, which when combined with other stats gives a clear picture of the contention around acquisition of the corresponding LWLock. Now if we want to count the spinlock delay for the Release call as well, then the meaning of the stat is getting changed. It might be that the new meaning of the spin_delay_count stat is more useful in some situations, however the older one has its own benefits, so I am not sure if changing this as part of this patch is the best thing to do. In which case does the old definition make sense, where the new one doesn't? I don't think it exists. And changing it here seems to make sense because spinlock contention fundamentally changes its meaning for lwlocks anyway, as in most paths we don't take a spinlock anymore. 5. LWLockWakeup() { ..
dlist_foreach_modify(iter, (dlist_head *) &wakeup) { PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur); LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter"); dlist_delete(&waiter->lwWaitLink); pg_write_barrier(); waiter->lwWaiting = false; PGSemaphoreUnlock(&waiter->sem); } .. } Why can't we decrement the nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and try to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so. That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0. The reason I've done so is that it's otherwise much harder to debug issues where there are backends that have been woken up already, but haven't rerun yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way. I am not sure what useful information we can get during debugging by not doing this in LWLockWakeup() It's useful because you can detect backends that have been scheduled to acquire the lock, but haven't yet. They're otherwise invisible. and w.r.t. performance it can lead to an extra function call, a few checks, and I think in some cases can even acquire/release the spinlock. I fail to see how that could be the case. And again, this is code that's only executed around a couple syscalls. And the cacheline will be touched around there *anyway*. 6. LWLockWakeup() { .. dlist_foreach_modify(iter, (dlist_head *) &l->waiters) { .. if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE) continue; .. if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE) { .. wokeup_somebody = true; } .. } .. } a. IIUC the above logic, if the waiter queue is as follows: (S-Shared; X-Exclusive) S X S S S X S S it can skip the exclusive waiters and release shared waiters.
If my understanding is right, then I think instead of continue, there should be a *break* in the above logic. No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up. Okay, even then it makes the current logic of waking up different, which I am not sure is what this patch is intended for. It's already done in a separate patch... b. Consider the below sequence of waiters: (S-Shared; X-Exclusive) S S X S S I think as per the un-patched code, it will wake up waiters up till (including) the first Exclusive, but the patch will wake up up till (*excluding*) the first Exclusive. I don't think the current code does that. LWLockRelease() { .. /* * If the front waiter wants exclusive lock, awaken him only. * Otherwise awaken as many waiters as want
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote: The same thought had occurred to me. Probably the main use of the datetime parsing code in ecpg is for interpreting outputs from the server, and (at least by default) the server doesn't use timezone abbreviations when printing timestamps. So maybe that's largely dead code anyhow. I would not propose back-patching such a change, but we could try it in 9.5 and see if anyone complains. Agreed on all accounts. A less drastic remedy would be to remove just those abbreviations whose meaning has actually changed over time. Eventually that might be all of them ... but in the meantime, we could at least argue that we weren't breaking any case that worked well before. This is what your patch did, right? Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] from_collapse_limit considerations
[ I think I responded earlier but don't see my mail in the archive... ] Tom Lane t...@sss.pgh.pa.us wrote: Antonin Houska a...@cybertec.at writes: I noticed that - unlike join_collapse_limit - the from_collapse_limit does not enforce maximum length of the top-level list. Shouldn't it do? How would it do that? You want it to fail outright if there are more than N tables? That seems unhelpful. Sure, truncation of the FROM list would be crazy, and that's not what I tried to propose. -- Antonin Houska Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Function array_agg(array)
Thanks for the review 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments on the design: 1. the patch doesn't include documentation and regression tests, please append them. OK, I'll add the documentation and regression tests 2. this functionality (multidimensional aggregation) can be interesting in more cases, so maybe some interface like an array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way is not efficient, so I decided to reimplement it and memcpy the array data and null flags as-is. In other places, I think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as the API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] I'm pretty new to PostgreSQL development. Can you point out where array(subselect) is implemented? 4. why do you use a magic constant (64) there? + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate->aitems = 64 * nitems; + astate->nullbitmap = (bits8 *) + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8); just following the arbitrary size chosen in accumArrayResult (arrayfuncs.c): astate->alen = 64; /* arbitrary starting array size */ it can be any number that is neither too small nor too big: too small, and we will realloc shortly; too big, and we will end up wasting memory. Regards, -- Ali Akbar
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 10:47 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think it wouldn't; Merlin is proposing that f would be taken as the field name. A more realistic objection goes like this: create table foo(f int, g int); update foo x set x = (1,2); -- works alter table foo add column x int; update foo x set x = (1,2,3); -- no longer works It's not a real good thing if a column addition or renaming can so fundamentally change the nature of a query. That's exactly how SELECT works. I also dispute that the user should be surprised in such cases. Well, the reason we have a problem in SELECT is that we support an unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that SELECT foo FROM foo could represent a whole-row selection is nowhere to be found in the SQL standard, for good reason IMO. But we've never had the courage to break cleanly with this Berkeley leftover and insist that people spell it SQL-style as SELECT ROW(foo.*) FROM foo. I'd just as soon not introduce the same kind of ambiguity into UPDATE if we have a reasonable alternative. Ah, interesting point (I happen to like the terse syntax and use it often). This is for posterity anyways since you guys seem to like Atri's proposal, which surprised me. However, I think you're over simplifying things here. Syntax aside: I think SELECT f FROM foo f; and a hypothetical SELECT row(f.*) FROM foo f; give different semantics. The former gives an object of type 'f' and the latter gives type 'row'. To get parity, you'd have to add an extra cast which means you'd have to play tricky games to avoid extra performance overhead besides being significantly more verbose. 
I'm aware some of the other QUELisms are pretty dodgy and have burned us in the past (like that whole function as record reference thing) but pulling a record as a field in select is pretty nice. It's also widely used and quite useful in json serialization. Been thinking about this in the back of my mind the last couple of days. There are some things you can do with the QUEL 'table as column' in select syntax that are impossible otherwise, at least today, and its usage is going to proliferate because of that. Row construction via () or row() needs to be avoided whenever the column names are important and there is no handy type to cast to. For example, especially during json serialization it's convenient to do things like: select array_agg((select q from (select a, b) q) order by ...) from foo; ...where a,b are fields of foo. FWICT, this is the only way to do json serialization of arbitrary runtime row constructions in a way that does not anonymize the type. Until I figured out this trick I used to create lots of composite types that served no purpose other than to give me a type to cast to which is understandably annoying. if: select (x).* from (select (1, 2) as x) q; worked and properly expanded x to given names should they exist and: SELECT row(f.*) FROM foo f; worked and did same, and: SELECT (row(f.*)).* FROM foo f; was as reasonably performant and gave the same results as: SELECT (f).* FROM foo f; ...then IMSNHO you'd have a reasonable path to deprecating the QUEL inspired syntax. Food for thought. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
Michael Meskes mes...@postgresql.org writes: On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote: The same thought had occurred to me. Probably the main use of the datetime parsing code in ecpg is for interpreting outputs from the server, and (at least by default) the server doesn't use timezone abbreviations when printing timestamps. So maybe that's largely dead code anyhow. I would not propose back-patching such a change, but we could try it in 9.5 and see if anyone complains. Agreed on all accounts. A less drastic remedy would be to remove just those abbreviations whose meaning has actually changed over time. Eventually that might be all of them ... but in the meantime, we could at least argue that we weren't breaking any case that worked well before. This is what your patch did, right? No, I did not touch ecpg's set of tokens at all, just changed the representation of datetktbl to match the new backend coding. I figured we could discuss behavioral changes separately. I don't have a strong opinion about which of the above things to do ... what's your preference? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about RI checks
Florian Pflug f...@phlo.org wrote: Florian Pflug f...@phlo.org wrote: But that's wrong. The transaction's snapshot *would* see that row, so we ought to raise an error. Note that this applies also to mode SERIALIZABLE, and breaks true serializability in some cases, since we don't do conflict detection for RI enforcement queries. Here's a test case, involving two transactions A and B. I tried this on REL9_4_STABLE.

Setup:

CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
                    parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case:

A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
->  id | parent_id
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
->  DELETE 1
B:: commit;
A:: delete from parent;
->  DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE should fail then), nor after it (since it shouldn't see the row in the child table then). Yet we don't complain, even though both transactions run in SERIALIZABLE mode. Simplifying the display of this a bit: tables parent and child each have one row.

Transaction A: select * from child; [it sees the child row]
Transaction B: delete from child; [child row is deleted]
Transaction A: delete from parent; [parent row deleted]

TA sees the row that TB deletes, creating a rw-dependency that implies that TA ran first. On the other hand, the FK trigger fired by the delete from parent *doesn't* see the row deleted by TB, which implies that TB ran first. I agree, this is a problem for serializable transactions.
This should not be considered a problem for repeatable read transactions, because the change in visible rows meets the definition of phantom reads, which are allowed in repeatable read: "A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction." Phantom reads are not *required* to occur in repeatable read transactions, and in PostgreSQL they generally don't, so we *might* want to change this behavior; I'm just saying that we are conforming to the requirements of the standard even if we don't. Leaving this alone for repeatable read transactions would require a change to our documentation, though, since we currently assert that we don't allow phantom reads in our repeatable read implementation. Either SSI needs to include the RI checking queries in its tracking *or* TA needs to throw an error if there are child rows visible according to its transaction snapshot -- at least in serializable transactions. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Function array_agg(array)
2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com: Thanks for the review. 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments on the design: 1. The patch doesn't include documentation and regression tests; please append them. OK, I'll add the documentation and regression tests. 2. This functionality (multidimensional aggregation) can be interesting in more places, so maybe some interface like an array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls, and then reconstruct them through makeMdArrayResult. I think that is not efficient, so I decided to reimplement it and memcpy the array data and null flags as-is. Aha -- so wouldn't it be better to fix the performance of accumArrayResult? In other places, I think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as the API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too:

postgres=# select array_agg(a) from test;
       array_agg
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new to postgresql development. Can you point out where array(subselect) is implemented? Where can I start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

Look at the code ... your magic keyword is ARRAY_SUBLINK -- search the postgresql sources for this keyword. 4. Why do you use a magic constant (64) there?

+		astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+		astate->aitems = 64 * nitems;
+		astate->nullbitmap = (bits8 *)
+			repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

I just followed the arbitrary size chosen in accumArrayResult (arrayfuncs.c):

	astate->alen = 64;		/* arbitrary starting array size */

It can be any number, not too small and not too big. Too small, and we will realloc shortly. Too big, and we will end up wasting memory. You could try allocating 1KB as the starting size instead -- that size is used in more places in Postgres. Then the overhead is at most 1KB per aggregate call, which is acceptable. You took this value from accumArrayResult, but that is targeted at short scalars -- you should expect any array to be much larger. Regards, -- Ali Akbar
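Restating point 3 above as a self-contained sketch (using VALUES in place of the thread's test table; the first statement shows the behavior the patch proposes, the second the inconsistency that would need the ARRAY_SUBLINK treatment):

```sql
-- With the proposed array_agg(anyarray), aggregating 1-D arrays
-- builds a 2-D array:
SELECT array_agg(a)
FROM (VALUES (ARRAY[1,2,3,4]), (ARRAY[1,2,3,4])) t(a);
-- expected: {{1,2,3,4},{1,2,3,4}}

-- ARRAY(subselect) over the same rows does not yet work and should
-- be made consistent:
SELECT ARRAY(SELECT a
             FROM (VALUES (ARRAY[1,2,3,4]), (ARRAY[1,2,3,4])) t(a));
-- currently: ERROR: could not find array type for data type integer[]
```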
[HACKERS] idea: allow AS label inside ROW constructor
Hi, with the new row_to_json() functions the ROW constructor is used more often, and giving names to its fields is relatively difficult. Because ROW has a special clause in the parser, I am thinking we could enable labeling inside the ROW constructor, so that instead of the currently supported: select row_to_json(r) from (select 10 as a, 20 as b) r; users could write: select row_to_json(row(10 as a, 20 as b)); Labeling would be enabled only inside the ROW constructor; I don't propose enabling it everywhere. What do you think about it? Regards Pavel The currently supported syntax is natural for a long-time PostgreSQL user, but it is relatively strange for the usual user.
Re: [HACKERS] idea: allow AS label inside ROW constructor
here is a motivation, why I propose this feature http://dba.stackexchange.com/questions/27732/set-names-to-attributes-when-creating-json-with-row-to-json same query I have in Czech postgres users mailing list Pavel 2014-10-22 18:21 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com: Hi with new functions row_to_json(b), there is more often usage of ROW constructor. Using names in fields is relative difficult. Because ROW has special clause in parser, I am thinking so we can enable labeling inside ROW constructor so instead currently supported: select row_to_json(r) from (select 10 as a, 20 as b) r; users can to write: select row_to_json(row(10 as a,20 as b)); labeling will be enabled only inside ROW constructor. I don't propose enable it everywhere. What do you think about it? Regards Pavel Currently supported syntax is natural for long time PostgreSQL user, but it is relative strange for usual user.
Re: [HACKERS] idea: allow AS label inside ROW constructor
On Wed, Oct 22, 2014 at 11:21 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi with new functions row_to_json(b), there is more often usage of ROW constructor. Using names in fields is relative difficult. Because ROW has special clause in parser, I am thinking so we can enable labeling inside ROW constructor so instead currently supported: select row_to_json(r) from (select 10 as a, 20 as b) r; users can to write: select row_to_json(row(10 as a,20 as b)); labeling will be enabled only inside ROW constructor. I don't propose enable it everywhere. What do you think about it? It's a neat idea -- maybe a better alternative to what I was thinking here: http://postgresql.1045698.n5.nabble.com/Support-UPDATE-table-SET-tp5823073p5823944.html Some questions: *) What would the parser transformation resolve to *) Are we ok with SQL standard *) Do you think this (or some similar variant) would work? select row_to_json(row(foo.*)) from foo; merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about RI checks
This should not be considered a problem for repeatable read transactions because the change in visible rows meet the definition of phantom reads, which are allowed in repeatable read: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well, repeatable reads? Also, note that after the DELETE FROM parent, further SELECTs in the same transaction will use the original snapshot again, and thus will see the conflicting child rows again that were ignored by the RI trigger. But they won't, of course, see the parent row. IOW, transaction A will, after the delete, see a state of the database in which the PK constraint is broken. I don't think that's acceptable in any isolation level. Best regards, Florian Pflug
Re: [HACKERS] Question about RI checks
Florian Pflug f...@phlo.org wrote: This should not be considered a problem for repeatable read transactions because the change in visible rows meet the definition of phantom reads, which are allowed in repeatable read: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well, repeatable reads? What the standard requires for REPEATABLE READ is that if you re-read the same row later in a transaction it will contain the same values -- the read of any particular *row* is repeatable (if it exists at both times); it does not guarantee that the same selection criteria will return the same set of rows every time. The standard only requires that of SERIALIZABLE transactions. PostgreSQL has historically provided more rigorous protections at REPEATABLE READ than the standard requires. Our docs claim:

| When you select the level Read Uncommitted you really get Read
| Committed, and phantom reads are not possible in the PostgreSQL
| implementation of Repeatable Read, so the actual isolation level
| might be stricter than what you select. This is permitted by the
| SQL standard: the four isolation levels only define which
| phenomena must not happen, they do not define which phenomena
| must happen.

As you have shown, our FK/RI implementation exposes phantom reads to clients, so at a minimum our docs are wrong. Also, note that after the DELETE FROM parent, further SELECTs in the same transaction will use the original snapshot again, and thus will see the conflicting child rows again that were ignored by the RI trigger. But they won't, of course, see the parent row. IOW, transaction A will, after the delete, see a state of the database in which the PK constraint is broken. I don't think that's acceptable in any isolation level. Good point.
Based on that observation, I agree that our RI is broken at both the REPEATABLE READ and SERIALIZABLE isolation levels. I think that READ COMMITTED is OK, because it will see the child row as deleted in time to prevent problems. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] idea: allow AS label inside ROW constructor
2014-10-22 18:35 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Wed, Oct 22, 2014 at 11:21 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi with new functions row_to_json(b), there is more often usage of ROW constructor. Using names in fields is relative difficult. Because ROW has special clause in parser, I am thinking so we can enable labeling inside ROW constructor so instead currently supported: select row_to_json(r) from (select 10 as a, 20 as b) r; users can to write: select row_to_json(row(10 as a,20 as b)); labeling will be enabled only inside ROW constructor. I don't propose enable it everywhere. What do you think about it? It's a neat idea -- maybe a better alternative to what I was thinking here: http://postgresql.1045698.n5.nabble.com/Support-UPDATE-table-SET-tp5823073p5823944.html Some questions: *) What would the parser transformation resolve to? In gram.y, the relevant production is:

row:		ROW '(' expr_list ')'			{ $$ = $3; }
			| ROW '(' ')'					{ $$ = NIL; }
			| '(' expr_list ',' a_expr ')'	{ $$ = lappend($2, $4); }
		;

We can replace expr_list with target_list. As far as I can tell it doesn't cause any grammar problems -- bison doesn't raise any warning. *) Are we ok with the SQL standard? The SQL standard has no notion of named attributes in ROW, so this is outside the scope of ANSI, but it is not in conflict with the standard. AS name is used more in SQL/MM and SQL/XML -- and function named parameters have a different syntax, parameter_name => value -- I checked against SQL99. *) Do you think this (or some similar variant) would work? select row_to_json(row(foo.*)) from foo; It looks like an independent feature and could work too -- it is more natural for the user. merlin
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Wed, Oct 22, 2014 at 03:14:26PM +0200, Dag-Erling Smørgrav wrote: In a case like POODLE we probably wouldn't have done it anyway, as it doesn't affect our connections (we're not http) If I understand correctly, imaps has been shown to be vulnerable as well, so I wouldn't be so sure. Reference? It's an SSL3 specific attack, so most servers are not vulnerable because TLS will negotiate to the highest supported protocol. So unless one of the client/server doesn't support TLS1.0 there's no issue. The only reason http is vulnerable is because browsers do protocol downgrading, something strictly forbidden by the spec. Additionally, the man-in-the-middle must be able to control the padding in the startup packet, which just isn't possible (no scripting language in the client). Since you can already specify the cipher list, couldn't you just add -SSLv3 to the cipher list and be done? Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
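For context, the patch in this thread adds a server setting for this; the fragment below is only an illustration of the idea -- `ssl_protocols` is not a released PostgreSQL setting, and the exact name and value syntax are defined by the patch under discussion:

```
# hypothetical postgresql.conf fragment, per the proposed ssl_protocols GUC
ssl_protocols = 'ALL,-SSLv3'    # allow every protocol version except SSLv3
```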
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/22/2014 04:24 PM, Michael Paquier wrote: On Wed, Oct 22, 2014 at 1:53 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think we should take a more wholesale approach to this. We should enforce the rule that the server only ever archives WAL files belonging to the same timeline that the server generates. IOW, the server only archives the WAL that it has generated. Hm?! Would that be really back-patchable? There may be in the wild tools or users that rely on the fact a node archives segment files from all timelines. Hmm, so it would be a tool or user that manually copies a file to the pg_xlog directory of a standby server, and expects the standby to archive the file after promotion. That seems a bit far-fetched, although I've seen people do strange things. I think it would be acceptable as long as we document the change in behavior in the release notes. I don't have much hope that we'll ever be able to nail down the correct behavior with the current approach. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Martijn van Oosterhout klep...@svana.org writes: Dag-Erling Smørgrav d...@des.no writes: If I understand correctly, imaps has been shown to be vulnerable as well, so I wouldn't be so sure. Reference? Sorry, no reference. I was told that Thunderbird was vulnerable to POODLE when talking imaps. Since you can already specify the cipher list, couldn't you just add -SSLv3 to the cipher list and be done? I didn't want to change the existing behavior; all I wanted was to give users a way to do so if they wish. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Sep 23, 2014 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. I haven't been following this issue closely, but this sounds like a really nice design. I'm on board with the parser hooks part of that. I don't especially agree with the idea of a new sub-structure for ParamListInfo: if you do that you will need a whole bunch of new boilerplate infrastructure to allocate, copy, and generally manage that structure, for darn little gain. What I'd suggest is just saying that some Params might have type INTERNAL with Datum values that are pointers to tuplestores; then all you need to do is remember which Param number has been assigned to the particular tuplestore you want. There is already precedent for that in the recursive CTE code, IIRC. Studying this proposed design a bit further, I am a little fuzzy on what is supposed to happen in this design during parse analysis. In Kevin's current code, after checking whether a RangeVar might be a CTE reference and before deciding that it must be a table reference, scanNameSpaceForTsr() is used to check whether there's a tuplestore by that name and, if so, then we insert a RTE with type RTE_TUPLESTORE which references the tuplestore by name. To me, the obvious thing to do here seems to be to instead call p_tableref_hook and let it return a suitable RangeTblRef (or NULL if it wishes to take no action). 
In the case where the hook wants to redirect the use of that name to a tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and that entry can store a parameter-index rather than, as in the current design, a name. But then where does Heikki's notion of a RelationParam get used? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving of INT64_FORMAT to c.h
On 10/16/2014 08:08 AM, Andres Freund wrote: On 2014-10-16 08:04:17 -0400, Jan Wieck wrote: Hi, PostgreSQL has for ages defined INT64_FORMAT and UINT64_FORMAT in pg_config.h. This commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ce486056ecd28050 moved those definitions to c.h, which breaks compilation of all released Slony-I versions against current master. Can those be moved back to where they used to be? Well, you could add additional configure stuff to also emit what you want. Slony uses the definitions in external tools, like slon and slonik, to format sequence numbers in log output. Then it should include c.h/postgres_fe.h? So the header of c.h says "Note that the definitions here are not intended to be exposed to clients" but postgres_fe.h says "This should be the first file included by PostgreSQL client libraries". Should client programs that live outside the postgres source tree be including postgres_fe.h? I have a feeling the answer is no. If the answer is no, then why does a make install install postgres_fe.h? Slonik used to include postgres_fe.h, but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds). Maybe slony client programs shouldn't be trying to steal portability definitions from postgres headers, but I doubt we are the only ones doing that. It isn't a big deal for slony to define its own INT64_FORMAT for 9.5+, but third-party projects that have been including pg_config.h will hit similar issues. If there was a good reason for the change then fine (Postgres isn't intended to be a general-purpose C portability layer). Steve Greetings, Andres Freund
Re: [HACKERS] Moving of INT64_FORMAT to c.h
On Wed, Oct 22, 2014 at 4:12 PM, Steve Singer st...@ssinger.info wrote: So the header of c.h says Note that the definitions here are not intended to be exposed to clients but postgres_fe.h says This should be the first file included by PostgreSQL client libraries and Should client programs that live outside the postgres source tree be including postgres_fe.h ? I have a feeling the answer is no. If the answer is no, then why does a make install install postgres_fe.h ? I think the answer is yes. Slonik used to include postgre_fe.h but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds) That seems like something we ought to consider fixing, but obviously we'd need more details. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving of INT64_FORMAT to c.h
Robert Haas robertmh...@gmail.com writes: On Wed, Oct 22, 2014 at 4:12 PM, Steve Singer st...@ssinger.info wrote: So the header of c.h says Note that the definitions here are not intended to be exposed to clients but postgres_fe.h says This should be the first file included by PostgreSQL client libraries and Should client programs that live outside the postgres source tree be including postgres_fe.h ? I have a feeling the answer is no. If the answer is no, then why does a make install install postgres_fe.h ? I think the answer is yes. Yeah. In particular, postgres_fe.h sees to it that FRONTEND is #define'd; without that there is *not* a guarantee that c.h will work for non-backend compiles. We would much rather you were including postgres_fe.h than c.h directly. Having said that, there is not and never will be a guarantee that everything that postgres_fe.h declares is immutable, and that certainly applies to pg_config.h in particular. There's a reason why libpq doesn't include that into its public headers ... Slonik used to include postgre_fe.h but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds) That seems like something we ought to consider fixing, but obviously we'd need more details. Indeed. But I suspect the core of it is going to be that clients of postgres_fe.h are expected to be linking in libpgport.a ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On 10/22/2014 11:10 PM, Robert Haas wrote: On Tue, Sep 23, 2014 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. I haven't been following this issue closely, but this sounds like a really nice design. I'm on board with the parser hooks part of that. I don't especially agree with the idea of a new sub-structure for ParamListInfo: if you do that you will need a whole bunch of new boilerplate infrastructure to allocate, copy, and generally manage that structure, for darn little gain. What I'd suggest is just saying that some Params might have type INTERNAL with Datum values that are pointers to tuplestores; then all you need to do is remember which Param number has been assigned to the particular tuplestore you want. There is already precedent for that in the recursive CTE code, IIRC. Studying this proposed design a bit further, I am a little fuzzy on what is supposed to happen in this design during parse analysis. In Kevin's current code, after checking whether a RangeVar might be a CTE reference and before deciding that it must be a table reference, scanNameSpaceForTsr() is used to check whether there's a tuplestore by that name and, if so, then we insert a RTE with type RTE_TUPLESTORE which references the tuplestore by name. To me, the obvious thing to do here seems to be to instead call p_tableref_hook and let it return a suitable RangeTblRef (or NULL if it wishes to take no action). 
In the case where the hook wants to redirect the use of that name to a tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and that entry can store a parameter-index rather than, as in the current design, a name. But then where does Heikki's notion of a RelationParam get used? I was thinking that the hook would return a RelationParam. When parse analysis sees the returned RelationParam, it adds an entry for that to the range table, and creates the RangeTblRef for it. The way you describe it works too, but manipulating the range table directly in the hook seems a bit too low-level. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Heikki Linnakangas hlinnakan...@vmware.com writes: On 10/22/2014 11:10 PM, Robert Haas wrote: Studying this proposed design a bit further, I am a little fuzzy on what is supposed to happen in this design during parse analysis. In Kevin's current code, after checking whether a RangeVar might be a CTE reference and before deciding that it must be a table reference, scanNameSpaceForTsr() is used to check whether there's a tuplestore by that name and, if so, then we insert a RTE with type RTE_TUPLESTORE which references the tuplestore by name. To me, the obvious thing to do here seems to be to instead call p_tableref_hook and let it return a suitable RangeTblRef (or NULL if it wishes to take no action). In the case where the hook wants to redirect the use of that name to a tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and that entry can store a parameter-index rather than, as in the current design, a name. But then where does Heikki's notion of a RelationParam get used? I was thinking that the hook would return a RelationParam. When parse analysis sees the returned RelationParam, it adds an entry for that to the range table, and creates the RangeTblRef for it. The way you describe it works too, but manipulating the range table directly in the hook seems a bit too low-level. The problem with that idea is that then the API for the hook has to cover every possible sort of RTE that hooks might wish to create; I see no reason to restrict them to creating just one kind. I agree that the hook should avoid *physically* manipulating the rangetable, but it seems reasonable to expect that it can call one of the addRangeTableEntryXXX functions provided by parse_relation.c, and then return a RangeTblEntry* gotten that way. So hooks would have an API more or less equivalent to, eg, transformRangeFunction(). 
regards, tom lane
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote: I got to ask: Why is it helpful that we have this in contrib? I have a good share of blame to bear for that, but I think we need to stop dilluting contrib evermore with test programs. These have a place, but I don't think it should be contrib. I don't think pg_background is merely a test program: I think it's a quite useful piece of functionality. It can be used for running VACUUM from a procedure, which is something people have asked for more than once, or for simulating an autonomous transaction. Granted, it'll be a lot slower than a real autonomous transaction, but it's still better than doing it via dblink, because you don't have to futz with authentication. I would be all in favor of moving things like test_decoding, test_parser, and test_shm_mq to src/test or contrib/test or wherever else we want to put them, but I think pg_background belongs in contrib. +/* Fixed-size data passed via our dynamic shared memory segment. */ +typedef struct pg_background_fixed_data +{ + Oid database_id; + Oid authenticated_user_id; + Oid current_user_id; + int sec_context; + char database[NAMEDATALEN]; + char authenticated_user[NAMEDATALEN]; +} pg_background_fixed_data; Why not NameData? No particular reason. Changed. whitespace damage. I went through and fixed everything that git diff --check complained about. Let me know if you see other problems yet. +static HTAB *worker_hash; Hm. So the lifetime of this hash's contents is managed via on_dsm_detach(), do I understand that correctly? More or less, yes. Hm. So every user can do this once the extension is created as the functions are most likely to be PUBLIC. Is this a good idea? Why not? If they can log in, they could start separate sessions with similar effect. + /* + * Whether we succeed or fail, a future invocation of this function + * may not try to read from the DSM once we've begun to do so. 
+ * Accordingly, make arrangements to clean things up at end of query.
+ */
+ dsm_unkeep_mapping(info->seg);
+
+ /* Set up tuple-descriptor based on column definition list. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+     ereport(ERROR,
+             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+              errmsg("function returning record called in context "
+                     "that cannot accept type record"),
+              errhint("Try calling the function in the FROM clause "
+                      "using a column definition list.")));

Hm, normally we don't add linebreaks inside error messages. I copied it from dblink. I'm unsure right now about the rules surrounding this, but shouldn't we check that the user is allowed to execute these? And shouldn't we fall back to non-binary functions if no binary ones are available? I can't see any reason to do either of those things. I'm not aware that returning data in binary format is in any way intended to be a security-restricted operation, or that we have any data types that actually matter without send and receive functions. If we do, I think the solution is to add them, not make this more complicated.

+ /*
+  * Limit the maximum error level to ERROR. We don't want
+  * a FATAL inside the background worker to kill the user
+  * session.
+  */
+ if (edata.elevel > ERROR)
+     edata.elevel = ERROR;

Hm. But we still should report that it FATALed? Maybe add an error context notice about it? Not nice, but I don't have an immediately better idea. I think it generally would be rather helpful to add the information that this wasn't originally an error triggered by this process. The user might otherwise be confused when looking for the origin of the error in the log. Yeah, I was wondering if we needed some kind of annotation here. What I'm wondering about is appending something to the errcontext, perhaps "background worker, PID %d".

+ case 'A':
+ {
+     /* Propagate NotifyResponse. */
+     pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
+     break;

Hm.
Are we sure to be in a situation where the client expects these? And are we sure their encoding is correct? The other data goes through input/output functions that check for that, but here we don't.
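For readers following along, the "VACUUM from a procedure" use case mentioned above looks roughly like this. The function names are taken from the pg_background patch as posted (pg_background_launch returns the worker's PID; pg_background_result returns the command's output as a SETOF record); treat the exact signatures and the PID value as illustrative, since the interface could change before commit:

```sql
-- Launch a command in a dynamic background worker; returns the worker PID.
SELECT pg_background_launch('VACUUM pgbench_accounts');

-- Later, collect the output. The column definition list is required
-- because the result shape depends on the command that was run.
-- (4123 is a placeholder for the PID returned above.)
SELECT * FROM pg_background_result(4123) AS (result text);
```

Because the worker runs in its own transaction, this also gives a crude approximation of an autonomous transaction, as Robert notes.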
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 10/22/2014 04:03 PM, Robert Haas wrote: On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote: I got to ask: Why is it helpful that we have this in contrib? I have a good share of blame to bear for that, but I think we need to stop diluting contrib ever more with test programs. These have a place, but I don't think it should be contrib. I don't think pg_background is merely a test program: I think it's a quite useful piece of functionality. It can be used for running VACUUM from a procedure, which is something people have asked for more than once, or for simulating an autonomous transaction. Granted, it'll be a lot slower than a real autonomous transaction, but it's still better than doing it via dblink, because you don't have to futz with authentication. I think this is a very useful program, but I wonder if it makes more sense to push it out to pgxn? Why do we need an ever growing contrib? Further, if it is good enough to go into contrib, why isn't it just in core? Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc If we send our children to Caesar for their education, we should not be surprised when they come back as Romans. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
Brightwell, Adam wrote: All, Thanks! Please add it to the next commitfest. Sounds good. I'll update the patch and add it accordingly. Attached is an updated patch. I noticed something strange while perusing this patch, but the issue predates the patch. Some messages say "must be superuser or replication role to foo", but our longstanding practice is to say "permission denied to foo". Why do we have this inconsistency? Should we remove it? If we do want to keep the old wording, this patch should use "permission denied to" in the places that it touches. Other than that, since we already agreed that it's something we want, the only comment I have about this patch is an empty line in the variable declarations here, which should be removed:

diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
new file mode 100644
index c9a9baf..ed89b23
*** a/src/backend/commands/alter.c
--- 807,848
      bool *nulls;
      bool *replaces;
!     AclObjectKind aclkind = get_object_aclkind(classId);
!

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal : REINDEX SCHEMA
Sawada Masahiko wrote: Thank you for reviewing. I agree with 2) - 5). The attached patch is the latest version, modified as above. Also, I noticed I had forgotten to add the patch for the reindexdb documentation. Please don't use pg_catalog in the regression test. That way we will need to update the expected file whenever a new catalog is added, which seems pointless. Maybe create a schema with a couple of tables specifically for this, instead. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
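A minimal sketch of what Álvaro's suggestion could look like in the regression test. The schema and table names are invented here for illustration; the actual test contents are up to the patch author:

```sql
-- Dedicated objects so the expected output never changes when
-- a new system catalog is added.
CREATE SCHEMA schema_to_reindex;
CREATE TABLE schema_to_reindex.tbl1 (col1 int PRIMARY KEY);
CREATE TABLE schema_to_reindex.tbl2 (col2 text);
CREATE INDEX ON schema_to_reindex.tbl2 (col2);

REINDEX SCHEMA schema_to_reindex;

-- Error cases worth covering as well:
REINDEX SCHEMA does_not_exist;   -- should fail: schema does not exist
```

This keeps the expected file stable, since only objects created by the test itself are reindexed.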
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Oct 22, 2014 at 7:12 PM, Joshua D. Drake j...@commandprompt.com wrote: I don't think pg_background is merely a test program: I think it's a quite useful piece of functionality. It can be used for running VACUUM from a procedure, which is something people have asked for more than once, or for simulating an autonomous transaction. Granted, it'll be a lot slower than a real autonomous transaction, but it's still better than doing it via dblink, because you don't have to futz with authentication. I think this is a very useful program but I wonder if it makes more sense to push it out to pgxn? Why do we need an ever growing contrib? Further, if it is good enough to go into contrib, why isn't it just in core? Sure, there's always this debate. Certainly, there's no reason it has to be in contrib rather than core or PGXN. From my point of view, putting it on PGXN would be a less-than-awesome development because then the infrastructure that the patch introduces (patches 1-5 of the series) would have no users in core or contrib, which means that (A) if anyone is thinking of modifying that code in the future, they'll have more difficulty knowing whether their changes break anything and (B) if someone breaks something unwittingly, the buildfarm will not tell us. Now, admittedly, I expect that eventually all of this stuff will be used in core - for parallelism. But in the meantime I think having at least one example in the PostgreSQL source tree of how each piece of extensibility infrastructure that we have can be used is handy and useful and generally something that we ought to encourage. And this one has the advantage - unlike some stuff in contrib - of actually being somewhat useful for real work, which a decent amount of the stuff in contrib isn't. What about moving it the opposite direction, from contrib all the way into core? 
The main advantage of that in my book is that it might make it easier to reduce the code duplication between pg_background and exec_simple_query; the main disadvantage is that it reduces the super-user's control, because now the functionality is always available instead of being something that the super-user can choose to install, or not. But these are all judgement calls, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Getting rid of accept incoming network connections prompts on OS X
On Tue, Oct 21, 2014 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: If you do any Postgres development on OS X, you've probably gotten seriously annoyed by the way that, every single time you reinstall the postmaster executable, you get a dialog box asking whether you'd like to allow it to accept incoming network connections. (At least, you do unless you disable the OS firewall, which is not a great idea.) It's particularly awful to run make check-world in this environment, because you get a pop-up for each test install. Ugh. This must be new in Mavericks, because I don't get any such behavior on 10.8.5. What an awful, awful behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
What about moving it the opposite direction, from contrib all the way into core? The main advantage of that in my book is that it might make it easier to reduce the code duplication between pg_background and exec_simple_query; the main disadvantage is that it reduces the super-user's control, because now the functionality is always available instead of being something that the super-user can choose to install, or not. But these are all judgement calls, of course.

I lean toward a push into core because:

1. I expect that eventually all of this stuff will be used in core - for parallelism.
2. Contrib is already bloated out. (Which is a whole other discussion.)
3. I am not sure I buy into the super-user argument. Just because the functionality is there, doesn't mean it has to be used.
4. "The main advantage of that in my book is that it might make it easier to reduce the code duplication between pg_background and exec_simple_query."

JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc If we send our children to Caesar for their education, we should not be surprised when they come back as Romans.
Re: [HACKERS] Getting rid of accept incoming network connections prompts on OS X
Robert Haas robertmh...@gmail.com writes: On Tue, Oct 21, 2014 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: If you do any Postgres development on OS X, you've probably gotten seriously annoyed by the way that, every single time you reinstall the postmaster executable, you get a dialog box asking whether you'd like to allow it to accept incoming network connections. Ugh. This must be new in Mavericks, because I don't get any such behavior on 10.8.5. Hm, I thought it went further back than that ... I remember having put up with it for some time now. regards, tom lane
Re: [HACKERS] Moving of INT64_FORMAT to c.h
On 10/22/2014 04:17 PM, Robert Haas wrote: Slonik used to include postgres_fe.h, but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds). That seems like something we ought to consider fixing, but obviously we'd need more details. I'll try to find some time to get a Windows environment running and figure out what the problems were. It's also possible that I removed the postgres_fe include at the same time as I was fixing other win32 issues, and that the include wasn't causing a problem; it was just removed because it was unnecessary. At the moment slony only includes the server include dir if we are building with pgport (which we normally do on Windows). We have had reports of issues like the one described in http://www.slony.info/bugzilla/show_bug.cgi?id=315, where some installs make us pick up server and client includes from different versions of PG.
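For context, the macro under discussion is the printf format specifier that PostgreSQL's c.h selects per platform for 64-bit integers. Frontend code that stops including postgres_fe.h loses that definition and must supply its own. A self-contained sketch of a portable fallback (the fallback definition and the helper are mine, not from c.h):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * PostgreSQL's c.h defines INT64_FORMAT as "%ld", "%lld", or another
 * platform-specific spelling.  Code that no longer pulls in
 * postgres_fe.h can fall back on C99's PRId64, which expands to the
 * correct specifier for the local 64-bit integer type.
 */
#ifndef INT64_FORMAT
#define INT64_FORMAT "%" PRId64
#endif

/* Format a 64-bit value into buf; returns the character count written. */
static int
format_int64(char *buf, size_t len, int64_t v)
{
    return snprintf(buf, len, INT64_FORMAT, v);
}
```

Mixing server and client headers from different PG versions, as in the bug report above, can make INT64_FORMAT disagree with the int64 typedef actually in scope, which is exactly the class of problem a single canonical include avoids.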
Re: [HACKERS] Simplify calls of pg_class_aclcheck when multiple modes are used
On 10/21/14 6:19 PM, Michael Paquier wrote: On Wed, Oct 22, 2014 at 5:03 AM, Peter Eisentraut pete...@gmx.net wrote: While looking at this, I wrote a few test cases for sequence privileges, because that was not covered at all. That patch is attached. +1 for those tests. -- Michael Committed your patch and tests.
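The kind of coverage being added, roughly sketched; the exact contents are in the committed patch, and the role and sequence names below are invented for illustration:

```sql
CREATE ROLE regress_seq_user;
CREATE SEQUENCE seq_priv_test;

-- USAGE permits nextval and currval, but not setval.
GRANT USAGE ON SEQUENCE seq_priv_test TO regress_seq_user;
SET SESSION AUTHORIZATION regress_seq_user;
SELECT nextval('seq_priv_test');     -- ok
SELECT setval('seq_priv_test', 5);   -- should fail: permission denied
RESET SESSION AUTHORIZATION;

-- UPDATE additionally permits setval; SELECT permits currval and
-- reading the sequence relation directly.
GRANT UPDATE ON SEQUENCE seq_priv_test TO regress_seq_user;
```

Exercising each of USAGE, SELECT, and UPDATE against nextval, currval, setval, and direct SELECT is what distinguishes the sequence privilege matrix from ordinary table privileges.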
Re: [HACKERS] Function array_agg(array)
2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com: Thanks for the review. 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments on the design: 1. The patch doesn't include documentation and regression tests; please append them. OK, I'll add the documentation and regression tests. 2. This functionality (multidimensional aggregation) can be interesting in more places, so maybe some interface like an array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls, and then reconstruct them through makeMdArrayResult. I think this way it's not efficient, so I decided to reimplement it and memcpy the array data and null flags as-is. Aha, so isn't it better to fix the performance of accumArrayResult? Ok, I'll go this route. While reading the code of array(subselect) as you pointed out below, I think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datums and array datums, with a performance optimization for the latter. In other places, I think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as the API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too:

postgres=# select array_agg(a) from test;
       array_agg
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]

I'm pretty new to postgresql development. Can you point out where array(subselect) is implemented? Where you can start:
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116

Look at the code ... your magic keyword is ARRAY_SUBLINK ... search the postgresql sources for this keyword. Found it. Thanks. On a side note, in postgres the array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider changing it so that the array type of an array is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109:

if (sublink->subLinkType == ARRAY_SUBLINK)
{
    type = get_array_type(type);
    if (!OidIsValid(type))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("could not find array type for data type %s",
                        format_type_be(exprType((Node *) tent->expr)))));
}

To implement the array(subselect) you pointed out above, we must specially check for array types. A quick search on get_array_type returns 10 places. I'll think more about this later. For this patch, I'll go without changes in pg_type.h. 4. Why do you use a magic constant (64) there?

+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ astate->aitems = 64 * nitems;
+ astate->nullbitmap = (bits8 *)
+     repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

I just follow the arbitrary size chosen in accumArrayResult (arrayfuncs.c):

astate->alen = 64;   /* arbitrary starting array size */

It can be any number, not too small and not too big. Too small, and we will realloc shortly. Too big, and we will end up wasting memory. You can try to alloc 1KB instead as a start -- it is used more times in Postgres. Then the overhead is at most 1KB per agg call, which is acceptable. You took this value from accumArrayResult, but it is targeted at short scalars - you should expect any array to be much larger. Ok. Regards, -- Ali Akbar
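To summarize the inconsistency being discussed, both statements below are taken from the thread (the table is assumed to hold two integer[] rows); the second is the form the patch would have to teach ARRAY_SUBLINK about:

```sql
-- Aggregating integer[] values works under the proposed patch,
-- producing a two-dimensional array:
SELECT array_agg(a) FROM test;
-- {{1,2,3,4},{1,2,3,4}}

-- But the equivalent ARRAY(subselect) form still fails, because
-- get_array_type() finds no array type whose element type is integer[]:
SELECT array(SELECT a FROM test);
-- ERROR: could not find array type for data type integer[]
```

Making the two forms agree is what motivates either changing pg_type or special-casing array element types at the ten get_array_type() call sites mentioned above.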
Re: [HACKERS] Function array_agg(array)
2014-10-23 4:00 GMT+02:00 Ali Akbar the.ap...@gmail.com: 2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com: Thanks for the review. 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments on the design: 1. The patch doesn't include documentation and regression tests; please append them. OK, I'll add the documentation and regression tests. 2. This functionality (multidimensional aggregation) can be interesting in more places, so maybe some interface like an array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls, and then reconstruct them through makeMdArrayResult. I think this way it's not efficient, so I decided to reimplement it and memcpy the array data and null flags as-is. Aha, so isn't it better to fix the performance of accumArrayResult? Ok, I'll go this route. While reading the code of array(subselect) as you pointed out below, I think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datums and array datums, with a performance optimization for the latter. In other places, I think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as the API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too: postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] I'm pretty new to postgresql development. Can you point out where array(subselect) is implemented? Where you can start:
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116

Look at the code ... your magic keyword is ARRAY_SUBLINK ... search the postgresql sources for this keyword. Found it. Thanks. On a side note, in postgres the array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider changing it so that the array type of an array is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109. Yes, it is true - this is a really big change and maybe needs a separate discussion - if we allow a cycle there, I am not sure about possible side effects. Maybe this change is not necessary; you can fix only the check: if the type is not an array, or if get_array_type returns nothing, raise an error.

if (sublink->subLinkType == ARRAY_SUBLINK)
{
    type = get_array_type(type);
    if (!OidIsValid(type))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("could not find array type for data type %s",
                        format_type_be(exprType((Node *) tent->expr)))));
}

To implement the array(subselect) you pointed out above, we must specially check for array types. A quick search on get_array_type returns 10 places. Attention: we probably don't want to allow arrays everywhere. I'll think more about this later. For this patch, I'll go without changes in pg_type.h. 4. Why do you use a magic constant (64) there?

+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ astate->aitems = 64 * nitems;
+ astate->nullbitmap = (bits8 *)
+     repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

I just follow the arbitrary size chosen in accumArrayResult (arrayfuncs.c):

astate->alen = 64;   /* arbitrary starting array size */

It can be any number, not too small and not too big. Too small, and we will realloc shortly. Too big, and we will end up wasting memory. You can try to alloc 1KB instead as a start -- it is used more times in Postgres.
Then the overhead is at most 1KB per agg call, which is acceptable. You took this value from accumArrayResult, but it is targeted at short scalars - you should expect any array to be much larger. Ok. Regards, -- Ali Akbar
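Pavel's point about the starting allocation is the classic amortized-growth trade-off. A self-contained sketch, not the actual accumArrayResult code (the ByteBuf type and function names are invented for illustration): a 1 kB starting allocation wastes at most about 1 kB per aggregate call, and doubling on overflow keeps the total copying for n appended bytes at O(n).

```c
#include <stdlib.h>
#include <string.h>

/* Toy byte buffer with geometric growth. */
typedef struct ByteBuf
{
    char   *data;
    size_t  used;
    size_t  alloc;
} ByteBuf;

static void
bytebuf_init(ByteBuf *buf)
{
    buf->alloc = 1024;          /* 1 kB starting size, as suggested */
    buf->used = 0;
    buf->data = malloc(buf->alloc);
}

static void
bytebuf_append(ByteBuf *buf, const void *src, size_t n)
{
    if (buf->used + n > buf->alloc)
    {
        /* Double until it fits, so repeated appends amortize to O(1). */
        while (buf->used + n > buf->alloc)
            buf->alloc *= 2;
        buf->data = realloc(buf->data, buf->alloc);
    }
    memcpy(buf->data + buf->used, src, n);
    buf->used += n;
}
```

With a fixed small start like 64 elements, aggregating large array inputs would trigger a realloc almost immediately, which is exactly the cost the 1 kB suggestion avoids.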