Re: [HACKERS] moving from contrib to bin
On 4/12/15 8:00 PM, Alvaro Herrera wrote: Peter Eisentraut wrote: On 3/11/15 8:21 PM, Michael Paquier wrote: Attached is a series of patch rebased on current HEAD, there were some conflicts after perl-tidying the refactoring patch for MSVC. Note that this series still uses PGXS in the Makefiles, I am fine to update them if necessary once this matter is set (already did this stuff upthread with a previous version). OK, here we go. I have committed the pg_archivecleanup move, with a complete makefile and your Windows help. Let's see how it builds. Hmm, should it appear automatically in babel.pg.org? I don't see it there yet. It needs an nls.mk file first. Feel free to add them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] TAP tests of pg_rewind not stopping servers used for the tests
Hi all, In the refactoring of pg_rewind tests committed as 53ba107, it happens that some of the servers used for the tests are not stopped at the end of the test. The issue is that RewindTest.pm uses END to stop the servers, but now that the local and remote tests are grouped, half of the servers are not stopped. The reason I did not notice this before is that, as pg_rewind uses a local Unix socket to work on the nodes, the tests are not failing. Sorry about that, that's my fault. Attached is a patch fixing the issue: it replaces END with a cleanup function, called at the end of each local/remote test, to be sure that the servers are shut down. Regards, -- Michael

diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index 50cae2c..1339871 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -22,6 +22,8 @@ package RewindTest;
 # 5. run_pg_rewind - stops the old master (if it's still running) and runs
 #    pg_rewind to synchronize it with the now-promoted standby server.
 #
+# 6. clean_rewind_test - enforces stop of the servers used in the test.
+#
 # The test script can use the helper functions master_psql and standby_psql
 # to run psql against the master and standby servers, respectively. The
 # test script can also use the $connstr_master and $connstr_standby global
@@ -56,6 +58,7 @@ our @EXPORT = qw(
 	create_standby
 	promote_standby
 	run_pg_rewind
+	clean_rewind_test
 );
@@ -262,9 +265,8 @@ recovery_target_timeline='latest'
 }
 
 # Clean up after the test. Stop both servers, if they're still running.
-END
+sub clean_rewind_test
 {
-	my $save_rc = $?;
 	if ($test_master_datadir)
 	{
 		system "pg_ctl -D $test_master_datadir -s -m immediate stop 2> /dev/null";
@@ -273,5 +275,4 @@ END
 	{
 		system "pg_ctl -D $test_standby_datadir -s -m immediate stop 2> /dev/null";
 	}
-	$? = $save_rc;
 }

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index ae26d01..a1d679f 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -78,6 +78,7 @@ in master, before promotion
 ),
 	'tail-copy');
+	RewindTest::clean_rewind_test();
 }
 
 # Run the test in both modes

diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 1cf9a3a..be1e194 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -40,6 +40,7 @@ standby_afterpromotion
 ),
 	'database names');
+	RewindTest::clean_rewind_test();
 }
 
 # Run the test in both modes.

diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index 218b865..ed50659 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -62,6 +62,8 @@ sub run_test
 		"$test_master_datadir/tst_standby_dir/standby_subdir",
 		"$test_master_datadir/tst_standby_dir/standby_subdir/standby_file3"],
 	"file lists match");
+
+	RewindTest::clean_rewind_test();
 }
 
 # Run the test in both modes.
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On Wed, Apr 15, 2015 at 7:54 PM, Peter Eisentraut pete...@gmx.net wrote: On 4/14/15 8:32 PM, Peter Eisentraut wrote: Move pg_upgrade from contrib/ to src/bin/ Oh dear. It appears the buildfarm client code needs to be updated to support this. How do we do this? (I guess the critters that are still green are not running this test.) Argh.. Looking at the buildfarm client code (https://github.com/PGBuildFarm/client-code) we are going to need to make the logic of TestUpgrade.pm aware that pg_upgrade is now in src/bin and not in contrib. A simple idea would be to use a Find to guess where the pg_upgrade path is located and then use the path found instead of hardcoding contrib/. Should I revert this patch while we sort this out? I think so, keeping the buildfarm red for a long time is no good... This is going to require an update of all the buildfarm machines, so taking the problem at the root we had better improve the buildfarm code, get it deployed on a maximum of machines, and then have this patch pushed. It doesn't prevent the other patches in the src/bin series from being pushed first, at least; only pg_upgrade has a specific test case. Regards, -- Michael
Re: [HACKERS] FPW compression leaks information
On Wed, Apr 15, 2015 at 9:20 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Apr 15, 2015 at 2:22 PM, Fujii Masao wrote: On Wed, Apr 15, 2015 at 11:55 AM, Michael Paquier wrote: 1) Doc patch to mention that it is possible that compression can give hints to attackers when working on sensitive fields that have a non-fixed size. I think that this patch is enough as the first step. I'll get something done for that at least, a big warning below the description of wal_compression would do it. 2) Switch at relation level to control wal_compression. ALTER TABLE SET is not allowed on a system catalog like pg_authid. So should we change it so that a user can change the flag even on system catalogs? I'm afraid that the change might cause another problem, though. Probably we can disable the compression on every system catalog by default. But I can imagine that someone wants to enable the compression even on a system catalog. For example, pg_largeobject may cause lots of FPWs. We could enforce a value directly in pg_class.h for only pg_authid if we think that the problem is that bad, and rely on the default system value for the rest. That's a hacky-ugly approach though... Something else that I recalled and has not yet been mentioned on this thread: even if the server-wide wal_compression is off, any user can change its value because it is PGC_USERSET, hence I think that we had better make it at least PGC_SUSET. -- Michael
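To make the leak concrete: the attack works because compressed size depends on content overlap. Below is a toy LZ-style size estimator (deliberately not pglz; the token costs and match threshold are invented for illustration) showing that when an attacker can place a guess next to a secret inside the same compressed unit, such as a full-page image, a correct guess makes the output measurably smaller:

```c
#include <string.h>

/* Toy LZ-style size estimate: at each position, replace the longest
 * match against earlier data with a fixed-cost token.  Real WAL
 * compression (pglz) differs in detail, but shares the property
 * demonstrated here: repeated content compresses smaller. */
static size_t toy_compressed_size(const char *buf, size_t len)
{
    size_t cost = 0, i = 0;
    while (i < len)
    {
        size_t best = 0;
        for (size_t j = 0; j < i; j++)
        {
            size_t k = 0;
            while (i + k < len && j + k < i && buf[j + k] == buf[i + k])
                k++;
            if (k > best)
                best = k;
        }
        if (best >= 4)          /* back-reference token is worthwhile */
        {
            cost += 3;          /* pretend a match token costs 3 bytes */
            i += best;
        }
        else
        {
            cost += 1;          /* literal byte */
            i += 1;
        }
    }
    return cost;
}
```

An attacker who can observe WAL volume (directly or via timing/replication lag) compares the two cases: the page holding `secret=s3cretpass;` plus a correct guess compresses smaller than the same page with a wrong guess of equal length, turning the compressed length into an oracle on the secret.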
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 15 April 2015 at 08:04, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs si...@2ndquadrant.com wrote: On 14 April 2015 at 21:53, Robert Haas robertmh...@gmail.com wrote: Peter commented previously that README.HOT should get an update. The relevant section seems to be When can/should we prune or defragment?. That's easy enough to change once we agree to commit. I wonder if it would be a useful heuristic to still prune pages if those pages are already dirty. Useful for who? This is about responsibility. Why should someone performing a large SELECT take the responsibility for cleaning pages? Because it makes subsequent accesses to the page cheaper. Cheaper for whom? Of course, that applies in all cases, but when the page is already dirty, the cost of pruning it is probably quite small - we're going to have to write the page anyway, and pruning it before it gets evicted (perhaps even by our scan) will be cheaper than writing it now and writing it again after it's pruned. When the page is clean, the cost of pruning is significantly higher. We aren't going to have to write the page, but someone will. In a single workload, the mix of actions can be useful. In separate workloads, where some guy just wants to run a report or a backup, it's not right that we slow them down because of someone else's actions. I won't take responsibility for paying my neighbor's tax bill, but I might take responsibility for picking up his mail while he's on holiday. That makes it sound like this is an occasional, non-annoying thing. It's more like, whoever fetches the mail needs to fetch it for everybody. So we are slowing down one person disproportionately, while others fly through without penalty. There is no argument that one workload necessarily needs to perform that on behalf of the other workload. The actions you suggest are reasonable and should ideally be the role of a background process. 
But that doesn't mean in the absence of that we should pay the cost in the foreground. Let me apply this patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Attached explain_forein_join.patch adds a capability to show the join combination of a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought that using an array to list relations is a good way too, but I chose one string value because users would like to know the order and type of joins too. A bit different from my expectation... I expected to display the name of the local foreign tables (and their aliases), not the remote ones, because all the local join logic displays local foreign table names. It is easy to adjust, isn't it? Probably, all you need to do is put a local relation name on the text buffer (at deparseSelectSql) instead of the deparsed remote relation. Oops, that’s right. Attached is the revised version. I chose the fully qualified name, schema.relname [alias], for the output. It would waste some cycles during planning if that is not for EXPLAIN, but it seems difficult to get a list of names of relations in the ExplainForeignScan() phase, because the planning information has gone away at that time. I understand. The private data structure of postgres_fdw is not designed to keep tree-structured data (like a relations join tree), so it seems to me a straightforward way to implement the feature. I have a small suggestion. This patch makes deparseSelectSql initialize the StringInfoData if supplied; however, that shall usually be a task of the function caller, not the callee. In this case, I would like initStringInfo(&relations) next to the line initStringInfo(&sql) in postgresGetForeignPlan. In my sense, it is a bit strange to pass an uninitialized StringInfoData to get a text form.

@@ -803,7 +806,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	initStringInfo(&sql);
 	deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
-					 params_list, fdw_ps_tlist, retrieved_attrs);
+					 params_list, fdw_ps_tlist, retrieved_attrs, &relations);
 
 	/*
 	 * Build the fdw_private list that will be available in the executor.

Also, could you merge the EXPLAIN output feature into the main patch? 
I think there is no reason to split this feature. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -----Original Message----- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru HANADA Sent: Tuesday, April 14, 2015 7:49 PM To: Kaigai Kouhei (海外 浩平) Cc: Ashutosh Bapat; Robert Haas; Tom Lane; Thom Brown; pgsql-hackers@postgreSQL.org Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) KaiGai-san, On 2015/04/14 14:04, Kouhei Kaigai kai...@ak.jp.nec.com wrote: * Fix typos Please review the v11 patch, and mark it as “ready for committer” if it’s OK. It's OK for me, and it wants to be reviewed by other people to get it committed. Thanks! In addition to the essential features, I tried to implement relation listing in EXPLAIN output. Attached explain_forein_join.patch adds a capability to show the join combination of a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought that using an array to list relations is a good way too, but I chose one string value because users would like to know the order and type of joins too. A bit different from my expectation... I expected to display the name of the local foreign tables (and their aliases), not the remote ones, because all the local join logic displays local foreign table names. It is easy to adjust, isn't it? Probably, all you need to do is put a local relation name on the text buffer (at deparseSelectSql) instead of the deparsed remote relation. Oops, that’s right. Attached is the revised version. I chose the fully qualified name, schema.relname [alias], for the output. It would waste some cycles during planning if that is not for EXPLAIN, but it seems difficult to get a list of names of relations in the ExplainForeignScan() phase, because the planning information has gone away at that time. 
-- Shigeru HANADA shigeru.han...@gmail.com
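The buffer-ownership convention suggested above (caller initializes, callee only appends) can be sketched with a minimal stand-in for PostgreSQL's StringInfo. The type and function names below are illustrative, not the actual postgres_fdw API; the point is only the division of responsibility mirroring initStringInfo(&sql) / initStringInfo(&relations) in the caller:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for PostgreSQL's StringInfo (names illustrative). */
typedef struct { char *data; size_t len; size_t cap; } StrBuf;

static void strbuf_init(StrBuf *b)
{
    b->cap = 64;
    b->len = 0;
    b->data = malloc(b->cap);   /* error checks elided in this sketch */
    b->data[0] = '\0';
}

static void strbuf_append(StrBuf *b, const char *s)
{
    size_t n = strlen(s);
    while (b->len + n + 1 > b->cap)
    {
        b->cap *= 2;
        b->data = realloc(b->data, b->cap);
    }
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
}

/* Callee in the suggested style: it appends but never initializes;
 * both buffers must already be set up by the caller, as with
 * initStringInfo(&sql); initStringInfo(&relations); before
 * deparseSelectSql(...) in postgresGetForeignPlan. */
static void deparse_select(StrBuf *sql, StrBuf *relations)
{
    strbuf_append(sql, "SELECT * FROM r1 JOIN r2 ON ...");
    strbuf_append(relations, "public.r1 [a] INNER JOIN public.r2 [b]");
}

static int caller_init_demo(void)
{
    StrBuf sql, relations;
    strbuf_init(&sql);          /* caller owns initialization... */
    strbuf_init(&relations);
    deparse_select(&sql, &relations);   /* ...callee only appends */
    return strcmp(relations.data, "public.r1 [a] INNER JOIN public.r2 [b]") == 0
        && sql.len > 0;
}
```

Keeping initialization in the caller means the callee's contract is uniform: every output buffer it receives is valid, whether or not a given caller cares about that output.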
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 15, 2015 at 10:07 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 15, 2015 at 12:15 AM, Amit Kapila amit.kapil...@gmail.com wrote: IIUC, this will allow us to increase the usage count only when the buffer is touched by the clock sweep to decrement the usage count. Yes. I think such a solution will be good for the cases when many evictions need to be performed to satisfy the workload; OTOH when there are not too many evictions that need to be done, some of the buffers that are accessed much more will have an equal probability of getting evicted as buffers which are less accessed. Possibly, but I think it's even worse under the current algorithm. Under this proposal, if we go for a long time without any buffer evictions, every buffer's usage count will top out at 2 more than wherever it was after the last clock sweep. In the worst case, every buffer (or most of them) could end up with the same usage count. But under the status quo, they'll all go to 5, which is an even bigger loss of information, and which will make the first eviction much more expensive than if they are all pegged at 2 or 3. Okay, got your point. On further thinking about this idea, it occurred to me that with this idea you are trying to reduce the clock-sweep time, which in turn will improve the chances of finding a usable buffer in less time under high pressure. If that is true then I think we should have seen a similar gain even with the bgreclaimer idea which I tried some time back, because that has also reduced the time for clock-sweep. There could be ways to improve things further, such as by slowly advancing the clock sweep even when no eviction is required. But that's a bit tricky to do, too. You'd like (perhaps) to advance it one step for every buffer allocation, but you don't want a centralized counter, or unnecessary contention on the clock sweep when no eviction is necessary in the first place. 
There's probably some way to make it work, though, if we put our mind to it. I'm inclined to try the simpler approach first and see what it buys. Sure, I think it is worth a try. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
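The two accounting schemes being compared can be reduced to a few lines. This is a hedged sketch, not the actual freelist.c code: the cap of 2-over-the-floor models Robert's description that "every buffer's usage count will top out at 2 more than wherever it was after the last clock sweep", and the function names are invented:

```c
#include <assert.h>

#define MAX_USAGE 5

typedef struct { int usage; int floor_at_sweep; } Buffer;

/* Status quo: every pin bumps the count, saturating at 5.  A long
 * eviction-free period drives most counts to 5, losing information. */
static void pin_status_quo(Buffer *b)
{
    if (b->usage < MAX_USAGE)
        b->usage++;
}

/* Sketch of the proposal: between sweep passes a buffer may gain at
 * most 2 over where the last pass left it, so heavily and lightly
 * used pages stay distinguishable at the first eviction. */
static void pin_proposal(Buffer *b)
{
    if (b->usage < b->floor_at_sweep + 2)
        b->usage++;
}

/* One visit from the clock hand: decrement and remember the floor. */
static void sweep_visit(Buffer *b)
{
    if (b->usage > 0)
        b->usage--;
    b->floor_at_sweep = b->usage;
}

/* Usage count of a hot buffer after many pins and no intervening sweep. */
static int hot_count_after(void (*pin)(Buffer *), int pins)
{
    Buffer b = {0, 0};
    for (int i = 0; i < pins; i++)
        pin(&b);
    return b.usage;
}
```

Under the status quo a hot buffer saturates at 5 during a long eviction-free stretch; under the proposal it stops at 2 above its post-sweep floor, so the first sweep after the lull has less undifferentiated decrementing to do.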
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote: On 2015-04-15 08:42:33 -0400, Simon Riggs wrote: Because it makes subsequent accesses to the page cheaper. Cheaper for whom? Everyone. I think what you mean is Everyone else. It is demonstrably quicker and more consistent for a process when it limits the amount of pruning it does, as well as the fact that it causes additional WAL traffic when it does so, causing replication lag. I love it when someone cleans up for me. I just don't think they'll accept the argument that they should clean up for me because it makes their life easier. Certainly doesn't work with my kids. I don't really see the downside to this suggestion. The suggestion makes things better than they are now but is still less than I have proposed. If what you both mean is IMHO this is an acceptable compromise, I can accept it also, at this point in the CF. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs si...@2ndquadrant.com wrote: On 14 April 2015 at 21:53, Robert Haas robertmh...@gmail.com wrote: Peter commented previously that README.HOT should get an update. The relevant section seems to be When can/should we prune or defragment?. That's easy enough to change once we agree to commit. I wonder if it would be a useful heuristic to still prune pages if those pages are already dirty. Useful for who? This is about responsibility. Why should someone performing a large SELECT take the responsibility for cleaning pages? Because it makes subsequent accesses to the page cheaper. Of course, that applies in all cases, but when the page is already dirty, the cost of pruning it is probably quite small - we're going to have to write the page anyway, and pruning it before it gets evicted (perhaps even by our scan) will be cheaper than writing it now and writing it again after it's pruned. When the page is clean, the cost of pruning is significantly higher. I won't take responsibility for paying my neighbor's tax bill, but I might take responsibility for picking up his mail while he's on holiday. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] FPW compression leaks information
On Wed, Apr 15, 2015 at 2:22 PM, Fujii Masao wrote: On Wed, Apr 15, 2015 at 11:55 AM, Michael Paquier wrote: 1) Doc patch to mention that it is possible that compression can give hints to attackers when working on sensitive fields that have a non-fixed size. I think that this patch is enough as the first step. I'll get something done for that at least, a big warning below the description of wal_compression would do it. 2) Switch at relation level to control wal_compression. ALTER TABLE SET is not allowed on a system catalog like pg_authid. So should we change it so that a user can change the flag even on system catalogs? I'm afraid that the change might cause another problem, though. Probably we can disable the compression on every system catalog by default. But I can imagine that someone wants to enable the compression even on a system catalog. For example, pg_largeobject may cause lots of FPWs. We could enforce a value directly in pg_class.h for only pg_authid if we think that the problem is that bad, and rely on the default system value for the rest. That's a hacky-ugly approach though... -- Michael
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 2015-04-15 08:42:33 -0400, Simon Riggs wrote: Because it makes subsequent accesses to the page cheaper. Cheaper for whom? Everyone. Including further readers. Following HOT chains in read-mostly workloads can be really expensive. If you have workloads with a 'hot' value range that's frequently updated, but that range moves, you can easily end up with heavily chained tuples which won't soon be touched by a writer again. And writers will often not yet be able to prune the page because there are still live readers for the older versions (like other updaters). Of course, that applies in all cases, but when the page is already dirty, the cost of pruning it is probably quite small - we're going to have to write the page anyway, and pruning it before it gets evicted (perhaps even by our scan) will be cheaper than writing it now and writing it again after it's pruned. When the page is clean, the cost of pruning is significantly higher. We aren't going to have to write the page, but someone will. If it's already dirty that doesn't change at all. *Not* pruning at that moment actually will often *increase* the total amount of writes to the OS, because now the pruning will happen on the next write access or vacuum - when the page already might have been undirtied. I don't really see the downside to this suggestion. The actions you suggest are reasonable and should ideally be the role of a background process. But that doesn't mean in the absence of that we should pay the cost in the foreground. I'm not sure that's true. A background process will either cause additional read IO to find worthwhile pages, or it'll not find worthwhile pages because they're already paged out. Greetings, Andres Freund
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On 04/15/2015 07:51 AM, Peter Geoghegan wrote:

+heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict)
+{
+	if (!conflict)
+	{
+		/*
+		 * Update the tuple in-place, in the common case where no conflict was
+		 * detected during speculative insertion.
+		 *
+		 * When heap_insert is called in respect of a speculative tuple, the
+		 * page will actually have a tuple inserted.  However, during recovery
+		 * replay will add an all-zero tuple to the page instead, which is the
+		 * same length as the original (but the tuple header is still WAL
+		 * logged and will still be restored at that point).  If and when the
+		 * in-place update record corresponding to releasing a value lock is
+		 * replayed, crash recovery takes the final tuple value from there.
+		 * Thus, speculative heap records require two WAL records.
+		 *
+		 * Logical decoding interprets an in-place update associated with a
+		 * speculative insertion as a regular insert change.  In other words,
+		 * the in-place record generated affirms that a speculative insertion
+		 * completed successfully.
+		 */
+		heap_inplace_update(relation, tuple);
+	}
+	else
+	{

That's a bizarre solution. May I suggest a much simpler one: Make the heap-insert record the same for normal and speculative insertions, except for a flag that's set if it's a speculative one. Replay as usual. When the speculative insertion is finished, write a new kind of a WAL record for that. The record only needs to contain the ctid of the tuple. Replaying that record will clear the flag on the heap tuple that said that it was a speculative insertion. In logical decoding, decode speculative insertions like any other insertion. To decode a super-deletion record, scan the reorder buffer for the transaction to find the corresponding speculative insertion record for the tuple, and remove it. BTW, that'd work just as well without the new WAL record to finish a speculative insertion. Am I missing something? 

--- a/src/include/storage/off.h
+++ b/src/include/storage/off.h
@@ -26,6 +26,7 @@ typedef uint16 OffsetNumber;
 #define InvalidOffsetNumber		((OffsetNumber) 0)
 #define FirstOffsetNumber		((OffsetNumber) 1)
 #define MaxOffsetNumber			((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))
+#define MagicOffsetNumber		(MaxOffsetNumber + 1)
 #define OffsetNumberMask		(0xffff)	/* valid uint16 bits */

IMHO it would be nicer if the magic value were more constant, e.g. 0xffff or 0xfffe, instead of basing it on MaxOffsetNumber, which depends on block size. I would rather not include MaxOffsetNumber or anything derived from it in the on-disk format. - Heikki
Re: [HACKERS] WAL format changes break the suppression of do-nothing checkpoints.
On 03/31/2015 09:09 PM, Jeff Janes wrote: On Tue, Mar 31, 2015 at 6:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 03/30/2015 09:01 PM, Jeff Janes wrote: commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs saying If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed, presumably by accident. It seems that this part is no longer true when it should be true: if (curInsert == ControlFile->checkPoint + MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint))) MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) is now 96, but the amount by which curInsert gets advanced is still 104, like it was before the commit. Hmm. Wasn't this a bit broken before too, when the checkpoint record crosses a page boundary? But once the next one was written, it wouldn't cross a boundary anymore. So you would get one extra checkpoint, rather than an infinite stream of them. So a bit broken, but just a bit. Yeah. It's not serious. It occurs to me that that's actually nice from a robustness point of view: if something goes badly wrong and gets corrupted, you're more likely to be able to recover the last checkpoint record if it doesn't cross a page boundary. And more importantly, a segment boundary. OTOH, when you create a spurious checkpoint, you lose the previous checkpoint location. A good compromise is probably to not skip the checkpoint if the last one crossed a segment boundary. (Committed that way) Instead of trying to calculate where the checkpoint record ends, I think we could check that the prev-pointer points to the last checkpoint record. Wouldn't doing that involve reading WAL while not in recovery? No. We keep track of the beginning position of the last inserted record at all times, because when the next record is inserted, we have to put that in the xl_prev field. Committed a patch to do that. 
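The failure mode Jeff reports is pure arithmetic: the skip test asserts that nothing but the last checkpoint record has been inserted, by comparing the current insert position against the checkpoint's start plus the record's aligned size. When the computed size (96 here) disagrees with the amount the insert pointer actually advanced (104 here), the equality can never hold and checkpoints are never skipped. A reduced sketch, with the sizes taken from the report and the function name invented:

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte alignment, as with MAXALIGN on typical platforms. */
#define MAXALIGN(LEN) (((uint64_t) (LEN) + 7) & ~(uint64_t) 7)

/* The skip test from checkpoint creation, reduced to its arithmetic:
 * skip iff the insert pointer sits exactly at the end of the previous
 * checkpoint record, i.e. nothing else has been written since. */
static int can_skip_checkpoint(uint64_t cur_insert,
                               uint64_t last_ckpt_start,
                               uint64_t ckpt_record_size)
{
    return cur_insert == last_ckpt_start + MAXALIGN(ckpt_record_size);
}
```

With the reported mismatch, `can_skip_checkpoint(start + 104, start, 96)` is always false, so an idle server keeps writing do-nothing checkpoints; using the tracked start position of the last inserted record (the xl_prev source mentioned above) sidesteps the size computation entirely.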
- Heikki
Re: [HACKERS] Turning off HOT/Cleanup sometimes
Simon Riggs wrote: On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote: I don't really see the downside to this suggestion. The suggestion makes things better than they are now but is still less than I have proposed. If what you both mean is IMHO this is an acceptable compromise, I can accept it also, at this point in the CF. Let me see if I understand things. What we have now is: when reading a page, we also HOT-clean it. This runs HOT-cleanup a large number of times, and causes many pages to become dirty. Your patch is when reading a page, HOT-clean it, but only 5 times in each scan. This runs HOT-cleanup at most 5 times, and causes at most 5 pages to become dirty. Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. Am I right in thinking that HOT-clean in a dirty page is something that runs completely within CPU cache? If so, it would be damn fast and would have benefits for future readers, for very little cost. Dirtying a page is very different; if buffer reads are common, the system is later bogged down trying to find clean pages to read uncached buffers (including the read-only scan itself, so it becomes slower.) If I have understood things correctly, then I stand behind Robert's suggestion. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
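The three policies Alvaro enumerates can be sketched side by side. This is an illustrative model only (hypothetical names, a fixed prune budget of 5 standing in for "only 5 times in each scan"); the quantity of interest is how many previously-clean pages a scan newly dirties under each policy:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool dirty; bool pruned; } Page;

/* Scan n pages; pruning a page always leaves it dirty.  Returns how
 * many previously-clean pages the scan dirtied.  With
 * prune_dirty_always set, already-dirty pages are pruned for free
 * (Robert's proposal) and the budget is spent only on clean pages. */
static int scan(Page *pages, int n, bool prune_dirty_always, int budget)
{
    int newly_dirtied = 0;
    for (int i = 0; i < n; i++)
    {
        bool do_prune = false;
        if (prune_dirty_always && pages[i].dirty)
            do_prune = true;            /* cheap: page is dirty anyway */
        else if (budget > 0)
        {
            do_prune = true;            /* spend one unit of the budget */
            budget--;
        }
        if (do_prune)
        {
            pages[i].pruned = true;
            if (!pages[i].dirty)
            {
                pages[i].dirty = true;
                newly_dirtied++;
            }
        }
    }
    return newly_dirtied;
}

/* 20-page scan, first 8 pages already dirty. */
static int run_policy(bool prune_dirty_always, int budget)
{
    Page pages[20];
    for (int i = 0; i < 20; i++)
    {
        pages[i].dirty = (i < 8);
        pages[i].pruned = false;
    }
    return scan(pages, 20, prune_dirty_always, budget);
}
```

On this toy workload the status quo (unbounded budget) dirties all 12 clean pages, Simon's patch dirties none (the budget is consumed on the already-dirty pages), and Robert's variant prunes all 8 dirty pages for free while dirtying exactly 5 clean ones, matching the "at most 5 pages become dirty" bound.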
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On 04/15/2015 10:51 AM, Andrew Dunstan wrote: On 04/15/2015 08:31 AM, Michael Paquier wrote: On Wed, Apr 15, 2015 at 7:54 PM, Peter Eisentraut pete...@gmx.net wrote: On 4/14/15 8:32 PM, Peter Eisentraut wrote: Move pg_upgrade from contrib/ to src/bin/ Oh dear. It appears the buildfarm client code needs to be updated to support this. How do we do this? (I guess the critters that are still green are not running this test.) Argh.. Looking at the buildfarm client code (https://github.com/PGBuildFarm/client-code) we are going to need to make the logic of TestUpgrade.pm aware that pg_upgrade is now in src/bin and not in contrib. A simple idea would be to use a Find to guess where the pg_upgrade path is located and then use the path found instead of hardcoding contrib/. I'm testing a fix. I just happened to check the status this morning. It would have been nice to have had a heads up. The Find idea is cute, I might have used it if I hadn't already coded a fix. OK, the fix is here: https://github.com/PGBuildFarm/client-code/commit/47b24efa758360699413640dd14c4096e1643fb2 and the new TestUpgrade.pm can be grabbed from here: https://raw.githubusercontent.com/PGBuildFarm/client-code/47b24efa758360699413640dd14c4096e1643fb2/PGBuild/Modules/TestUpgrade.pm It can just be dropped into place as a hot fix. We're overdue for a buildfarm client release, but this hot fix should get things green again. cheers andrew
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Simon Riggs wrote: On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote: I don't really see the downside to this suggestion. The suggestion makes things better than they are now but is still less than I have proposed. If what you both mean is IMHO this is an acceptable compromise, I can accept it also, at this point in the CF. Let me see if I understand things. What we have now is: when reading a page, we also HOT-clean it. This runs HOT-cleanup a large number of times, and causes many pages to become dirty. Your patch is when reading a page, HOT-clean it, but only 5 times in each scan. This runs HOT-cleanup at most 5 times, and causes at most 5 pages to become dirty. Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. Am I right in thinking that HOT-clean in a dirty page is something that runs completely within CPU cache? If so, it would be damn fast and would have benefits for future readers, for very little cost. If there are many tuples on the page, it takes some CPU effort to scan all the HOT chains and move tuples around. Also, it creates a WAL record, which isn't free. Another question is whether the patch can reliably detect whether it's doing a read-only scan or not. I haven't tested, but I suspect it'd not do pruning when you do something like INSERT INTO foo SELECT * FROM foo WHERE blah. I.e. when the target relation is referenced twice in the same statement: once as the target, and second time as a source. Maybe that's OK, though. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote: On 04/15/2015 07:51 AM, Peter Geoghegan wrote: +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict) +{ +if (!conflict) +{ +/* + * Update the tuple in-place, in the common case where no conflict was + * detected during speculative insertion. + * + * When heap_insert is called in respect of a speculative tuple, the + * page will actually have a tuple inserted. However, during recovery + * replay will add an all-zero tuple to the page instead, which is the + * same length as the original (but the tuple header is still WAL + * logged and will still be restored at that point). If and when the + * in-place update record corresponding to releasing a value lock is + * replayed, crash recovery takes the final tuple value from there. + * Thus, speculative heap records require two WAL records. + * + * Logical decoding interprets an in-place update associated with a + * speculative insertion as a regular insert change. In other words, + * the in-place record generated affirms that a speculative insertion + * completed successfully. + */ +heap_inplace_update(relation, tuple); +} +else +{ That's a bizarre solution. I tend to agree, but for different reasons. In logical decoding, decode speculative insertions like any other insertion. To decode a super-deletion record, scan the reorder buffer for the transaction to find the corresponding speculative insertion record for the tuple, and remove it. Not that easy. That buffer is spilled to disk and such. As discussed. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On Wed, Apr 15, 2015 at 10:52 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Wed, Apr 15, 2015 at 8:57 AM, David Steele da...@pgmasters.net wrote: On 4/14/15 7:13 PM, Tatsuo Ishii wrote: This patch does not apply cleanly due to the moving of pgbench (patch to filelist.sgml failed). Thank you for pointing that out! Ironic that it was the commit directly after the one I was testing with that broke the patch. It appears the end of the last CF is a very bad time to be behind HEAD. Fixed in attached v8 patch. Thank you for updating the patch! I applied the patch and compiled it successfully without warnings. I tested the v8 patch with the CURSOR case I mentioned before, and got a segmentation fault again. Here are the log messages in my environment, =# select test(); LOG: server process (PID 29730) was terminated by signal 11: Segmentation fault DETAIL: Failed process was running: select test(); LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. FATAL: the database system is in recovery mode I hope that these messages help you to address this problem. I will also try to address this. Regards, --- Sawada Masahiko I will also try to address this. I investigated this problem and inform you of my result here. When we execute the test() function I mentioned, three stack items in total are stored in auditEventStack. Two of them are freed by stack_pop() -> stack_free() (i.e., stack_free() is called by stack_pop()). One of them is freed by PortalDrop() -> ... -> MemoryContextDeleteChildren() -> ... -> stack_free(). It is freed at the same time as the pg_audit memory context is deleted, and the stack becomes completely empty.
But after freeing all items, finish_xact_command() could call PortalDrop(), so ExecutorEnd() could be called again. pg_audit has an ExecutorEnd_hook, so postgres tries to free that item... SEGV. In my environment, the following change fixes it.

--- pg_audit.c.org	2015-04-15 14:21:07.541866525 +0900
+++ pg_audit.c	2015-04-15 11:36:53.758877339 +0900
@@ -1291,7 +1291,7 @@
 	standard_ExecutorEnd(queryDesc);
 
 	/* Pop the audit event off the stack */
-	if (!internalStatement)
+	if (!internalStatement && auditEventStack != NULL)
 	{
 		stack_pop(auditEventStack->stackId);
 	}

It might be a good idea to add an Assert() before calling stack_pop(). I'm not sure this change is exactly correct, but I hope this information helps you. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On 04/15/2015 08:31 AM, Michael Paquier wrote: On Wed, Apr 15, 2015 at 7:54 PM, Peter Eisentraut pete...@gmx.net wrote: On 4/14/15 8:32 PM, Peter Eisentraut wrote: Move pg_upgrade from contrib/ to src/bin/ Oh dear. It appears the buildfarm client code needs to be updated to support this. How do we do this? (I guess the critters that are still green are not running this test.) Argh.. Looking at the buildfarm client code (https://github.com/PGBuildFarm/client-code) we are going to need to make the logic of TestUpgrade.pm aware that pg_upgrade is now in src/bin and not in contrib. A simple idea would be to use a Find to guess where the pg_upgrade path is located and then use the path found instead of hardcoding contrib/. I'm testing a fix. I just happened to check the status this morning. It would have been nice to have had a heads up. The Find idea is cute, I might have used it if I hadn't already coded a fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On 04/15/2015 06:01 PM, Andres Freund wrote: On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote: On 04/15/2015 07:51 AM, Peter Geoghegan wrote: +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict) +{ + if (!conflict) + { + /* +* Update the tuple in-place, in the common case where no conflict was +* detected during speculative insertion. +* +* When heap_insert is called in respect of a speculative tuple, the +* page will actually have a tuple inserted. However, during recovery +* replay will add an all-zero tuple to the page instead, which is the +* same length as the original (but the tuple header is still WAL +* logged and will still be restored at that point). If and when the +* in-place update record corresponding to releasing a value lock is +* replayed, crash recovery takes the final tuple value from there. +* Thus, speculative heap records require two WAL records. +* +* Logical decoding interprets an in-place update associated with a +* speculative insertion as a regular insert change. In other words, +* the in-place record generated affirms that a speculative insertion +* completed successfully. +*/ + heap_inplace_update(relation, tuple); + } + else + { That's a bizarre solution. I tend to agree, but for different reasons. In logical decoding, decode speculative insertions like any other insertion. To decode a super-deletion record, scan the reorder buffer for the transaction to find the corresponding speculative insertion record for the tuple, and remove it. Not that easy. That buffer is spilled to disk and such. As discussed. Hmm, ok, I've read the INSERT ... ON CONFLICT UPDATE and logical decoding thread now, and I have to say that IMHO it's a lot more sane to handle this in ReorderBufferCommit() like Peter first did, than to make the main insertion path more complex like this. Another idea is to never spill the latest record to disk, at least if it was a speculative insertion. 
Then you would be sure that when you see the super-deletion record, the speculative insertion it refers to is still in memory. That seems simple. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP tests of pg_rewind not stopping servers used for the tests
On 04/15/2015 03:07 PM, Michael Paquier wrote: Hi all, In the refactoring of pg_rewind tests committed as 53ba107, it happens that some of the servers used for the tests are not stopped at the end of the test. The issue is that RewindTest.pm uses END to stop the servers, but now that the local and remote tests are grouped half of the servers are not stopped. The reason why I did not notice that before is because as pg_rewind uses local Unix socket to work on the nodes, the tests are not failing. Sorry about that, that's my fault. And attached is a patch fixing the issue: it replaces END by a cleanup function called at the end of each local/remote test to be sure that the servers are shut down. Thanks, committed. I kept the END block, though, so that we still clean up if the test dies with an exception. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bugs in CreateCheckPoint
hi, To avoid inserting duplicate checkpoints when the system is idle, the code of CreateCheckPoint makes two tests to determine that nothing has happened since the start of the last checkpoint. But because the layout of the XLOG record was changed, the condition would never be true. The attachment is the patch to fix it, thanks. Zhang Zq repeatly-checkpoint.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
Heikki Linnakangas wrote: On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. Am I right in thinking that HOT-clean in a dirty page is something that runs completely within CPU cache? If so, it would be damn fast and would have benefits for future readers, for very little cost. If there are many tuples on the page, it takes some CPU effort to scan all the HOT chains and move tuples around. Also, it creates a WAL record, which isn't free. But if the page is in CPU cache, the CPU effort shouldn't be all that noticeable, should it? That's my point, but then maybe I'm wrong. Now, the WAL logging is annoying, so let's limit that too -- do it at most for, say, 20 dirty pages and at most 5 clean pages. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs si...@2ndquadrant.com wrote: I won't take responsibility for paying my neighbor's tax bill, but I might take responsibility for picking up his mail while he's on holiday. That makes it sound like this is an occasional, non-annoying thing. It's more like, whoever fetches the mail needs to fetch it for everybody. So we are slowing down one person disproportionately, while others fly through without penalty. There is no argument that one workload necessarily needs to perform that on behalf of the other workload. Sure there is. It's called a tragedy of the commons - everybody acts in their own selfish interest (it's not *my* responsibility to limit grazing on public land, or prune this page that I'm not modifying) and as a result some resource that everybody cares about (grass, system-wide I/O) gets trashed to everyone's detriment. Purely selfish behavior can only be justified here if we assume that the selfish actor intends to participate in the system only once: I'm going to run one big reporting query which must run as fast as possible, and then I'm getting on a space ship to Mars. So if my refusal to do any pruning during that reporting query causes lots of extra I/O on the system ten minutes from now, I don't care, because I'll have left the playing field forever at that point. As Heikki points out, any HOT pruning operation generates WAL and has a CPU cost. However, pruning a page that is currently dirty *decreases* the total volume of writes to the data files, whereas pruning a page that is currently clean *increases* the total volume of writes to the data files. In the first case, if we prune the page right now while it's still dirty, we can't possibly cause any additional data-file writes, and we may save one, because it's possible that someone else would later have pruned it when it was clean and there was no other reason to dirty it. 
In the second case, if we prune the page that is currently clean, it will become dirty. That will cost us no additional I/O if the page is again modified before it's written out, but otherwise it costs an additional data file write. I think there's a big difference between those two cases. Sure, from the narrow point of view of how much work it takes this scan to process this page, it's always better not to prune. But if you make the more realistic assumption that you will keep on issuing queries on the system, then what you're doing to the overall system I/O load is pretty important. By the way, was anything ever done about this: http://www.postgresql.org/message-id/20140929091343.ga4...@alap3.anarazel.de That's just a workload that is 5/6th pgbench -S and 1/6th pgbench, which is in no way an unrealistic workload, and showed a significant regression with an earlier version of the patch. You seem very eager to commit this patch after four months of inactivity, but I think this is a pretty massive behavior change that deserves careful scrutiny before it goes in. If we push something that changes longstanding behavior and can't even be turned off, and it regresses behavior for a use case that common, our users are going to come after us with pitchforks. That's not to say some people won't be happy, but in my experience it takes a lot of happy users to make up for getting stabbed with even one pitchfork. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Assert there is no duplicated exit callbacks
Attached is a patch to assert that there are no duplicated exit callbacks. Along the way, I downgrade the runtime "enough room" check to an assertion: the callbacks are registered in a pretty fixed initialization code path, so an assertion is good enough to prevent oversights there. If we don't agree with this, then the duplication check should also be bumped to a runtime check. Regards, Qingqing

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 6bc0b06..bfc7764 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -291,10 +291,12 @@ atexit_callback(void)
 void
 on_proc_exit(pg_on_exit_callback function, Datum arg)
 {
-	if (on_proc_exit_index >= MAX_ON_EXITS)
-		ereport(FATAL,
-				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				 errmsg_internal("out of on_proc_exit slots")));
+	int			i;
+
+	/* check no duplicates and there is room */
+	for (i = 0; i < on_proc_exit_index; i++)
+		Assert(on_proc_exit_list[i].function != function);
+	Assert(on_proc_exit_index < MAX_ON_EXITS);
 
 	on_proc_exit_list[on_proc_exit_index].function = function;
 	on_proc_exit_list[on_proc_exit_index].arg = arg;
@@ -319,10 +321,12 @@ on_proc_exit(pg_on_exit_callback function, Datum arg)
 void
 before_shmem_exit(pg_on_exit_callback function, Datum arg)
 {
-	if (before_shmem_exit_index >= MAX_ON_EXITS)
-		ereport(FATAL,
-				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				 errmsg_internal("out of before_shmem_exit slots")));
+	int			i;
+
+	/* check no duplicated callbacks and there is room */
+	for (i = 0; i < before_shmem_exit_index; i++)
+		Assert(before_shmem_exit_list[i].function != function);
+	Assert(before_shmem_exit_index < MAX_ON_EXITS);
 
 	before_shmem_exit_list[before_shmem_exit_index].function = function;
 	before_shmem_exit_list[before_shmem_exit_index].arg = arg;
@@ -347,10 +351,12 @@ before_shmem_exit(pg_on_exit_callback function, Datum arg)
 void
 on_shmem_exit(pg_on_exit_callback function, Datum arg)
 {
-	if (on_shmem_exit_index >= MAX_ON_EXITS)
-		ereport(FATAL,
-				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				 errmsg_internal("out of on_shmem_exit slots")));
+	int			i;
+
+	/* check no duplicates and there is room */
+	for (i = 0; i < on_shmem_exit_index; i++)
+		Assert(on_shmem_exit_list[i].function != function);
+	Assert(on_shmem_exit_index < MAX_ON_EXITS);
 
 	on_shmem_exit_list[on_shmem_exit_index].function = function;
 	on_shmem_exit_list[on_shmem_exit_index].arg = arg;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 04/15/2015 07:11 PM, Alvaro Herrera wrote: Heikki Linnakangas wrote: On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. Am I right in thinking that HOT-clean in a dirty page is something that runs completely within CPU cache? If so, it would be damn fast and would have benefits for future readers, for very little cost. If there are many tuples on the page, it takes some CPU effort to scan all the HOT chains and move tuples around. Also, it creates a WAL record, which isn't free. But if the page is in CPU cache, the CPU effort shouldn't be all that noticeable, should it? That's my point, but then maybe I'm wrong. Now, the WAL logging is annoying, so let's limit that too -- do it at most for, say, 20 dirty pages and at most 5 clean pages. There isn't much difference between that and just doing it on first 5 pages. Both of those numbers were pulled out of thin air, anyway. I'd rather just keep it simple. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert there is no duplicated exit callbacks
On 04/15/2015 07:52 PM, Qingqing Zhou wrote: Attached is a patch to assert there is no duplicated exit callbacks. I don't remember that ever been a problem. Along the way, I downgrade the runtime enough room check to an assertion: the callbacks are registered in pretty fixed initialization code path, thus assertion is good enough to prevent overlook there. If we don't agree with this, then the duplication check shall also bump to runtime checks. I don't think any of this is needed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugs in CreateCheckPoint
On 04/15/2015 07:02 PM, Zhang Zq wrote: hi, To avoid inserting duplicate checkpoints when the system is idle, the code of CreateCheckPoint make two tests to determine that nothing has happened since the start of the last checkpoint. But because the layout of XLOG record was changed, the condition would never to be true. The attachment is the patch to fix it, thanks. Jeff Janes reported this a while ago, and as it happens, I committed a fix earlier today. (See http://www.postgresql.org/message-id/552e7611.1000...@iki.fi) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
Andrew Dunstan wrote: OK, the fix is here: https://github.com/PGBuildFarm/client-code/commit/47b24efa758360699413640dd14c4096e1643fb2 and the new TestUpgrade.pm can be grabbed from here: https://raw.githubusercontent.com/PGBuildFarm/client-code/47b24efa758360699413640dd14c4096e1643fb2/PGBuild/Modules/TestUpgrade.pm It can just be dropped into place as a hot fix. We're overdue for a buildfarm client release, but this hot fix should get things green again. There's no way we're going to have *everyone* install the hot fix right away, though, no? Or is people really that quick with updating BF scripts? If not, I think it'd be better to revert, get a buildfarm release out, then push the patch again in a week or so. In the meantime surely we can push other contrib move patches ... Regarding the buildfarm update, don't we need a patch for MSVC and src/test/modules? I remember you (Andrew) and Tom hot-patched run_build.pm to run those tests, but of course it doesn't work as a complete release because MSVC doesn't support it. It would be awesome to get BF patched for both those issues before the next release. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert there is no duplicated exit callbacks
On Wed, Apr 15, 2015 at 11:14:43AM -0700, Qingqing Zhou wrote: Hmm, the email text seems corrupted? I think it was an encoding issue. Here's what I got when I piped it through less. Cheers, David. On 04/15/2015 07:52 PM, Qingqing Zhou wrote: Attached is a patch to assert there is no duplicated exit callbacks. I don't remember that ever been a problem. Along the way, I downgrade the runtime enough room check to an assertion: the callbacks are registered in pretty fixed initialization code path, thus assertion is good enough to prevent overlook there. If we don't agree with this, then the duplication check shall also bump to runtime checks. I don't think any of this is needed. - Heikki -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 4/14/15 8:37 PM, Tatsuo Ishii wrote: BTW, in my understanding pg_audit allows to track a table access even if it's used in a view. I think this is a nice feature and it would be better explicitly stated in the document and the test case is better included in the regression test. Here is a sample session: CREATE TABLE test2 (id INT); CREATE VIEW vtest2 AS SELECT * FROM test2; GRANT SELECT ON TABLE public.test2 TO auditor; GRANT SELECT ON TABLE public.vtest2 TO auditor; SELECT * FROM vtest2; NOTICE: AUDIT: SESSION,1,1,READ,SELECT,,,SELECT * FROM vtest2; NOTICE: AUDIT: OBJECT,1,1,READ,SELECT,VIEW,public.vtest2,SELECT * FROM vtest2; NOTICE: AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.test2,SELECT * FROM vtest2; That's the idea! In the documentation I throw around the word relation pretty liberally, but you are right that some clarification would be helpful. I have added a few parenthetical statements to the docs that should make them clearer. I also took your suggestion and added a view regression test. Both are in patch v9 which I attached to my previous email on this thread. Thank you for taking the time to have a look. -- - David Steele da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Assert there is no duplicated exit callbacks
Hmm, the email text seems corrupted? Regards, Qingqing On Wed, Apr 15, 2015 at 10:03 AM, Heikki Linnakangas hlinn...@iki.fi wrote: 湏〠⼴㔱㈯‵㜰㔺′䵐楑杮楱杮娠潨⁵牷瑯㩥‾瑁慴档摥椠慰捴潴愠獳牥⁴ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] show xl_prev in xlog.c errcontext
I found this patch in my local repo that I wrote some weeks or months ago while debugging some XLog corruption problem: it was difficult to pinpoint what XLog record in a long sequence of WAL files was causing a problem, and displaying the prev pointer in errcontext made finding it much easier -- I could correlate it with pg_xlogdump output, I think. Anyone see a reason not to apply this or something like it? (As I recall, we eventually found out that the underlying issue was that two postmasters shared one data area over NFS and were overwriting one another's WAL files, or something like that. But this was some time ago so I could be mistaken that it's the same case.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

commit 835e7b3c62590574e529558493c50fbfb5a85d6c
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Tue Apr 14 12:41:11 2015 -0300

    show prev in errcontext

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2580996..36dc422 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10389,7 +10389,10 @@ rm_redo_error_callback(void *arg)
 	initStringInfo(&buf);
 	xlog_outdesc(&buf, record);
 
-	errcontext("xlog redo %s", buf.data);
+	errcontext("xlog redo (prev %X/%X) %s",
+			   (uint32) (XLogRecGetPrev(record) >> 32),
+			   (uint32) (XLogRecGetPrev(record)),
+			   buf.data);
 
 	pfree(buf.data);
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] reparsing query
Hi I knock once again with this: reparsing a query to XML (last knock was 5-6 years ago). Motivation: more info for front-end developers about a query. Users of the JDBC driver are not able to gather information from metadata about even such a simple query: select x.a, y.b from xxx x, yyy y Solution: modification of backend/utils/adt/ruleutils.c I did it a few years ago, but with every new version of PostgreSQL there is a need to modify the library. I prepared a version for 9.4.1, standalone, which could be used as a library loaded by the backend. I wish to have such functionality in the main distro. Is anyone interested? Andrzej Barszcz
Re: [HACKERS] reparsing query
On Wed, Apr 15, 2015 at 1:40 PM, Andrzej Barszcz abus...@gmail.com wrote: I knock once again with this : reparse query to XML ( last knock 5-6 years before) . What exactly reparse query to XML does? Regards, Qingqing -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On 04/15/2015 12:23 PM, Alvaro Herrera wrote: Andrew Dunstan wrote: OK, the fix is here: https://github.com/PGBuildFarm/client-code/commit/47b24efa758360699413640dd14c4096e1643fb2 and the new TestUpgrade.pm can be grabbed from here: https://raw.githubusercontent.com/PGBuildFarm/client-code/47b24efa758360699413640dd14c4096e1643fb2/PGBuild/Modules/TestUpgrade.pm It can just be dropped into place as a hot fix. We're overdue for a buildfarm client release, but this hot fix should get things green again. There's no way we're going to have *everyone* install the hot fix right away, though, no? Or is people really that quick with updating BF scripts? If not, I think it'd be better to revert, get a buildfarm release out, then push the patch again in a week or so. In the meantime surely we can push other contrib move patches ... Regarding the buildfarm update, don't we need a patch for MSVC and src/test/modules? I remember you (Andrew) and Tom hot-patched run_build.pm to run those tests, but of course it doesn't work as a complete release because MSVC doesn't support it. It would be awesome to get BF patched for both those issues before the next release. Yes, we do want support for testmodules. I think that needs to be built into src/tools/msvc, probably in vcregress.pl. Once we have that the buildfarm changes will be trivial. But if that's not forthcoming quickly I can just avoid the step on msvc for now. We've handled the buildfarm being red for a few days before. People are usually good about applying fixes fairly quickly. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs si...@2ndquadrant.com wrote: I won't take responsibility for paying my neighbor's tax bill, but I might take responsibility for picking up his mail while he's on holiday. That makes it sound like this is an occasional, non-annoying thing. It's more like, whoever fetches the mail needs to fetch it for everybody. So we are slowing down one person disproportionately, while others fly through without penalty. There is no argument that one workload necessarily needs to perform that on behalf of the other workload. Sure there is. It's called a tragedy of the commons - everybody acts in their own selfish interest (it's not *my* responsibility to limit grazing on public land, or prune this page that I'm not modifying) and as a result some resource that everybody cares about (grass, system-wide I/O) gets trashed to everyone's detriment. Purely selfish behavior can only be justified here if we assume that the selfish actor intends to participate in the system only once: I'm going to run one big reporting query which must run as fast as possible, and then I'm getting on a space ship to Mars. So if my refusal to do any pruning during that reporting query causes lots of extra I/O on the system ten minutes from now, I don't care, because I'll have left the playing field forever at that point. It all depends upon who is being selfish. Why is a user selfish for not wanting to clean every single block they scan, when the people that made the mess do nothing and go faster 10 minutes from now? Randomly and massively penalising large SELECTs makes no sense. Some cleanup is OK, with reasonable limits, which is why that is proposed. On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. 
This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. My understanding of Robert's proposal was: when reading a page, HOT-clean it, but only do this up to 5 times on clean pages, while continuing to do it indefinitely when the page is already dirty. Andres said that was the only way and I have agreed to it. Are you now saying not to commit your proposal at all? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
Simon Riggs wrote: On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote: On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. My understanding of Robert's proposal was when reading a page, HOT-clean it, but only do this up to 5 times on clean pages, but continue to do this indefinitely when the page is already dirty.. To me, both statements look identical. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] raw output from copy
On 4/10/15 5:26 PM, Pavel Stehule wrote: Hi I wrote a prototype of this patch, and it works well

postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>
Time: 1.000 ms

I think you can get the same thing using regular psql output and just turning off all field and record separators and tuple headers and so on. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 15 April 2015 at 16:01, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Simon Riggs wrote: On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote: On 04/15/2015 05:44 PM, Alvaro Herrera wrote: Robert's proposal is when reading a page, if dirty HOT-clean it; if not dirty, also HOT-clean it but only 5 times in each scan. This runs HOT-cleanup some number of times (as many as there are dirty), and causes at most 5 pages to become dirty. My understanding of Robert's proposal was when reading a page, HOT-clean it, but only do this up to 5 times on clean pages, but continue to do this indefinitely when the page is already dirty.. To me, both statements look identical. I didn't read it that way, apologies for any confusion. But that is good news. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] raw output from copy
It would be nice, but it is not true. You can get correct non-UTF8 XML with an encoding specification only when binary mode is used. psql doesn't support binary mode. Regards Pavel

On 15. 4. 2015 at 22:06, Peter Eisentraut pete...@gmx.net wrote: On 4/10/15 5:26 PM, Pavel Stehule wrote: Hi I wrote a prototype of this patch, and it works well

postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>
Time: 1.000 ms

I think you can get the same thing using regular psql output and just turning off all field and record separators and tuple headers and so on.
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
Andrew Dunstan wrote: Yes, we do want support for testmodules. I think that needs to be built into src/tools/msvc, probably in vcregress.pl. Once we have that the buildfarm changes will be trivial. But if that's not forthcoming quickly I can just avoid the step on msvc for now. Well, *I* can't write the MSVC patch, but I can't find anyone to volunteer some time either :-( I think it's best to have MSVC skip the step and have the Makefile-based animals run it. That would give us a lot more coverage than currently, which is just two animals AFAIU. We've handled the buildfarm being red for a few days before. People are usually good about applying fixes fairly quickly. Okay.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Use outerPlanState() consistently in executor code
In executor context, outerPlanState(node) is the same as node->ss.ps.lefttree. We follow this in most places except a few. This patch cleans up the outliers, and might save us a few instructions by removing indirection. Most of the changes are trivial, except that I take out an outerPlan nullable check in the grouping iterator, as it surely has a left child. I noticed that we mix the use of "node" for plan nodes and plan states. While changing that could make the code clearer, back-patching could be terrible.

Regards,
Qingqing

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 9ff0eff..672825a 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2054,10 +2054,11 @@ ExecReScanAgg(AggState *node)
 {
 	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			aggno;
+	PlanState  *outerPlan;
 
 	node->agg_done = false;
-
 	node->ss.ps.ps_TupFromTlist = false;
+	outerPlan = outerPlanState(node);
 
 	if (((Agg *) node->ss.ps.plan)->aggstrategy == AGG_HASHED)
 	{
@@ -2075,7 +2076,7 @@ ExecReScanAgg(AggState *node)
 	 * parameter changes, then we can just rescan the existing hash table;
 	 * no need to build it again.
 	 */
-	if (node->ss.ps.lefttree->chgParam == NULL)
+	if (outerPlan->chgParam == NULL)
 	{
 		ResetTupleHashIterator(node->hashtable, &node->hashiter);
 		return;
@@ -2133,8 +2134,8 @@ ExecReScanAgg(AggState *node)
 	 * if chgParam of subnode is not null then plan will be re-scanned by
 	 * first ExecProcNode.
 	 */
-	if (node->ss.ps.lefttree->chgParam == NULL)
-		ExecReScan(node->ss.ps.lefttree);
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
 }
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 8ea8b9f..6502841 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -449,6 +449,8 @@ ExecBitmapHeapScan(BitmapHeapScanState *node)
 void
 ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 {
+	PlanState  *outerPlan;
+
 	/* rescan to release any page pin */
 	heap_rescan(node->ss.ss_currentScanDesc, NULL);
 
@@ -469,8 +471,9 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 	 * if chgParam of subnode is not null then plan will be re-scanned by
 	 * first ExecProcNode.
 	 */
-	if (node->ss.ps.lefttree->chgParam == NULL)
-		ExecReScan(node->ss.ps.lefttree);
+	outerPlan = outerPlanState(node);
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
 }
 
 /*
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 83d562e..b0d5442 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -280,6 +280,8 @@ ExecEndGroup(GroupState *node)
 void
 ExecReScanGroup(GroupState *node)
 {
+	PlanState  *outerPlan;
+
 	node->grp_done = FALSE;
 	node->ss.ps.ps_TupFromTlist = false;	/* must clear first tuple */
@@ -289,7 +291,7 @@ ExecReScanGroup(GroupState *node)
 	 * if chgParam of subnode is not null then plan will be re-scanned by
 	 * first ExecProcNode.
 	 */
-	if (node->ss.ps.lefttree &&
-		node->ss.ps.lefttree->chgParam == NULL)
-		ExecReScan(node->ss.ps.lefttree);
+	outerPlan = outerPlanState(node);
+	if (outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
 }
 
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 1158825..41859e0 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -317,6 +317,9 @@ ExecMaterialRestrPos(MaterialState *node)
 void
 ExecReScanMaterial(MaterialState *node)
 {
+	PlanState  *outerPlan;
+
+	outerPlan = outerPlanState(node);
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 
 	if (node->eflags != 0)
@@ -339,13 +342,13 @@ ExecReScanMaterial(MaterialState *node)
 	 * Otherwise we can just rewind and rescan the stored output. The
 	 * state of the subnode does not change.
 	 */
-	if (node->ss.ps.lefttree->chgParam != NULL ||
+	if (outerPlan->chgParam != NULL ||
 		(node->eflags & EXEC_FLAG_REWIND) == 0)
 	{
 		tuplestore_end(node->tuplestorestate);
 		node->tuplestorestate = NULL;
-		if (node->ss.ps.lefttree->chgParam == NULL)
-			ExecReScan(node->ss.ps.lefttree);
+		if (outerPlan->chgParam == NULL)
+			ExecReScan(outerPlan);
Re: [HACKERS] reparsing query
On Wed, Apr 15, 2015 at 2:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: His old posting: https://www.postgresql.org/message-id/1247323023.16438.35.camel%40ab-desktop Is this a proposal to have a better formatted (JSON etc) debug_print_parse results? Thanks, Qingqing -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PATCH: default_index_tablespace
Hi,

This small patch implements a new GUC (default_index_tablespace) plus supporting code. Originating from a customer request, the feature intends to make creation of indexes on SSD-backed tablespaces easy and convenient (almost transparent) for users: the DBA can just set it and indexes will be placed in the specified tablespace --as opposed to the same tablespace where the referenced table is-- without having to specify it every time.

Feedback appreciated.

Thanks,
/ J.L.

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 5622,5627 ****
--- 5622,5664 ----
  COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
+ 
+      <varlistentry id="guc-default-index-tablespace" xreflabel="default_index_tablespace">
+       <term><varname>default_index_tablespace</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>default_index_tablespace</> configuration parameter</primary>
+       </indexterm>
+       <indexterm><primary>tablespace</><secondary>default</></>
+       </term>
+       <listitem>
+        <para>
+         This variable specifies the default tablespace in which to create
+         indexes when a <command>CREATE INDEX</> command does
+         not explicitly specify a tablespace.
+        </para>
+ 
+        <para>
+         The value is either the name of a tablespace, or an empty string
+         to specify using the default tablespace of the current database
+         unless the <xref linkend="guc-default-tablespace"> is defined,
+         in which case the rules for that parameter apply.
+         If the value does not match the name of any existing tablespace,
+         <productname>PostgreSQL</> will automatically use the default
+         tablespace of the current database; the user must have
+         <literal>CREATE</> privilege for it, or creation attempts will fail.
+        </para>
+ 
+        <para>
+         This variable is not used for indexes on temporary tables; for them,
+         <xref linkend="guc-temp-tablespaces"> is consulted instead.
+        </para>
+ 
+        <para>
+         For more information on tablespaces,
+         see <xref linkend="manage-ag-tablespaces">.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
      <varlistentry id="guc-temp-tablespaces" xreflabel="temp_tablespaces">
       <term><varname>temp_tablespaces</varname> (<type>string</type>)
       <indexterm>

*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
*************** DefineIndex(Oid relationId,
*** 417,423 ****
  	}
  	else
  	{
! 		tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);
  		/* note InvalidOid is OK in this case */
  	}
--- 417,423 ----
  	}
  	else
  	{
! 		tablespaceId = GetIndexTablespace(rel->rd_rel->relpersistence);
  		/* note InvalidOid is OK in this case */
  	}

*** a/src/backend/commands/tablespace.c
--- b/src/backend/commands/tablespace.c
***************
*** 86,91 ****
--- 86,92 ----
  /* GUC variables */
  char	   *default_tablespace = NULL;
+ char	   *default_index_tblspc = NULL;
  char	   *temp_tablespaces = NULL;

*************** GetDefaultTablespace(char relpersistence)
*** 1085,1090 ****
--- 1086,1149 ----
+ /*
+  * GetIndexTablespace -- get the OID of the tablespace for an index
+  *
+  * Temporary objects have different default tablespaces, hence the
+  * relpersistence parameter must be specified.
+  *
+  * May return InvalidOid to indicate use the database's default tablespace.
+  *
+  * Note that caller is expected to check appropriate permissions for any
+  * result other than InvalidOid.
+  *
+  * This exists to hide (and possibly optimize the use of) the
+  * default_index_tablespace GUC variable.
+  */
+ Oid
+ GetIndexTablespace(char relpersistence)
+ {
+ 	Oid			result;
+ 	const char *tablespace;
+ 
+ 	/* The temp-table case is handled elsewhere */
+ 	if (relpersistence == RELPERSISTENCE_TEMP)
+ 	{
+ 		PrepareTempTablespaces();
+ 		return GetNextTempTableSpace();
+ 	}
+ 
+ 	/* Fast path for empty GUC: check defaults */
+ 	if (default_index_tblspc == NULL || default_index_tblspc[0] == '\0')
+ 	{
+ 		/* if default_tablespace is also empty, return immediately */
+ 		if (default_tablespace == NULL || default_tablespace[0] == '\0')
+ 			return InvalidOid;
+ 		else
+ 			tablespace = default_tablespace;
+ 	}
+ 	else
+ 		tablespace = default_index_tblspc;
+ 
+ 	/*
+ 	 * It is tempting to cache this lookup for more speed, but then we would
+ 	 * fail to detect the case where the tablespace was dropped since the GUC
+ 	 * variable was set. Note also that we don't complain if the value fails
+ 	 * to refer to an existing tablespace; we just silently return InvalidOid,
+ 
Re: [HACKERS] PATCH: default_index_tablespace
jltal...@adv-solutions.net writes: This small patch implements a new GUC (default_index_tablespace) plus supporting code. Originated from a customer request, the feature intends to make creation of indexes on SSD-backed tablespaces easy and convenient (almost transparent) for users: the DBA can just set it and indexes will be placed in the specified tablespace --as opposed to the same tablespace where the referenced table is-- without having to specify it every time. I'm afraid this idea is a nonstarter, because it will break existing applications, and in particular existing pg_dump output files, which expect to be able to determine an index's tablespace by setting default_tablespace. (It is *not* adequate that the code falls back to default_tablespace if the new GUC is unset; if it is set, you've still broken pg_dump.) The incremental value, if indeed there is any, of being able to control index positioning this way seems unlikely to justify a backwards-compatibility break of such magnitude. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On 2015/04/15 3:52, Alvaro Herrera wrote: On 4/14/15 5:49 AM, Etsuro Fujita wrote: postgres=# create foreign table ft1 (c1 int) server myserver options (table_name 't1'); CREATE FOREIGN TABLE postgres=# create foreign table ft2 (c1 int) server myserver options (table_name 't2'); CREATE FOREIGN TABLE postgres=# alter foreign table ft2 inherit ft1; ALTER FOREIGN TABLE postgres=# select * from ft1 for update; ERROR: could not find junk tableoid1 column I think this is a bug. Attached is a patch fixing this issue. I think the real question is whether we're now (I mean after this patch) emitting useless tableoid columns that we didn't previously have. I think the answer is yes, and if so I think we need a smaller comb to fix the problem. This one seems rather large. My answer for that would be *no* because I think tableoid would be needed for EvalPlanQual checking in more complex SELECT FOR UPDATE on the inheritance or UPDATE/DELETE involving the inheritance as a source table. Also, if we allow the FDW to change the behavior of SELECT FOR UPDATE so as to match the local semantics exactly, which I'm working on in another thread, I think tableoid would also be needed for the actual row locking. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reparsing query
On Wed, Apr 15, 2015 at 5:19 PM, Lukas Fittl lu...@fittl.com wrote: It'd be interesting to explore if there is some way to make this less hack-ish, and enable tools to parse queries in a better way. Essentially what is needed is some way to reliably translate SQL into an AST-like output, from an outside tool, whilst reusing the current PostgreSQL parser. It is not difficult to output the parsed query in some tool-readable format, but it comes with a maintenance overhead: once tools rely on it, we have to conform to some schema continuously, like xml/xmlns. Do we want to take that on? It depends on how far the tools can go with this exposed information. We actually already have the explain command in JSON format, and tools drawing/analyzing query plans rely on it. Will that work for your scenario? Regards, Qingqing -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP tests of pg_rewind not stopping servers used for the tests
On Thu, Apr 16, 2015 at 1:57 AM, Heikki Linnakangas wrote: Thanks, committed. I kept the END block, though, so that we still clean up if the test dies with an exception. That makes sense. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
Michael Paquier wrote: On Thu, Apr 16, 2015 at 4:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Well, *I* can't write the MSVC patch, but I can't find anyone to volunteer some time either :-( I think it's best to have MSVC skip the step and have the Makefile-based animals run it. That would give us a lot more coverage than currently, which is just two animals AFAIU. Meh. I'll just do it then. We've been slacking for some time regarding that, and it is not cool to have the modules untested on Windows. No kidding. So let's do the following: - Rework the patch doing refactoring of contrib/ and src/test/modules in MSVC. - Include the modules in install when install target is 'all' or default, like we did in previous releases. - Add a new option in vcregress, modulecheck, to kick the tests of those modules. vcregress will need refactoring as well. This makes 3 patches visibly, one for the build part, one for the install part and one for vcregress. Sounds fine? Sounds like a plan. We've handled the buildfarm being red for a few days before. People are usually good about applying fixes fairly quickly. Okay. I did it on my stuff already... Great. It's a bit less red already ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: default_index_tablespace
* Bruce Momjian (br...@momjian.us) wrote: On Wed, Apr 15, 2015 at 07:12:11PM -0400, Tom Lane wrote: jltal...@adv-solutions.net writes: This small patch implements a new GUC (default_index_tablespace) plus supporting code. Originated from a customer request, the feature intends to make creation of indexes on SSD-backed tablespaces easy and convenient (almost transparent) for users: the DBA can just set it and indexes will be placed in the specified tablespace --as opposed to the same tablespace where the referenced table is-- without having to specify it every time. I'm afraid this idea is a nonstarter, because it will break existing applications, and in particular existing pg_dump output files, which expect to be able to determine an index's tablespace by setting default_tablespace. (It is *not* adequate that the code falls back to default_tablespace if the new GUC is unset; if it is set, you've still broken pg_dump.) The incremental value, if indeed there is any, of being able to control index positioning this way seems unlikely to justify a backwards-compatibility break of such magnitude. I can see why someone would want this because random I/O, which is frequent for indexes, is much faster on SSD than magnetic disks. (Sequential I/O is more similar for the two.) The general idea is something I've brought up previously also (I believe it was even discussed at the Dev meeting in, uh, 2013?) so I'm not anxious to simply dismiss it, but it'd certainly have to be done correctly. Thanks, Stephen
Re: [HACKERS] Improve sleep processing of pg_rewind TAP tests
Michael Paquier wrote: However after discussion with a colleague we have noticed that those values may not be enough in slow environments, a value of up to 10s being sometimes needed after promotion to make tests pass. Yeah, hardcoded sleep times are not reliable. (/me would love to get rid of hardcoded times in isolationtester timeout.spec test ...) Attached is a patch improving this sleep logic and doing the following things: 1) To ensure that standby has caught up, check replay position on the standby and compare it with the current WAL position of master. 2) To ensure that promotion is effective, use pg_is_in_recovery() and continue processing until we are sure that the standby is out of recovery. Seems reasonable, but why are you sleeping 1s if pg_ctl -w is in use? I thought the -w would wait until promotion has taken effect, so there's no need to sleep additional time. Note that this patch adds a small routine called command_result in TestLib.pm, able to return stdout, stderr and the exit code. That's really handy, and I am planning to use something like that as well in the replication test suite. Makes sense to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On Wed, Apr 15, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote: Of course, that applies in all cases, but when the page is already dirty, the cost of pruning it is probably quite small - we're going to have to write the page anyway, and pruning it before it gets evicted (perhaps even by our scan) will be cheaper than writing it now and writing it again after it's pruned. When the page is clean, the cost of pruning is significantly higher. We aren't going to have to write the page, but someone will. If it's already dirty that doesn't change at all. *Not* pruning in that moment actually will often *increase* the total amount of writes to the OS. Because now the pruning will happen on the next write access or vacuum - when the page already might have been undirtied. I don't really see the downside to this suggestion. +1. I think, in general, the opportunity cost of not pruning when a page is already dirty is likely to be rather high. In general, it's likely to be worth it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: default_index_tablespace
On Wed, Apr 15, 2015 at 07:12:11PM -0400, Tom Lane wrote: jltal...@adv-solutions.net writes: This small patch implements a new GUC (default_index_tablespace) plus supporting code. Originated from a customer request, the feature intends to make creation of indexes on SSD-backed tablespaces easy and convenient (almost transparent) for users: the DBA can just set it and indexes will be placed in the specified tablespace --as opposed to the same tablespace where the referenced table is-- without having to specify it every time. I'm afraid this idea is a nonstarter, because it will break existing applications, and in particular existing pg_dump output files, which expect to be able to determine an index's tablespace by setting default_tablespace. (It is *not* adequate that the code falls back to default_tablespace if the new GUC is unset; if it is set, you've still broken pg_dump.) The incremental value, if indeed there is any, of being able to control index positioning this way seems unlikely to justify a backwards-compatibility break of such magnitude. I can see why someone would want this because random I/O, which is frequent for indexes, is much faster on SSD than magnetic disks. (Sequential I/O is more similar for the two.) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: default_index_tablespace
On Thu, Apr 16, 2015 at 8:01 AM, Bruce Momjian br...@momjian.us wrote: On Wed, Apr 15, 2015 at 07:12:11PM -0400, Tom Lane wrote: jltal...@adv-solutions.net writes: This small patch implements a new GUC (default_index_tablespace) plus supporting code. Originated from a customer request, the feature intends to make creation of indexes on SSD-backed tablespaces easy and convenient (almost transparent) for users: the DBA can just set it and indexes will be placed in the specified tablespace --as opposed to the same tablespace where the referenced table is-- without having to specify it every time. I'm afraid this idea is a nonstarter, because it will break existing applications, and in particular existing pg_dump output files, which expect to be able to determine an index's tablespace by setting default_tablespace. (It is *not* adequate that the code falls back to default_tablespace if the new GUC is unset; if it is set, you've still broken pg_dump.) The incremental value, if indeed there is any, of being able to control index positioning this way seems unlikely to justify a backwards-compatibility break of such magnitude. I can see why someone would want this because random I/O, which is frequent for indexes, is much faster on SSD than magnetic disks. (Sequential I/O is more similar for the two.) Another way to provide different default tablespace for index could be to provide it at Database level. Have a new option INDEX_TABLESPACE in Create Database command which can be used to create indexes when not specified during Create Index command. This would also need changes in pg_dump (like while dumping info about database) but on initial look, it seems it can be done without much changes. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] reparsing query
On Wed, Apr 15, 2015 at 8:39 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote: It is not difficult to output parsed query in some tool readable format but it comes with a maintain overhead: once tools rely on it, we have to conform to some schema continuously, like the xml/xmlns. Do we want to take this? Depends on how far the tools can go with this exposed information. We actually already have explain command in json format and tools drawing/analyzing query plan rely on it. Will that work for your scenario? Sure, that would work - but as I understand adding an explicit SQL command (or function) is not something that would ever be merged into Postgres core. Especially since that command's output could easily change across versions. My thought was more along the lines of making something like raw_parser + nodeToString available through an extension, but with a JSON output format. Note that an important detail in the monitoring case is that you don't necessarily have the statement's underlying relations available (since you might work with statistics data on a different machine). Best, Lukas -- Lukas Fittl Skype: lfittl Phone: +1 415 321 0630
Re: [HACKERS] reparsing query
On Wed, Apr 15, 2015 at 2:43 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote: Is this a proposal to have a better formatted (JSON etc) debug_print_parse results? I've run into the need for this as well for monitoring purposes, my solution was to compile the needed object files into a library (see [0]). It'd be interesting to explore if there is some way to make this less hack-ish, and enable tools to parse queries in a better way. Essentially what is needed is some way to reliably translate SQL into an AST-like output, from an outside tool, whilst reusing the current PostgreSQL parser. I believe the recent work on deparsing DDL statements relates to some of that (i.e. might be re-usable for this purpose, through an extension), if I'm understanding correctly. [0]: https://github.com/pganalyze/pg_query Best, Lukas -- Lukas Fittl Skype: lfittl Phone: +1 415 321 0630
[HACKERS] Improve sleep processing of pg_rewind TAP tests
Hi all, The TAP tests of pg_rewind use hardcoded sleep values in two places to wait for certain events. In HEAD, those are: 1) Wait 1s for the standby to catch up. 2) Wait 2s for promotion of the standby. However, after discussion with a colleague we have noticed that those values may not be enough in slow environments, a value of up to 10s being sometimes needed after promotion to make the tests pass. And actually the current way of doing it is not reliable, because it depends on how quickly the environment can handle standby replay and promotion (I am expecting issues regarding that on Windows, btw, and on small-spec machines). Attached is a patch improving this sleep logic and doing the following things: 1) To ensure that the standby has caught up, check the replay position on the standby and compare it with the current WAL position of the master. 2) To ensure that promotion is effective, use pg_is_in_recovery() and continue processing until we are sure that the standby is out of recovery. In each case the attached patch makes a maximum of 30 attempts, each attempt waiting 1s before processing, so we leave a margin of up to 30s for environments to handle replay and promotion properly. Note that this patch adds a small routine called command_result in TestLib.pm, able to return stdout, stderr and the exit code. That's really handy, and I am planning to use something like that as well in the replication test suite.
Regards,
-- 
Michael

diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index e6a5b9b..86e9def 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -183,7 +183,7 @@ sub create_standby
 	# Base backup is taken with xlog files included
 	system_or_bail("pg_basebackup -D $test_standby_datadir -p $port_master -x >>$log_path 2>&1");
 	append_to_file("$test_standby_datadir/recovery.conf", qq(
-primary_conninfo='$connstr_master'
+primary_conninfo='$connstr_master application_name=rewind_standby'
 standby_mode=on
 recovery_target_timeline='latest'
 ));
@@ -191,8 +191,29 @@ recovery_target_timeline='latest'
 	# Start standby
 	system_or_bail("pg_ctl -w -D $test_standby_datadir -o \"-k $tempdir_short --listen-addresses='' -p $port_standby\" start >>$log_path 2>&1");
 
-	# sleep a bit to make sure the standby has caught up.
-	sleep 1;
+	# Wait until the standby has caught up with the primary by comparing
+	# WAL positions on both nodes. Note that this is fine
+	my $max_attempts = 30;
+	my $attempts = 0;
+	while ($attempts < $max_attempts)
+	{
+		# Wait a bit before proceeding.
+		sleep 1;
+		$attempts++;
+
+		my $query = "SELECT pg_current_xlog_location() = replay_location FROM pg_stat_replication WHERE application_name = 'rewind_standby';";
+		my $cmd = ['psql', '-At', '-c', $query, '-d', $connstr_master ];
+		my ($res, $stdout, $stderr) = command_result($cmd);
+		chomp($stdout);
+		if ($stdout eq "t")
+		{
+			last;
+		}
+	}
+	if ($attempts == $max_attempts)
+	{
+		die "Maximum number of attempts reached when waiting for standby to catch up";
+	}
 }
 
 sub promote_standby
@@ -201,9 +222,31 @@ sub promote_standby
 	# up standby
 
 	# Now promote slave and insert some new data on master, this will put
-	# the master out-of-sync with the standby.
+	# the master out-of-sync with the standby. Be sure that we leave here
+	# with a standby actually ready for the next operations.
 	system_or_bail("pg_ctl -w -D $test_standby_datadir promote >>$log_path 2>&1");
-	sleep 2;
+	my $max_attempts = 30;
+	my $attempts = 0;
+	while ($attempts < $max_attempts)
+	{
+		# Wait a bit before proceeding, promotion may have not taken effect
+		# in such a short time.
+		sleep 1;
+		$attempts++;
+
+		my $query = "SELECT pg_is_in_recovery()";
+		my $cmd = ['psql', '-At', '-c', $query, '-d', $connstr_standby ];
+		my ($res, $stdout, $stderr) = command_result($cmd);
+		chomp($stdout);
+		if ($stdout eq "f")
+		{
+			last;
+		}
+	}
+	if ($attempts == $max_attempts)
+	{
+		die "Maximum number of attempts reached when waiting for promotion of standby";
+	}
 }
 
 sub run_pg_rewind
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 003cd9a..3c6a49e 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -16,6 +16,7 @@ our @EXPORT = qw(
 	command_ok
 	command_fails
 	command_exit_is
+	command_result
 	program_help_ok
 	program_version_ok
 	program_options_handling_ok
@@ -161,6 +162,14 @@ sub command_exit_is
 	is($h->result(0), $expected, $test_name);
 }
 
+sub command_result
+{
+	my ($cmd) = @_;
+	my ($stdout, $stderr);
+	my $result = run $cmd, '>', \$stdout, '2>', \$stderr;
+	return ($result, $stdout, $stderr);
+}
+
 sub program_help_ok
 {
 	my ($cmd) = @_;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve sleep processing of pg_rewind TAP tests
On Thu, Apr 16, 2015 at 12:51 PM, Alvaro Herrera wrote: Michael Paquier wrote: However, after discussion with a colleague we have noticed that those values may not be enough in slow environments, a value of up to 10s being sometimes needed after promotion to make tests pass. Yeah, hardcoded sleep times are not reliable. (/me would love to get rid of hardcoded times in isolationtester timeout.spec test ...) Attached is a patch improving this sleep logic and doing the following things: 1) To ensure that the standby has caught up, check the replay position on the standby and compare it with the current WAL position of the master. 2) To ensure that promotion is effective, use pg_is_in_recovery() and continue processing until we are sure that the standby is out of recovery. Seems reasonable, but why are you sleeping 1s if pg_ctl -w is in use? I thought the -w would wait until promotion has taken effect, so there's no need to sleep additional time. Visibly that's not the case for this test: the timing issues that we saw happened not because of the standby not catching up, but because of the promotion not taking effect in a timely fashion. And that's something I saw on my OS X box some days ago as well, explaining why the sleep time has been increased from 1 to 2 in 53ba1077. This patch just takes it the careful way. In Perl, system() waits for the pg_ctl process to exit before moving on; perhaps the problem is that pg_ctl thinks that promotion is done, but the node is not quite ready, explaining why the test fails. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On Thu, Apr 16, 2015 at 4:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andrew Dunstan wrote: Yes, we do want support for testmodules. I think that needs to be built into src/tools/msvc, probably in vcregress.pl. Once we have that, the buildfarm changes will be trivial. But if that's not forthcoming quickly I can just avoid the step on MSVC for now. Well, *I* can't write the MSVC patch, but I can't find anyone to volunteer some time either :-( I think it's best to have MSVC skip the step and have the Makefile-based animals run it. That would give us a lot more coverage than currently, which is just two animals AFAIU. Meh. I'll just do it then. We've been slacking for some time regarding that, and it is not cool to have the modules untested on Windows. So let's do the following: - Rework the patch doing the refactoring of contrib/ and src/test/modules in MSVC. - Include the modules in install when the install target is 'all' or default, like we did in previous releases. - Add a new option in vcregress, modulecheck, to kick off the tests of those modules. vcregress will need some refactoring as well. This makes 3 patches: one for the build part, one for the install part, and one for vcregress. Sounds fine? We've handled the buildfarm being red for a few days before. People are usually good about applying fixes fairly quickly. Okay. I did it on my stuff already... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 15, 2015 at 12:37:44AM -0400, Robert Haas wrote: I think such a solution will be good for the cases when many evictions need to be performed to satisfy the workload; OTOH when there are not too many evictions to be done, some of the buffers that are accessed much more will have an equal probability of getting evicted compared to buffers which are less accessed. Possibly, but I think it's even worse under the current algorithm. Under this proposal, if we go for a long time without any buffer evictions, every buffer's usage count will top out at 2 more than wherever it was after the last clock sweep. In the worst case, every buffer (or most of them) could end up with the same usage count. But under the status quo, they'll all go to 5, which is an even bigger loss of information, and which will make the first eviction much more expensive than if they are all pegged at 2 or 3. I've been following this thread from the side with interest and got twigged by the point about loss of information. If you'd like better information about relative ages, you can achieve this by raising the cap on the usage count and dividing (or right-shifting) each sweep. This would allow you to remember much more about the relative worth of often-used pages. With a cap of 32 you'd have the same effect as now, where after 5 sweeps the buffer is evicted. Mathematically the count would converge to the number of times the block is used per sweep. If you wanted to be really clever, you could at the beginning of each sweep take an estimate of the number of buffers used since the last sweep (from the stats collector perhaps) and use that to drive your divisor, so if you have lots of allocations you become more aggressive about reducing the counts. Or if the load is light, fall back to just subtracting one. Then you don't need a cap at all. (Apologies if this has been suggested before; Google didn't find anything for me.)
Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
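The right-shift decay proposed above can be sketched with a toy sweep function (the list-based pool, the cap default, and the eviction-candidate return value are illustrative; the real buffer pool keeps usage counts in buffer headers and evicts lazily):

```python
def clock_sweep_halving(usage, cap=32):
    """One pass of a clock sweep that halves each usage count instead
    of decrementing it by one. With a higher cap (e.g. 32 rather than
    PostgreSQL's 5) the surviving counts preserve more information
    about relative access frequency. Returns the indices of buffers
    whose count dropped to zero (eviction candidates)."""
    victims = []
    for i, count in enumerate(usage):
        usage[i] = min(count, cap) >> 1  # divide by two each sweep
        if usage[i] == 0:
            victims.append(i)
    return victims
```

A buffer pinned heavily (count near the cap) survives several sweeps, while a once-touched buffer becomes an eviction candidate after a single sweep, which is the information-preserving behaviour the message argues for.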
Re: [HACKERS] reparsing query
Qingqing Zhou wrote: On Wed, Apr 15, 2015 at 1:40 PM, Andrzej Barszcz abus...@gmail.com wrote: I knock once again with this: reparse query to XML (last knock 5-6 years before). What exactly does reparse query to XML do? His old posting: https://www.postgresql.org/message-id/1247323023.16438.35.camel%40ab-desktop -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 15, 2015 at 5:26 AM, Robert Haas robertmh...@gmail.com wrote: The way our cache works, we promote when a buffer is accessed but we only demote when a buffer is flushed. We flush a lot less often than we touch buffers, so it's not surprising that the cache ends up full of buffers that are all in the most recently used section. This isn't really correct. We promote when it's accessed, but we demote it when the clock sweep hand passes over it, which happens each time we consider it for eviction. It does not have to do with flushing dirty data to disk, and it does not happen only when the buffer is actually evicted. This is my point though (you're right that flushed isn't always the same as eviction, but that's not the important point here). Right now we only demote when we consider buffers for eviction. But we promote when we pin buffers. Those two things aren't necessarily happening at the same rate and in fact are often orders of magnitude different. So it makes sense that we end up with a lot of buffers promoted all the way to the most recently used ntile and then have to do n passes around the clock and have no good information about which buffer to evict. What I'm saying is that we should demote a buffer every time we promote a buffer. So every time we pin a buffer we should advance the clock a corresponding amount. (I know I'm being intentionally vague about what the corresponding amount is.) The important thing is that the two should be tied together. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
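The coupling Greg describes, one demotion funded by every promotion, can be illustrated with a deliberately naive toy pool where each pin advances the clock hand exactly one step (reading "a corresponding amount" as 1:1; the class and its names are invented for illustration, and a real policy would need tuning):

```python
class ClockPool:
    """Toy buffer pool: pin() bumps the pinned buffer's usage count
    (promotion) and advances the clock hand one step, decrementing the
    buffer it passes (demotion), so the two rates stay tied together
    instead of demotion waiting for an eviction pass."""

    def __init__(self, nbuffers, max_usage=5):
        self.usage = [0] * nbuffers
        self.max_usage = max_usage
        self.hand = 0

    def pin(self, buf):
        # Promote the pinned buffer, capped as in the real clock sweep.
        self.usage[buf] = min(self.usage[buf] + 1, self.max_usage)
        # Demote whatever the hand points at, then advance the hand.
        if self.usage[self.hand] > 0:
            self.usage[self.hand] -= 1
        self.hand = (self.hand + 1) % len(self.usage)
```

Under this scheme the total usage count in the pool can never run away, so an eviction pass always finds differentiated counts rather than everything pegged at the maximum.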
Re: [HACKERS] initdb -S and tablespaces
At 2015-04-06 10:16:36 -0300, alvhe...@2ndquadrant.com wrote: Well, we have many things that can be set as symlinks; some you can have initdb create the links for you (initdb --xlogdir), others you can move manually. Looking at sendDir() in backend/replication/basebackup.c, I notice that the only places where it expects/allows symlinks are for pg_xlog and in pg_tblspc. Still, thanks to the example in basebackup.c, I've got most of a patch that should follow any symlinks in PGDATA. I just need to test a little more before I post it. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
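For illustration, here is roughly what "follow any symlinks in PGDATA" means for an initdb -S-style fsync pass, sketched in Python rather than the C of the actual patch (the helper name is invented, and a real implementation must also fsync the directories themselves and cope with special files):

```python
import os

def fsync_data_dir(pgdata):
    """Recursively fsync every regular file under pgdata, descending
    into symlinked directories (tablespaces under pg_tblspc, a
    relocated pg_xlog) instead of skipping them. Returns the number
    of files synced. Sketch only; error handling is minimal."""
    synced = 0
    for root, dirs, files in os.walk(pgdata, followlinks=True):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.isfile(path):  # skip sockets, broken links
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
                synced += 1
            finally:
                os.close(fd)
    return synced
```

The key difference from a naive walk is `followlinks=True`: without it, anything moved out of the data directory and symlinked back in would silently be left unsynced.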
Re: [HACKERS] FPW compression leaks information
On 04/10/2015 05:17 AM, Robert Haas wrote: I bet that there are at least 1000 covert channel attacks that are more practically exploitable than this. When this is anywhere near the top of the list of things to worry about, I recommend we throw a huge party and then fly home in our faster-than-light starships which, by then, will probably be available at Walmart. Let's try to put some perspective on how serious (or not) this leak is. I'm going to use finding a password hash in pg_authid as the target. After each checkpoint, we can make one guess, and see how that compresses. If we assume ideal conditions, and guess one hex digit at a time (the md5 hashes are stored as hex strings), you can find a 32-digit hash in 16*32 = 512 checkpoints. You could possibly improve on that by doing a binary search of the digits: first use a guess that contains digits 0-7, then 8-f, and find which range the victim's digit was in. Then split that range in two, and repeat until you find the digit. With that method, the theoretical minimum is log2(16)*32 = 128 checkpoints. Obviously, the real world is more complicated and you won't get very close to that ideal case. For my entertainment, I wrote a little exploit script (attached). I didn't use the binary-search approach; it guesses one digit at a time. Testing on my laptop, in a newly initdb'd database with no other activity, it found a full MD5 hash in 3594 checkpoints. I cheated to make it faster, and used the CHECKPOINT command instead of waiting or writing a lot of WAL to trigger the checkpoints, but that shouldn't affect the number of checkpoints required. Your mileage may vary, of course, and I'm sure this script will get confused and not work in many cases, but OTOH I'm sure you could also optimize it further. If we take that ~4000 checkpoints figure, and checkpoint_timeout=5 minutes, it would take about two weeks.
But if you're happy with getting just the first 4 or 8 bytes or so, and launching a dictionary attack on that, you can stop much earlier than that. - Heikki #!/usr/bin/python3 # # Prerequisites: # Create a user called eve. The script logs in as that user, and changes # its password repeatedly. You also need to create a dummy table called # garbage, which is used by the script to create some dummy WAL activity. # # CREATE USER eve; # CREATE TABLE garbage (t text); # ALTER TABLE garbage OWNER TO eve; # # # Usage: # Edit the connection string below. Launch the script. It will run # for a long time, printing out a line after each guess (after each # checkpoint). # # If you want to make the script run faster, you can cheat and grant eve # superuser privileges, and modify the perform_checkpoint function below # to use a direct CHECKPOINT command, instead of forcing a checkpoint with # write activity. # # Caveats: # There are a lot of things that could confuse the script and stall it: # * Concurrent update of pg_authid table # * Other concurrent WAL activity # * More than one other user on the same pg_authid page (or even a dead row) # import psycopg2; import statistics; connectstr = "host=localhost dbname=postgres user=eve"; # characters to use for padding our guesses. This needs to be non-repeating # and not contain hex characters (0-9, a-f), to avoid compression. paddingstr = "ABCDEFGHIJKLMNOPQRSTUVXYZghijklmnopqrstuvxyz"; # If you have an initial guess of the first N bytes, you can type it here initial_guess = "md5"; # length of the string we're guessing. victim_length = 3 + 32; # How many times to repeat each guess? Increasing this makes the attack take # longer, but makes it less likely to get confused by small random differences # in WAL record sizes. num_repeats = 5; checkpoints = 0; # how many checkpoints have we executed so far? # Perform a checkpoint. We can't just call CHECKPOINT, because that's # superuser-only.
We resort to doing dummy activity until a checkpoint # is triggered. # # There is also no direct way of detecting when a checkpoint has happened, so # we take advantage of the WAL compression to detect that. We insert a 1k row # that compresses well. Normally, the WAL record will take somewhat over 1k # bytes. But just after a checkpoint has happened, a full-page image is taken, # and if full-page compression is enabled, the WAL record compresses to # much less than 1k bytes. # def perform_checkpoint(): global checkpoints; checkpoints += 1; # If you're testing this as superuser, you can cheat and perform a direct # CHECKPOINT command by uncommenting this. It makes the script run *much* # faster. # c.execute("CHECKPOINT"); # return; conn.rollback(); conn.autocommit = True; c.execute("vacuum garbage"); conn.autocommit = False; c.execute("SELECT pg_current_xlog_insert_location();"); beforexlog = c.fetchone()[0]; for i in range(0, 10): c.execute("INSERT INTO garbage VALUES (repeat('x', 1000))"); c.execute("SELECT pg_current_xlog_insert_location(), pg_current_xlog_insert_location() - %s;",
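The binary-search refinement mentioned earlier, narrowing each hex digit by halving the candidate set, can be sketched independently of the attached script. Here `oracle` is a stand-in for one compression probe per checkpoint ("did a guess covering this candidate set compress better?"); the function name is invented for illustration:

```python
def find_digit(oracle, alphabet="0123456789abcdef"):
    """Locate one hex digit of the victim string by binary search.
    oracle(candidates) returns True iff the victim's digit is in the
    candidate set; in the real attack each call costs one checkpoint.
    Returns the digit and the number of oracle probes used."""
    probes = 0
    lo, hi = 0, len(alphabet)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if oracle(set(alphabet[lo:mid])):
            hi = mid  # digit is in the lower half
        else:
            lo = mid  # digit is in the upper half
    return alphabet[lo], probes
```

Each digit costs log2(16) = 4 probes, so a full 32-digit hash costs 4*32 = 128 checkpoints, matching the theoretical minimum quoted in the message.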
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 14 April 2015 at 21:53, Robert Haas robertmh...@gmail.com wrote: Peter commented previously that README.HOT should get an update. The relevant section seems to be When can/should we prune or defragment?. That's easy enough to change once we agree to commit. I wonder if it would be a useful heuristic to still prune pages if those pages are already dirty. Useful for who? This is about responsibility. Why should someone performing a large SELECT take the responsibility for cleaning pages? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make more portable TAP tests of initdb
On Wed, Apr 15, 2015 at 2:38 PM, Noah Misch wrote: Solaris 10 ships Perl 5.8.4, and RHEL 5.11 ships Perl 5.8.8. Therefore, Perl installations lacking this File::Path feature will receive vendor support for years to come. Replacing the use of keep_root with rmtree+mkdir will add 2-10 lines of code, right? Let's do that and not raise the system requirements. Good point. Let's use mkdir in combination then. Attached is an updated patch. -- Michael diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index d12be84..483ebb5 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -1,6 +1,7 @@ use strict; use warnings; use TestLib; +use File::Path qw(rmtree); use Test::More tests => 19; my $tempdir = TestLib::tempdir; @@ -18,27 +19,30 @@ command_fails([ 'initdb', '-S', "$tempdir/data3" ], mkdir "$tempdir/data4" or BAIL_OUT($!); command_ok([ 'initdb', "$tempdir/data4" ], 'existing empty data directory'); -system_or_bail "rm -rf '$tempdir'/*"; - +rmtree($tempdir); +mkdir $tempdir; command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ], 'separate xlog directory'); -system_or_bail "rm -rf '$tempdir'/*"; +rmtree($tempdir); +mkdir $tempdir; command_fails( [ 'initdb', "$tempdir/data", '-X', 'pgxlog' ], 'relative xlog directory not allowed'); -system_or_bail "rm -rf '$tempdir'/*"; +rmtree($tempdir); +mkdir $tempdir; mkdir "$tempdir/pgxlog"; command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ], 'existing empty xlog directory'); -system_or_bail "rm -rf '$tempdir'/*"; +rmtree($tempdir); +mkdir $tempdir; mkdir "$tempdir/pgxlog"; mkdir "$tempdir/pgxlog/lost+found"; command_fails([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ], 'existing nonempty xlog directory'); -system_or_bail "rm -rf '$tempdir'/*"; +rmtree($tempdir); command_ok([ 'initdb', '-T', 'german', "$tempdir/data" ], 'select default dictionary'); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: adaptive ndistinct estimator v4
On Tue, Mar 31, 2015 at 12:02 PM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: Hi all, attached is v4 of the patch implementing adaptive ndistinct estimator. Hi Tomas, I have a case here where the adaptive algorithm underestimates ndistinct by a factor of 7 while the default estimator is pretty close. 5MB file: https://drive.google.com/file/d/0Bzqrh1SO9FcETU1VYnQxU2RZSWM/view?usp=sharing # create table foo2 (x text); # \copy foo2 from program 'bzcat ~/temp/foo1.txt.bz2' # analyze verbose foo2; INFO: analyzing public.foo2 INFO: foo2: scanned 6021 of 6021 pages, containing 1113772 live rows and 0 dead rows; 3 rows in sample, 1113772 estimated total rows WARNING: ndistinct estimate current=998951.78 adaptive=135819.00 Cheers, Jeff
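For comparison, the "default estimator" that came out close here is the Haas/Stokes Duj1 estimator used by ANALYZE. A Python sketch of that formula as I understand it from analyze.c (treat this as an approximation: the real code also clamps the result and special-cases columns it deems unique):

```python
from collections import Counter

def duj1_ndistinct(sample, total_rows):
    """Haas/Stokes Duj1 estimate of the number of distinct values:
    n*d / (n - f1 + f1*n/N), where n is the sample size, N the total
    row count, d the distinct values seen in the sample, and f1 the
    number of values seen exactly once."""
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    if f1 == n:
        # Every sampled value was unique: assume the column is unique.
        return float(total_rows)
    return (n * d) / (n - f1 + f1 * n / total_rows)
```

The f1 term is what makes the estimator sensitive to skew: many once-seen values in the sample push the estimate up toward the table size, which is presumably where the adaptive estimator in this thread diverges on Jeff's data set.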
Re: [HACKERS] initdb -S and tablespaces
At 2015-04-15 11:40:34 +0530, a...@2ndquadrant.com wrote: Still, thanks to the example in basebackup.c, I've got most of a patch that should follow any symlinks in PGDATA. I notice that src/bin/pg_rewind/copy_fetch.c has a traverse_datadir() function that does what we want (but it recurses into symlinks only inside pg_tblspc). -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Move pg_upgrade from contrib/ to src/bin/
On 4/14/15 8:32 PM, Peter Eisentraut wrote: Move pg_upgrade from contrib/ to src/bin/ Oh dear. It appears the buildfarm client code needs to be updated to support this. How do we do this? (I guess the critters that are still green are not running this test.) Should I revert this patch while we sort this out? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers