Re: [HACKERS] buffer assertion tripping under repeat pgbench load
Hi,

On 2012-12-23 02:36:42 -0500, Greg Smith wrote:

I'm testing a checkout from a few days ago and trying to complete a day-long pgbench stress test, with assertions and debugging on. I want to make sure the base code works as expected before moving on to testing checksums. It's crashing before finishing though. Here's a sample:

2012-12-20 20:36:11 EST [26613]: LOG: checkpoint complete: wrote 32 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.108 s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 s, average=0.139 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)
2012-12-20 20:36:44 EST [26611]: LOG: server process (PID 4834) was terminated by signal 6: Aborted
2012-12-20 20:36:44 EST [26611]: DETAIL: Failed process was running: END;

Running the same test set again gave the same crash, so this looks repeatable. It's not trivial to trigger given the average runtime to crash I'm seeing. Second sample:

2012-12-22 21:27:17 EST [6006]: LOG: checkpoint complete: wrote 34 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.310 s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 s, average=0.415 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)
2012-12-22 21:27:23 EST [26611]: LOG: server process (PID 12143) was terminated by signal 6: Aborted
2012-12-22 21:27:23 EST [26611]: DETAIL: Failed process was running: END;

The important part:

TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)

That's the check done by AtEOXact_Buffers - clean up at end of transaction. It fired while executing the END; statement at the end of the standard pgbench test. The test looks for loose things that weren't pinned/unpinned correctly by the transaction. I'm not sure whether getting a stack trace at that point will be useful. The refcount mistake could easily have been made by an earlier statement in the transaction block, right?
This is on a server where shared_buffers=2GB, checkpoint_segments=64. Test runs were automated with pgbench-tools, using the following configuration:

MAX_WORKERS=4
SCRIPT=tpc-b.sql
SCALES=500 1000
SETCLIENTS=4 8 16 32 64 96
SETTIMES=3
RUNTIME=600

All of the crashes so far have been at scale=500, and I haven't seen one of those runs finish yet. It failed 7 hours in the first time, then 4 hours in. For code reference, the last commit in the source tree I built against was:

af275a12dfeecd621ed9899a9382f26a68310263 Thu Dec 20 14:23:31 2012 +0200

Looking at recent activity, the only buffer-pin-related change I saw was this one:

commit 62656617dbe49cca140f3da588a5e311b3fc35ea
Date: Mon Dec 3 18:53:31 2012 +
Avoid holding vmbuffer pin after VACUUM.

I don't immediately see anything bad here. This is my first test like this against 9.3 development though, so the cause could be an earlier commit; I'm just starting with the most recent work as the first suspect. Next I think I'll try autovacuum=off and see if the crash goes away. Other ideas are welcome.

Something like the (untested) debug message below might be helpful:

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7be767b..20f61a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1700,7 +1700,13 @@ AtEOXact_Buffers(bool isCommit)

 	for (i = 0; i < NBuffers; i++)
 	{
-		Assert(PrivateRefCount[i] == 0);
+		if (PrivateRefCount[i] != 0)
+		{
+			BufferDesc *bufHdr = &BufferDescriptors[i];
+
+			elog(PANIC, "refcount of %s is %u should be 0, globally: %u",
+				 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
+				 PrivateRefCount[i], bufHdr->refcount);
+		}
 	}
 }
 #endif

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto seeding problem when ssl=on
On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

I'm not thrilled with the suggestion to do RAND_cleanup() after forking though, as that seems like it'll guarantee that each session starts out with only a minimal amount of entropy.

No, that's actually the right fix - that forces OpenSSL to do a new reseed with system randomness, thus making the backend (including SSL_accept) maximally disconnected from the static pool in the postmaster. This also makes the behaviour equal to the current ssl=off and exec-backend modes, which already do initial seeding in the backend. The fact that PRNG behaviour is affected by a complex set of compile- and run-time switches makes the current situation rather fragile and hard to understand.

What seems to me like it'd be most secure is to make the postmaster do RAND_add() with a gettimeofday result after each successful fork, say at the bottom of BackendStartup(). That way the randomness accumulates in the parent process, and there's no way to predict the initial state of any session without exact knowledge of every child process launch since postmaster start. So it'd go something like

#ifdef USE_SSL
    if (EnableSSL)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        RAND_add(&tv, sizeof(tv), 0);
    }
#endif

If you decide against RAND_cleanup in the backend and instead do workarounds in the backend or postmaster, then please also do periodic RAND_cleanup in the postmaster. This should make it harder to exploit weaknesses in the reused, slowly moving state.

-- marko
Re: [HACKERS] pg_basebackup from cascading standby after timeline switch
On Fri, Dec 21, 2012 at 9:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Yes, this should be backpatched to 9.2. I came up with the attached.

In this patch, if '-X stream' is specified in pg_basebackup, the timeline history files are not backed up. Should we change the pg_basebackup background process and walsender so that they also stream timeline history files, for example by using a 'TIMELINE_HISTORY' replication command? Or should basebackup.c send all timeline history files at the end of the backup even if '-X stream' is specified?

However, thinking about this some more, there's another bug in the way WAL files are included in the backup when a timeline switch happens. basebackup.c includes all the WAL files on ThisTimeLineID, but when the backup is taken from a standby, the standby might've followed a timeline switch. So it's possible that some of the WAL files should come from timeline 1, while others should come from timeline 2. This leads to an error like "requested WAL segment 0001000C has already been removed" in pg_basebackup. Attached is a script to reproduce that bug, if someone wants to play with it. It's a bit sensitive to timing, and needs tweaking of the paths at the top.

One solution would be to pay more attention to the timelines to include WAL from. basebackup.c could read the timeline history file to see exactly where the timeline switches happened, and then construct the filename of each WAL segment using the correct timeline id. Another approach would be to do readdir() on pg_xlog and include all WAL files, regardless of timeline IDs, that fall in the right XLogRecPtr range. The latter seems easier to backpatch.

The latter sounds good to me. But how are all the WAL files with different timelines shipped in pg_basebackup -X stream mode?

Regards,
--
Fujii Masao
Re: [HACKERS] Review of Row Level Security
On 22 December 2012 20:13, Kevin Grittner kgri...@mail.com wrote:

I apologize again for coming in so late with strong opinions, but I thought I knew what row level security meant, and it was just a question of how to do it; yet I can't reconcile what I thought the feature was about with the patch I'm seeing. Perhaps it's just a lack of the high-level context that's making it difficult.

Agreed, I think we're all feeling that. I'll do my best to accommodate all viewpoints.

--
Simon Riggs	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [WIP] pg_ping utility
On Sat, Dec 22, 2012 at 4:07 AM, Phil Sorber p...@omniti.com wrote:

Added new version with default verbose and quiet option. Also updated docs to reflect changes.

Thanks for the updated patches. Here is the status of the binary patch:
- Code compiles without any warnings.
- After testing the patch, it behaves as expected; the default option is now verbose, and the output can be hidden using -q or --quiet.

However, I still have the following comments:
- In pg_isready.c, the function help needs to be static.
- The list of options called with getopt_long should be sorted in alphabetical order (the option q is not at the correct position): "d:h:p:U:qV" => "d:h:p:qU:V"

Then, about the doc patch:
- Docs compile correctly (I did a manual check).
- I am not a native English speaker, but the docs look correct to me. There are enough examples and the description is sufficient. No problems either with the sgml format.

Once the 2 small things I noticed are fixed, this patch can be marked as ready for committer.

Thanks,
--
Michael Paquier
http://michael.otacoo.com
Re: [HACKERS] [WIP] pg_ping utility
On Sun, Dec 23, 2012 at 9:29 AM, Michael Paquier michael.paqu...@gmail.com wrote:
On Sat, Dec 22, 2012 at 4:07 AM, Phil Sorber p...@omniti.com wrote:

Added new version with default verbose and quiet option. Also updated docs to reflect changes.

Thanks for the updated patches. Here is the status of the binary patch:
- Code compiles without any warnings.
- After testing the patch, it behaves as expected; the default option is now verbose, and the output can be hidden using -q or --quiet.

However, I still have the following comments:
- In pg_isready.c, the function help needs to be static.

I have no objection to making this static, but I'm curious what your reason for it is here?

- The list of options called with getopt_long should be sorted in alphabetical order (the option q is not at the correct position): "d:h:p:U:qV" => "d:h:p:qU:V"

Then, about the doc patch:
- Docs compile correctly (I did a manual check).
- I am not a native English speaker, but the docs look correct to me. There are enough examples and the description is sufficient. No problems either with the sgml format.

Once the 2 small things I noticed are fixed, this patch can be marked as ready for committer.

Ok, I will get an updated version later today.

Thanks,
--
Michael Paquier
http://michael.otacoo.com
Re: [HACKERS] Switching timeline over streaming replication
On Fri, Dec 21, 2012 at 1:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
On Sat, Dec 8, 2012 at 12:51 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 06.12.2012 15:39, Amit Kapila wrote:
On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
On 05.12.2012 14:32, Amit Kapila wrote:
On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:

After some diversions to fix bugs and refactor existing code, I've committed a couple of small parts of this patch, which just add some sanity checks to notice incorrect PITR scenarios. Here's a new version of the main patch based on current HEAD.

After testing with the new patch, the following problems are observed.

Defect - 1:
1. Start primary A.
2. Start standby B following A.
3. Start cascade standby C following B.
4. Start another standby D following C.
5. Promote standby B.
6. After a successful timeline switch in cascade standbys C and D, stop D.
7. Restart D; startup is successful and it connects to standby C.
8. Stop C.
9. Restart C; startup fails.

Ok, the error I get in that scenario is:

C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1
C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure

That mismatch causes the error. I'd like to fix this by always treating the checkpoint record as part of the new timeline. That feels more correct. The most straightforward way to implement that would be to peek at the xlog record before updating replayEndRecPtr and replayEndTLI. If it's a checkpoint record that changes the TLI, set replayEndTLI to the new timeline before calling the redo function. But it's a bit of a modularity violation to peek into the record like that.
Or we could just revert the sanity check at the beginning of recovery that throws the "requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1" error. The error I added to redo of the checkpoint record, which says "unexpected timeline ID %u in checkpoint record, before reaching minimum recovery point %X/%X on timeline %u", checks basically the same thing, but at a later stage. However, the way minRecoveryPointTLI is updated still seems wrong to me, so I'd like to fix that. I'm thinking of something like the attached (with some more comments before committing). Thoughts?

This has fixed the problem reported. However, I am not able to work out whether there will be any problem if we remove the check "requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1" at the beginning of recovery and just update replayEndTLI with ThisTimeLineID?

Well, it seems wrong for the control file to contain a situation like this:

pg_control version number:            932
Catalog version number:               201211281
Database system identifier:           5819228770976387006
Database cluster state:               shut down in recovery
pg_control last modified:             pe 7. joulukuuta 2012 17.39.57
Latest checkpoint location:           0/3023EA8
Prior checkpoint location:            0/260
Latest checkpoint's REDO location:    0/3023EA8
Latest checkpoint's REDO WAL file:    00020003
Latest checkpoint's TimeLineID:       2
...
Time of latest checkpoint:            pe 7. joulukuuta 2012 17.39.49
Min recovery ending location:         0/3023F08
Min recovery ending loc's timeline:   1

Note the latest checkpoint location and its TimeLineID, and compare them with the min recovery ending location. The min recovery ending location is ahead of the latest checkpoint's location; the min recovery ending location actually points to the end of the checkpoint record. But how come the min recovery ending location's timeline is 1, while the checkpoint record's timeline is 2? Now maybe that would happen to work if we remove the sanity check, but it still seems horribly confusing.
I'm afraid that discrepancy will come back to haunt us later if we leave it like that. So I'd like to fix it. Mulling over this some more, I propose the attached patch. With the patch, we peek into the checkpoint record, and actually perform the timeline switch (by changing ThisTimeLineID) before replaying it. That way the checkpoint record is really considered to be on the new timeline for all purposes. At the moment, the only difference that makes in practice is that we set replayEndTLI, and thus minRecoveryPointTLI, to the new TLI, but it feels logically more correct to do it that way.

This patch has already been included in HEAD. Right? I found another "requested timeline does not contain minimum recovery point" error scenario in HEAD:

1. Set up the master 'M',
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 20 December 2012 14:56, Amit Kapila amit.kap...@huawei.com wrote:

1. There is no performance change for columns that have all valid values (non-NULLs).

I don't see any tests (at all) that measure this. I'm particularly interested in lower numbers of columns, so we can show no regression for the common case.

2. There is a visible performance increase when the number of columns containing NULLs is more than 60~70% in a table with a large number of columns.
3. There are visible space savings when the number of columns containing NULLs is more than 60~70% in a table with a large number of columns.

Agreed. I would call that quite disappointing though, and was expecting better. Are we sure the patch works and the tests are correct? The lack of any space saving for lower % values is strange and somewhat worrying. There should be a 36? byte saving for 300 null columns in an 800 column table - how does that not show up at all?

--
Simon Riggs	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [WIP] pg_ping utility
On Sun, December 23, 2012 15:29, Michael Paquier wrote:

Once the 2 small things I noticed are fixed, this patch can be marked as ready for committer.

I wasn't going to complain about it, but if we're going for small things anyway... The output is now capitalised:

/tmp:6543 - Accepting Connections
/tmp:6000 - No Response

which looks unusual to me; could we please make it all lower-case?

Thanks,

Erik Rijkers
Re: [HACKERS] [WIP] pg_ping utility
On Sun, Dec 23, 2012 at 9:57 AM, Erik Rijkers e...@xs4all.nl wrote:
On Sun, December 23, 2012 15:29, Michael Paquier wrote:

Once the 2 small things I noticed are fixed, this patch can be marked as ready for committer.

I wasn't going to complain about it, but if we're going for small things anyway... The output is now capitalised:

/tmp:6543 - Accepting Connections
/tmp:6000 - No Response

which looks unusual to me; could we please make it all lower-case?

I'm not opposed to it in principle. Is that how it is done elsewhere in the code? If there are no objections from anyone else I will roll that change in with the others.
Re: [HACKERS] Feature Request: pg_replication_master()
On Sat, Dec 22, 2012 at 5:14 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 21.12.2012 21:43, Simon Riggs wrote:
On 21 December 2012 19:35, Bruce Momjian br...@momjian.us wrote:

It's not too complex. You just want that to be true. The original developer has actually literally gone away, but not because of this.

Well, Robert and I remember it differently. Anyway, I will ask for a vote now.

And what will you ask for a vote on? Why not spend that effort on solving the problem? Why is it OK to waste so much time? Having already explained how to do this, I'll add backwards compatibility within 1 day of the commit of the patch you claim was blocked by this. I think it will take me about an hour and not be very invasive, just to prove what a load of hot air is being produced here.

I haven't been following this.. Could you two post a link to the patch we're talking about, and the explanation of how to add backwards compatibility to it?

The latest patch is the following. Of course, this cannot be applied cleanly in HEAD.

http://archives.postgresql.org/message-id/CAHGQGwHB==cjehme6jiuy-knutrx-k3ywqzieg1gpnb3ck6...@mail.gmail.com

Just by looking at the last few posts, this seems like a no-brainer. The impression I get is that there's a patch that's otherwise ready to be applied, but Simon and some others want to have backwards-compatibility added to it first. And Simon has a plan for how to do it, and can do it in one day. The obvious solution is that Simon posts the patch, with the backwards-compatibility added. We can then discuss that, and assuming no surprises there, commit it. And everyone lives happily ever after.

Sounds reasonable.

Regards,
--
Fujii Masao
Re: [HACKERS] [WIP] pg_ping utility
On Sun, Dec 23, 2012 at 10:07 AM, Phil Sorber p...@omniti.com wrote:
On Sun, Dec 23, 2012 at 9:57 AM, Erik Rijkers e...@xs4all.nl wrote:
On Sun, December 23, 2012 15:29, Michael Paquier wrote:

Once the 2 small things I noticed are fixed, this patch can be marked as ready for committer.

I wasn't going to complain about it, but if we're going for small things anyway... The output is now capitalised:

/tmp:6543 - Accepting Connections
/tmp:6000 - No Response

which looks unusual to me; could we please make it all lower-case?

I'm not opposed to it in principle. Is that how it is done elsewhere in the code? If there are no objections from anyone else I will roll that change in with the others.

Updated patch attached.

pg_isready_bin_v9.diff
Description: Binary data

pg_isready_docs_v9.diff
Description: Binary data
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Simon Riggs si...@2ndquadrant.com writes:

The lack of any space saving for lower % values is strange and somewhat worrying. There should be a 36? byte saving for 300 null columns in an 800 column table - how does that not show up at all?

You could only fit about 4 such rows in an 8K page (assuming the columns are all int4s). Unless the savings is enough to allow 5 rows to fit in a page, the effective savings will be zilch.

This may well mean that the whole thing is a waste of time in most scenarios --- the more likely it is to save anything, the more likely that the savings will be lost anyway due to page alignment considerations, because wider rows inherently pack less efficiently.

regards, tom lane
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 23 December 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote:

Simon Riggs si...@2ndquadrant.com writes: The lack of any space saving for lower % values is strange and somewhat worrying. There should be a 36? byte saving for 300 null columns in an 800 column table - how does that not show up at all?

You could only fit about 4 such rows in an 8K page (assuming the columns are all int4s). Unless the savings is enough to allow 5 rows to fit in a page, the effective savings will be zilch.

If that's the case, the use case is tiny, especially considering how sensitive the saving is to the exact location of the NULLs.

This may well mean that the whole thing is a waste of time in most scenarios --- the more likely it is to save anything, the more likely that the savings will be lost anyway due to page alignment considerations, because wider rows inherently pack less efficiently.

ISTM that we'd get a better gain and a wider use case by compressing the whole block, with some bits masked out to allow updates/deletes. The string of zeroes in the null bitmap would compress easily, but so would other aspects also.

--
Simon Riggs	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] buffer assertion tripping under repeat pgbench load
Andres Freund and...@2ndquadrant.com writes:

This is my first test like this against 9.3 development though, so the cause could be an earlier commit. I'm just starting with the most recent work as the first suspect. Next I think I'll try autovacuum=off and see if the crash goes away. Other ideas are welcome.

Something like the (untested) debug message below might be helpful:

It might also be interesting to know if there is more than one still-pinned buffer --- that is, if you're going to hack the code, fix it to elog(LOG) each pinned buffer and then panic after completing the loop.

regards, tom lane
Re: [HACKERS] Review of Row Level Security
2012/12/22 Kevin Grittner kgri...@mail.com:

Kohei KaiGai wrote: The RLS entry of the wiki has not been updated for a long time; I'll try to update the entry with the high-level design in a couple of days.

Thanks, I think that is essential for a productive discussion of the issue.

I tried to update http://wiki.postgresql.org/wiki/RLS

I went back to the definition of the feature for information security; that requires ensuring the confidentiality, integrity and availability (C.I.A.) of the information assets managed by the system. Access control contributes to the first two elements. So I'm inclined to have the RLS feature eventually support both a reader side and a writer side, to prevent unprivileged rows from being read or written.

If I could introduce the most conceptual stuff in one statement, it would be: overall, RLS prevents users from reading and writing rows that do not satisfy the row-security policy configured on the table by the table owner. The reader side ensures confidentiality of data; the writer side ensures integrity of data.

Also note that I believe this criterion never rules out having multiple (asymmetric) row-security policies, one per command type, as long as we handle the problematic scenarios properly.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Review of Row Level Security
On 21 December 2012 16:51, Kevin Grittner kgri...@mail.com wrote:

It would be easy enough to include read/write variable clauses by using a function similar to TG_OP for triggers, e.g. StatementType. That would make the security clause that applied only to reads into something like this: (StatementType('INSERT, UPDATE') OR other-quals).

In the case where the logic is encapsulated into a function, what are the logical differences from a BEFORE EACH ROW trigger?

Logically it is broadly similar to a CHECK constraint, which is also similar to a trigger in a few ways. Implemented as a BEFORE EACH ROW trigger, it would need to be a new class of trigger that always executes after all other BEFORE EACH ROW triggers have executed, in the same way that security barrier views act last. The act-last bit seems to be the most important thing here, just as it was for security barriers, and by analogy a strong enough reason to add specific syntax to support this case. (Syntax as yet undecided...)

If none, and this is strictly an optimization, what are the benchmarks showing?

AFAIK it's well known that a check constraint is much faster than a trigger. The objection to using triggers is partially for that reason and partially because of the admin overhead, as I already said. Adding a trigger could be done automatically, just as the current patch does with the check constraint approach. So if anyone has benchmarks that show triggers are actually faster, then we could add that automatically instead, I guess.

Anyway, hope you can make the call on the 28th so we can discuss this and agree a way forward you're happy with.

--
Simon Riggs	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Row Level Security
Simon Riggs si...@2ndquadrant.com writes:
On 21 December 2012 16:51, Kevin Grittner kgri...@mail.com wrote:

If none, and this is strictly an optimization, what are the benchmarks showing?

AFAIK it's well known that a check constraint is much faster than a trigger.

I don't believe that that's well known at all, at least not for apples-to-apples comparison cases. A C-coded BEFORE trigger doesn't have very much overhead; I suspect it's probably comparable to expression evaluation setup overhead. I think if you want to argue for this on performance grounds, you need to actually prove there's a significant performance advantage, not just assume there will be.

regards, tom lane
Re: [HACKERS] Review of Row Level Security
On 23 December 2012 19:16, Tom Lane t...@sss.pgh.pa.us wrote:

Simon Riggs si...@2ndquadrant.com writes: AFAIK it's well known that a check constraint is much faster than a trigger.

I don't believe that that's well known at all, at least not for apples-to-apples comparison cases. A C-coded BEFORE trigger doesn't have very much overhead; I suspect it's probably comparable to expression evaluation setup overhead. I think if you want to argue for this on performance grounds, you need to actually prove there's a significant performance advantage, not just assume there will be.

If you want to see some tests, I'm sure those can be arranged, no problem. But honestly, if it's low enough, then which is the fastest will likely be moot in comparison with the cost of a non-C-coded role-based security check. So I think our attention is best spent on providing a few likely C-coded security checks, so we're able to address the whole performance concern, not just the constraint/trigger debate.

That still leaves the points about ensuring the triggers/checks are executed last and also that they are added automatically, rather than requiring them to be added manually. As KaiGai points out, if they are added automatically, it doesn't really matter which we pick.

--
Simon Riggs	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] buffer assertion tripping under repeat pgbench load
On 12/23/12 1:10 PM, Tom Lane wrote:

It might also be interesting to know if there is more than one still-pinned buffer --- that is, if you're going to hack the code, fix it to elog(LOG) each pinned buffer and then panic after completing the loop.

Easy enough; I kept it so the actual source of panic is still an assertion failure:

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..df43643 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1697,11 +1697,21 @@ AtEOXact_Buffers(bool isCommit)
 	if (assert_enabled)
 	{
 		int			i;
+		int			RefCountErrors = 0;

 		for (i = 0; i < NBuffers; i++)
 		{
-			Assert(PrivateRefCount[i] == 0);
+			if (PrivateRefCount[i] != 0)
+			{
+				BufferDesc *bufHdr = &BufferDescriptors[i];
+
+				elog(LOG, "refcount of %s is %u should be 0, globally: %u",
+					 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
+					 PrivateRefCount[i], bufHdr->refcount);
+				RefCountErrors++;
+			}
 		}
+		Assert(RefCountErrors == 0);
 	}
 #endif
Re: [HACKERS] pgcrypto seeding problem when ssl=on
Marko Kreen mark...@gmail.com writes: On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm not thrilled with the suggestion to do RAND_cleanup() after forking though, as that seems like it'll guarantee that each session starts out with only a minimal amount of entropy. No, that's actually the right fix - that forces OpenSSL to do a new reseed with system randomness, thus making the backend (including SSL_accept) maximally disconnected from the static pool in the postmaster. I don't think that maximal disconnectedness is the deciding factor here, or even a very important factor. If we do RAND_cleanup() then each new session will be forced to suck entropy from /dev/urandom (immediately, if an SSL connection is attempted). This opens the door to quasi-denial-of-service attacks that bleed all the entropy out of the system, forcing long delays for new PRNG-using processes, Postgres or otherwise. And long delays are the *best case* scenario --- worst case, if the system lacks decent /dev/random support, is that new processes are starting up with highly correlated PRNG states. If such an approach were a good thing, the OpenSSL guys would have implemented it internally --- it'd be easy enough to do, just by forcing RAND_cleanup anytime getpid() reads differently than it did when the pool was set up. gettimeofday() might not be the very best way of changing the inherited PRNG state from child to child, but I think it's a more attractive solution than RAND_cleanup. This also makes behaviour equal to the current ssl=off and exec-backend modes, which already do initial seeding in the backend. No, it's not equal, because when ssl=off seeding attempts won't happen immediately after fork and thus an attacker doesn't have such an easy way of draining entropy. If we do what you're suggesting, the attacker doesn't even need a valid database login to drain entropy --- he can just fire-and-forget NEGOTIATE_SSL startup packets.
(The exec-backend case will have such issues anyway, but who thought Windows was a secure OS?) If you decide against RAND_cleanup in the backend and instead do workarounds in the backend or postmaster, then please also do periodic RAND_cleanup in the postmaster. This should make it harder to exploit weaknesses in reused slowly moving state. We could consider that, but it's not apparent to me that it has any real value. regards, tom lane
Re: [HACKERS] pgcrypto seeding problem when ssl=on
Noah Misch n...@leadboat.com writes: On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote: I believe that we'd be better off doing something in postmaster.c to positively ensure that each session has a distinct seed value. Notice that BackendRun() already takes measures to ensure that's the case for the regular libc random() function; it seems like a reasonable extension to also worry about OpenSSL's PRNG.

#ifdef USE_SSL
	if (EnableSSL)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		RAND_add(&tv, sizeof(tv), 0);
	}
#endif

Take the caution one step further and make it independent of EnableSSL. In a stock installation, a !EnableSSL postmaster will never seed its PRNG, and there's no vulnerability. Add a shared_preload_libraries module that uses the OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. Meh. In a postmaster that wasn't built with SSL support at all, such a module is still dangerous (and I'm not convinced anybody would build such a module anyway). I think we should confine our ambitions to preventing security issues caused by our own code. regards, tom lane
Re: [HACKERS] pgcrypto seeding problem when ssl=on
Andres Freund and...@2ndquadrant.com writes: On 2012-12-22 14:20:56 -0500, Tom Lane wrote: I believe that we'd be better off doing something in postmaster.c to positively ensure that each session has a distinct seed value. I am entirely unconvinced that adding in a gettimeofday() provides more entropy than what openssl gathers internally after a RAND_cleanup(). No, it doesn't provide more entropy, but what it does do is avoid draining entropy from the kernel (see my reply to Marko just now). gettimeofday() will yield the same result in close enough forks on a significant number of systems whereas openssl should use stuff like /dev/urandom to initialize its PRNG after a cleanup. Well, in the first place, that doesn't matter much because the PRNG state will still change from repeated mix-ins of gettimeofday, even if we obtain the same reading successive times. In the second place, if we add it where I suggest, it would be easy to also include the child process PID in the added entropy -- we could xor it into the tv_sec value for instance -- thus further reducing the chance of identical values getting added in. regards, tom lane
Re: [HACKERS] buffer assertion tripping under repeat pgbench load
On 23 December 2012 19:42, Greg Smith g...@2ndquadrant.com wrote:

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..df43643 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1697,11 +1697,21 @@ AtEOXact_Buffers(bool isCommit)
 	if (assert_enabled)
 	{
 		int			i;
+		int			RefCountErrors = 0;
 
 		for (i = 0; i < NBuffers; i++)
 		{
-			Assert(PrivateRefCount[i] == 0);
+			if (PrivateRefCount[i] != 0)
+			{
+				BufferDesc *bufHdr = &BufferDescriptors[i];
+
+				elog(LOG, "refcount of %s is %u, should be 0, globally: %u",
+					 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
+					 PrivateRefCount[i],
+					 bufHdr->refcount);
+				RefCountErrors++;
+			}
 		}
+		Assert(RefCountErrors == 0);
 	}
 #endif

We already have PrintBufferLeakWarning() for this, which might be a bit neater. If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. Are you by any chance running with unlogged tables? There was a change there recently, something around checkpoint IIRC. Can you set checkpoints more frequent also? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Making view dump/restore safe at the column-alias level
On Fri, Dec 21, 2012 at 9:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm having a hard time following this. Can you provide a concrete example?

regression=# create table t1 (x int, y int);
CREATE TABLE
regression=# create table t2 (x int, z int);
CREATE TABLE
regression=# create view v1 as select * from t1 join t2 using (x);
CREATE VIEW
regression=# \d+ v1
                   View "public.v1"
 Column |  Type   | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
 x      | integer |           | plain   |
 y      | integer |           | plain   |
 z      | integer |           | plain   |
View definition:
 SELECT t1.x, t1.y, t2.z
   FROM t1
   JOIN t2 USING (x);

regression=# alter table t2 rename column x to q;
ALTER TABLE
regression=# \d+ v1
                   View "public.v1"
 Column |  Type   | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
 x      | integer |           | plain   |
 y      | integer |           | plain   |
 z      | integer |           | plain   |
View definition:
 SELECT t1.x, t1.y, t2.z
   FROM t1
   JOIN t2 USING (x);

At this point the dumped view definition is wrong: if you try to execute it you get

regression=# SELECT t1.x, t1.y, t2.z
regression-#    FROM t1
regression-#    JOIN t2 USING (x);
ERROR:  column "x" specified in USING clause does not exist in right table

I'm suggesting that we could fix this by emitting something that forces the right alias to be assigned to t2.q: SELECT t1.x, t1.y, t2.z FROM t1 JOIN t2 AS t2(x,z) USING (x); Sneaky. I didn't know that would even work, but it seems like a sensible approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] buffer assertion tripping under repeat pgbench load
On 12/23/12 3:17 PM, Simon Riggs wrote: If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. I am not very suspicious of that VACUUM change; just pointed it out for completeness sake. Are you by any chance running with unlogged tables? There was a change there recently, something around checkpoint IIRC. Can you set checkpoints more frequent also? This is straight pgbench, modifying only scale, client count, workers, and duration. No unlogged tables involved. The crash is so intermittent that I'm trying not to change anything until I get more productive output out of one. The last run I kicked off has been chugging for 14 hours without a hiccup yet. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] buffer assertion tripping under repeat pgbench load
On 23 December 2012 21:52, Greg Smith g...@2ndquadrant.com wrote: On 12/23/12 3:17 PM, Simon Riggs wrote: If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. I am not very suspicious of that VACUUM change; just pointed it out for completeness sake. Nothing seen here either after 100 minutes of thrash with ~1 VACUUM per sec, at SF1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Event Triggers: adding information
On Fri, Dec 21, 2012 at 11:35 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: It's hard to devise exactly what kind of refactoring we need to do before trying to write a patch that benefits from it, in my experience. In some cases the refactoring will make things more complex, not less. Perhaps. I have a feeling there's some more simplification that can be done here, somehow. That's a good idea. I only started quite late in that patch to work on the ObjectID piece of information, that's why I didn't split it before doing so. That's fine. I've committed that part. I think the remainder of the patch is really several different features that ought to be split apart and considered independently. We may want some but not others, or some may be ready to go in but not others. Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP statements, so my current patch is still some bricks shy of a load… (I added ObjectID only in the commands I added rewrite support for, apart from DROP). I shall rely on you to provide those bricks which are still missing. Any change to gram.y that affects any of the supported commands will require a compensating change to ddl_rewrite.c. That is a whole lot of work and it is almost guaranteed that future patch authors will sometimes fail to do it correctly. At an absolute bare minimum I think we would need some kind of comprehensive test suite for this functionality, as opposed to the less than 60 lines of new test cases this patch adds which completely fail to test this code in any way at all, but really I think we should just not have it. It WILL get I had a more complete test suite last rounds and you did oppose to it for good reasons.
I'm ok to revisit that now, and I think the test case should look like this:

- install an auditing command trigger that will insert any DDL command string into a table with a sequence ordering
- select * from audit
- \d… of all objects created
- drop schema cascade
- DO $$ loop for sql in select command from audit do execute sql … $$
- \d… of all objects created again

Then any whacking of the grammar should alter the output here and any case that's not supported will fail too. We might be able to have a better way of testing the feature here, but I tried to stay within the realms of what I know pg_regress able to do. I was thinking that we might need to go beyond what pg_regress can do in this case. For example, one idea would be to install an audit trigger that records all the DDL that happens during the regression tests. Then, afterwards, replay that DDL into a new database. Then do a schema-only dump of the old and new databases and diff the dump files. That's a little wacky by current community standards but FWIW EDB has a bunch of internal tests that we run to check our proprietary stuff; some of them work along these lines and it's pretty effective at shaking out bugs. In this particular case, it would significantly reduce the maintenance burden of putting a proper test framework in place, because we can hope that the regression tests create (and leave around) at least one object of any given type, and that any new DDL features will be accompanied by similar regression tests. We already rely on the regression database for pg_upgrade testing, so if it's not complete, it impacts pg_upgrade test coverage as well. broken (if it isn't already) and it WILL slow down future development of SQL features. It also WILL have edge cases where it doesn't work properly, such as where the decision of the actual index name to use is only decided at execution time and cannot be inferred from the parse tree.
I know that you feel that all of these are tolerable The way to solve that, I think, as Tom said, is to only rewrite the command string when the objects exist in the catalogs. That's why we now have ddl_command_trace which is an alias to either start or end depending on whether we're doing a DROP or a CREATE or ALTER operation. Hmm. I have to study that more, maybe. I certainly agree that if you can look at the catalogs, you should be able to reliably reconstruct what happened - which isn't possible just looking at the parse tree. However, it feels weird to me to do something that's partly based on the parse tree and partly based on the catalog contents. Moreover, the current pre-create hook isn't the right place to gather information from the catalogs anyway as it happens before locks have been taken. Now, there's another thing that is bothering me here, too. How complete is the support you've implemented in this patch? Does it cover ALL CREATE/ALTER/DROP commands, including all options to and all variants of those commands? Because while I'm not very sanguine about doing this at all, it somehow feels like doing it only halfway would be even worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] initdb and share/postgresql.conf.sample
In some of my git branches I have editorialized src/backend/utils/misc/postgresql.conf.sample to contain my configuration preferences for whatever it is that that branch is for testing. Then this gets copied to share/postgresql.conf.sample during install and from there to data/postgresql.conf during initdb, and I don't need to remember to go make the necessary changes. Am I insane to be doing this? Is there a better way to handle these branch-specific configuration needs? Anyway, I was recently astonished to discover that the contents of share/postgresql.conf.sample during the initdb affected the performance of the server, even when the conf file was replaced with something else before the server was started up. To make a very long story short, if share/postgresql.conf.sample is set up for archiving, then somewhere in the initdb process some bootstrap process pre-creates a bunch of extra xlog files. Is this alarming? It looks like initdb takes some pains, at least in one place, to make an empty config file rather than using postgresql.conf.sample, but it seems like a sub-process is not doing that. Those extra log files then give the newly started server a boost (whether it is started in archive mode or not) because it doesn't have to create them itself. It isn't so much a boost, as the absence of a new-server penalty. I want to remove that penalty to get better numbers from benchmarking. What I am doing now is this, between the initdb and the pg_ctl start:

for g in `perl -e 'printf("000100%02X\n", $_) foreach 2..120'`; do
    cp /tmp/data/pg_xlog/00010001 /tmp/data/pg_xlog/$g -i < /dev/null;
done

The 120 comes from 2 * checkpoint_segments. That's mighty ugly, is there a better trick? You could say that benchmarks should run long enough to average out such changes, but needing to run a benchmark that long can make some kinds of work (like git bisect) unrealistic rather than merely tedious. Cheers, Jeff
Re: [HACKERS] WIP: store additional info in GIN index
Hi! On 22.12.2012 17:15, Alexander Korotkov wrote: I'm not saying this is a perfect benchmark, but the differences (of querying) are pretty huge. Not sure where this difference comes from, but it seems to be quite consistent (I usually get +-10% results, which is negligible considering the huge difference). Is this an expected behaviour that will be fixed by another patch? Other patches which significantly accelerate index search will be provided. This patch changes only GIN posting lists/trees storage. However, it wasn't expected that this patch significantly changes index scan speed in any direction. That was exactly my expectation - probably not an improvement, but definitely not worse performance. The database contains ~680k messages from the mailing list archives, i.e. about 900 MB of data (in the table), and the GIN index on the tsvector is about 900MB too. So the whole dataset nicely fits into memory (8GB RAM), and it seems to be completely CPU bound (no I/O activity at all). The configuration was exactly the same in both cases:

shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 256MB

I can either upload the database somewhere, or provide the benchmarking script if needed. Unfortunately, I can't reproduce such huge slowdown on my testcases. Could you share both database and benchmarking script? It's strange, but no matter what I do I can't reproduce those results (with the significant performance decrease). So either I've made some strange mistake when running those tests, or there was something wrong with my system, or whatever :-( But when running the benchmarks now (double-checked everything, properly repeated the tests, ...), I've noticed a different behaviour. But first some info about the scripts I use for testing. All the scripts are available here: https://bitbucket.org/tvondra/archie It's my hobby project implementing a fulltext mbox archive. It should be usable, but it's still a bit WIP, so let me know in case of any issues.
The README should give you all the instructions on how to set up and load the database. I'm using ~1700 mbox files downloaded from http://archives.postgresql.org/ for these lists (until 2012/11): pgadmin-hackers pgsql-advocacy pgsql-announce pgsql-bugs pgsql-general pgsql-hackers pgsql-jdbc pgsql-novice pgsql-odbc pgsql-patches pgsql-sql which in the end gives ~677k rows in the 'messages' table, occupying ~5.5GB disk space (including all the indexes etc). Once you have the data loaded, you need to warm up the database and then start benchmarking it - I'm using the warmup.py script for both things. To warm up the DB, just run

./warmup.py --db archie --duration 300

until the %util drops near 0 (assuming you have enough RAM to fit the whole database into memory). Then I usually do this as a benchmark:

./warmup.py --db archie --duration 300 --no-hash \
    --no-thread --words 1
./warmup.py --db archie --duration 300 --no-hash \
    --no-thread --words 2

which runs 60-second tests and outputs one line per worker (by default equal to the number of CPUs). The script itself is very simple: it fetches a random message and uses the tsvector column as a source of words for the actual benchmark. It takes N words from the tsvector, splits them into groups and performs a simple fulltext query using plainto_tsquery('word1 word2 ...'). At the end it prints info including the number of queries per second. I've run the tests on the current master with and without the v3 patch. I've tested it with 1GB or 2GB shared buffers, and 32MB or 64MB work mem. The tests were run for 1, 2, 3, 4 and 5 words, and I've repeated it five times for each configuration. Duration of each run was 5 minutes.
These are the averages (from the 5 runs) of queries per second for each combination of parameters (columns = number of words):

                       1     2     3     4     5
 master  1GB/32MB     19   179   165   127    99
 patched 1GB/32MB     19   175   163   124    96
 master  1GB/64MB     20   181   165   127    99
 patched 1GB/64MB     19   174   159   120    95
 master  2GB/32MB     27   181   165   127    98
 patched 2GB/32MB     25   176   156   120    93
 master  2GB/64MB     27   180   166   128   102
 patched 2GB/64MB     40   402   364   245   176

There's no significant difference in performance, except for the 2GB/64MB combination. And in that case it's actually the opposite direction than I've reported before - i.e. this time it's up to 100% faster than the unpatched master. The results are pretty consistent (very small variance across the repeated runs), so I'm not sure about the previous results. Any idea what might cause such behavior? Why should it
Re: [HACKERS] initdb and share/postgresql.conf.sample
On Sun, Dec 23, 2012 at 11:11 PM, Jeff Janes jeff.ja...@gmail.com wrote: You could say that benchmarks should run long enough to average out such changes I would say any benchmark needs to be run long enough to reach a steady state before the measurements are taken. The usual practice is to run a series of groups and observe the aggregate measurements for each group. For instance, run 10 runs with each run including 1000 repetitions of the transaction. Then you can observe at which point the averages for individual groups begin to behave consistently. If the first three are outliers but the remaining 7 are stable then discard the first three and take the average (or often median) of the remaining 7. If you include the early runs which are affected by non-steady-state conditions such as cache effects or file fragmentation then it can take a very long time for those effects to be erased by averaging with later results. Worse, it's very difficult to tell whether you've waited long enough. -- greg
Re: [HACKERS] pgcrypto seeding problem when ssl=on
On Sun, Dec 23, 2012 at 02:49:08PM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:

#ifdef USE_SSL
	if (EnableSSL)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		RAND_add(&tv, sizeof(tv), 0);
	}
#endif

Take the caution one step further and make it independent of EnableSSL. In a stock installation, a !EnableSSL postmaster will never seed its PRNG, and there's no vulnerability. Add a shared_preload_libraries module that uses the OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. Meh. In a postmaster that wasn't built with SSL support at all, such a module is still dangerous (and I'm not convinced anybody would build such a module anyway). I think we should confine our ambitions to preventing security issues caused by our own code. You're adding lines of code to prematurely micro-optimize the backend fork cycle. If code introduced into the postmaster, by us or others, ever violates the assumption behind that micro-optimization, certain users get a precipitous loss of security with no clear alarm bells. I don't like that trade. nm
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 20.12.2012 12:33, Andres Freund wrote: On 2012-12-20 02:54:48 +0100, Tomas Vondra wrote: On 19.12.2012 02:18, Andres Freund wrote: On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote: I think, except for the temp buffer issue mentioned below, it's ready.

-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-	int			i;
+	int			i, j;
+
+	/* sort the list of rnodes */
+	pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
 
 	/* If it's a local relation, it's localbuf.c's problem. */
-	if (RelFileNodeBackendIsTemp(rnode))
+	for (i = 0; i < nnodes; i++)
 	{
-		if (rnode.backend == MyBackendId)
-			DropRelFileNodeAllLocalBuffers(rnode.node);
-		return;
+		if (RelFileNodeBackendIsTemp(rnodes[i]))
+		{
+			if (rnodes[i].backend == MyBackendId)
+				DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+		}
 	}

While you deal with local buffers here you don't anymore in the big loop over shared buffers. That wasn't needed earlier since we just returned after noticing we have local relation, but that's not the case anymore. Hmm, but that would require us to handle the temp relations explicitly wherever we call DropRelFileNodeAllBuffers. Currently there are two such places - smgrdounlink() and smgrdounlinkall(). By placing it into DropRelFileNodeAllBuffers() this code is shared and I think it's a good thing. But that does not mean the code is perfect - it was based on the assumption that if there's a mix of temp and regular relations, the temp relations will be handled in the first part and the rest in the second one. Maybe it'd be better to improve it so that the temp relations are removed from the array after the first part (and skip the second one if there are no remaining relations). The problem is that afaik without the backend-local part a temporary relation could match the same relfilenode as a full relation which would have pretty bad consequences. Attached is a patch with fixed handling of temporary relations.
I've chosen to keep the logic in DropRelFileNodeAllBuffers and rather do a local copy without the local relations. regards Tomas

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 8dc622a..8c12a43 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,11 @@ smgrDoPendingDeletes(bool isCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+
+	SMgrRelation *srels = palloc(sizeof(SMgrRelation));
+	int			nrels = 0,
+				i = 0,
+				maxrels = 1;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +340,31 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 
 				srel = smgropen(pending->relnode, pending->backend);
-				smgrdounlink(srel, false);
-				smgrclose(srel);
+
+				/* extend the array if needed (double the size) */
+				if (maxrels <= nrels)
+				{
+					maxrels *= 2;
+					srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+				}
+
+				srels[nrels++] = srel;
 			}
 			/* must explicitly free the list entry */
 			pfree(pending);
 			/* prev does not change */
 		}
 	}
+
+	if (nrels > 0)
+	{
+		smgrdounlinkall(srels, nrels, false);
+
+		for (i = 0; i < nrels; i++)
+			smgrclose(srels[i]);
+	}
+
+	pfree(srels);
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..8a7190b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -62,6 +62,7 @@
 #define BUF_WRITTEN				0x01
 #define BUF_REUSABLE			0x02
+#define BSEARCH_LIMIT			10
 
 /* GUC variables */
 bool		zero_damaged_pages = false;
@@ -108,6 +109,7 @@ static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
 static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
 static void AtProcExit_Buffers(int code, Datum arg);
+static int	rnode_comparator(const void *p1, const void *p2);
 
 /*
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
@@ -2094,35 +2096,85 @@
[HACKERS] compliation error in 9.3 devel
The following commit is causing some compilation errors: c504513f83a9ee8dce4a719746ca73102cae9f13

Error:
../../../src/include/catalog/pg_aggregate.h:246: error: previous declaration of 'AggregateCreate' was here
make: *** [pg_aggregate.o] Error 1

Author: Robert Haas rh...@postgresql.org 2012-12-24 04:55:03
Committer: Robert Haas rh...@postgresql.org 2012-12-24 05:07:58
Parent: 31bc839724439440b2e94ea616b28ce5be94e19c (Prevent failure when RowExpr or XmlExpr is parse-analyzed twice.)
Child: (Local uncommitted changes, not checked in to index)
Branches: Truncate-trailing-null-columns, master
Follows: REL9_2_BETA2
Precedes:

    Adjust many backend functions to return OID rather than void.

    Extracted from a larger patch by Dimitri Fontaine. It is hoped that this will
    provide infrastructure for enriching the new event trigger functionality, but
    it seems possibly useful for other purposes as well.

Regards, Hari babu.