Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Andres Freund
Hi,

On 2012-12-23 02:36:42 -0500, Greg Smith wrote:
 I'm testing a checkout from a few days ago and trying to complete a day-long
 pgbench stress test, with assertions and debugging on.  I want to make sure
 the base code works as expected before moving on to testing checksums.  It's
 crashing before finishing though.  Here's a sample:

 2012-12-20 20:36:11 EST [26613]: LOG:  checkpoint complete: wrote 32 buffers
 (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.108
 s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 s, average=0.139
 s
 TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
 1703)
 2012-12-20 20:36:44 EST [26611]: LOG:  server process (PID 4834) was
 terminated by signal 6: Aborted
 2012-12-20 20:36:44 EST [26611]: DETAIL:  Failed process was running: END;

 Running the same test set again gave the same crash, so this looks
 repeatable.  It's not trivial to trigger given the average runtime to crash
 I'm seeing.  Second sample:

 2012-12-22 21:27:17 EST [6006]: LOG:  checkpoint complete: wrote 34 buffers
 (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.310
 s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 s, average=0.415
 s
 TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
 1703)
 2012-12-22 21:27:23 EST [26611]: LOG:  server process (PID 12143) was
 terminated by signal 6: Aborted
 2012-12-22 21:27:23 EST [26611]: DETAIL:  Failed process was running: END;

 The important part:

 TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
 1703)

 That's the check done by AtEOXact_Buffers - clean up at end of
 transaction.  It fired while executing the END; statement at the end of
 the standard pgbench test.  The test looks for loose things that weren't
 pinned/unpinned correctly by the transaction.  I'm not sure if getting a
 stack trace at that point will be useful.  The ref count mistake could
 easily have been made by an earlier statement in the transaction block,
 right?

 This is on a server where shared_buffers=2GB, checkpoint_segments=64. Test
 runs were automated with pgbench-tools, using the following configuration:

 MAX_WORKERS=4
 SCRIPT=tpc-b.sql
 SCALES=500 1000
 SETCLIENTS=4 8 16 32 64 96
 SETTIMES=3
 RUNTIME=600

 All of the crashes so far have been at scale=500, and I haven't seen those
 finish yet.  It failed 7 hours in the first time, then 4 hours in.

 For code reference, the last commit in the source code tree I built against
 was:

 af275a12dfeecd621ed9899a9382f26a68310263
 Thu Dec 20 14:23:31 2012 +0200

 Looking at recent activity, the only buffer-pin-related change I saw was
 this one:

 commit 62656617dbe49cca140f3da588a5e311b3fc35ea
 Date:   Mon Dec 3 18:53:31 2012 +
 Avoid holding vmbuffer pin after VACUUM.

I don't immediately see anything bad here.

 This is my first test like this against 9.3 development though, so the cause
 could be an earlier commit.  I'm just starting with the most recent work as
 the first suspect.  Next I think I'll try autovacuum=off and see if the
 crash goes away.  Other ideas are welcome.

Something like the (untested) debug message below might be helpful:

diff --git a/src/backend/storage/buffer/bufmgr.c
b/src/backend/storage/buffer/bufmgr.c
index 7be767b..20f61a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1700,7 +1700,13 @@ AtEOXact_Buffers(bool isCommit)
 
 	for (i = 0; i < NBuffers; i++)
 	{
-		Assert(PrivateRefCount[i] == 0);
+		if (PrivateRefCount[i] != 0)
+		{
+			BufferDesc *bufHdr = &BufferDescriptors[i];
+			elog(PANIC, "refcount of %s is %u should be 0, globally: %u",
+				 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
+				 PrivateRefCount[i], bufHdr->refcount);
+		}
 	}
 }
 #endif

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-23 Thread Marko Kreen
On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not thrilled with the suggestion to do RAND_cleanup() after forking
 though, as that seems like it'll guarantee that each session starts out
 with only a minimal amount of entropy.

No, that's actually the right fix - it forces OpenSSL to do a new reseed
with system randomness, thus making the backend (including SSL_accept)
maximally disconnected from the static pool in the postmaster.

This also makes the behaviour equal to the current ssl=off and exec-backend
modes, which already do initial seeding in the backend.
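
A minimal sketch of that fix (untested; the helper name and calling it
right after fork in the backend are my assumptions, not from a patch):

    #include <openssl/rand.h>

    /*
     * Throw away the PRNG state inherited from the postmaster, so the
     * first RAND_bytes()/SSL_accept() in this backend reseeds from
     * system randomness (e.g. /dev/urandom).
     */
    static void
    reseed_prng_after_fork(void)
    {
        RAND_cleanup();
    }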

The fact that PRNG behaviour is affected by a complex set of compile-
and run-time switches makes the current situation rather fragile and
hard to understand.

  What seems to me like it'd be
 most secure is to make the postmaster do RAND_add() with a gettimeofday
 result after each successful fork, say at the bottom of
 BackendStartup().  That way the randomness accumulates in the parent
 process, and there's no way to predict the initial state of any session
 without exact knowledge of every child process launch since postmaster
 start.  So it'd go something like

 #ifdef USE_SSL
 if (EnableSSL)
 {
 struct timeval tv;

 gettimeofday(&tv, NULL);
 RAND_add(&tv, sizeof(tv), 0);
 }
 #endif

If you decide against RAND_cleanup in the backend and instead
do workarounds in the backend or postmaster, then please
also do a periodic RAND_cleanup in the postmaster.  This should
make it harder to exploit weaknesses in the reused, slowly moving state.

-- 
marko




Re: [HACKERS] pg_basebackup from cascading standby after timeline switch

2012-12-23 Thread Fujii Masao
On Fri, Dec 21, 2012 at 9:54 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Yes, this should be backpatched to 9.2. I came up with the attached.

In this patch, if '-X stream' is specified in pg_basebackup, the timeline
history files are not backed up. Should we change the pg_basebackup background
process and walsender so that they also stream timeline history files,
for example, by using the 'TIMELINE_HISTORY' replication command?
Or should basebackup.c send all timeline history files at the end of the
backup even if '-X stream' is specified?

 However, thinking about this some more, there's a another bug in the way WAL
 files are included in the backup, when a timeline switch happens.
 basebackup.c includes all the WAL files on ThisTimeLineID, but when the
 backup is taken from a standby, the standby might've followed a timeline
 switch. So it's possible that some of the WAL files should come from
 timeline 1, while others should come from timeline 2. This leads to an error
 like "requested WAL segment 0001000C has already been
 removed" in pg_basebackup.

 Attached is a script to reproduce that bug, if someone wants to play with
 it. It's a bit sensitive to timing, and needs tweaking the paths at the top.

 One solution to that would be to pay more attention to the timelines to
 include WAL from. basebackup.c could read the timeline history file, to see
 exactly where the timeline switches happened, and then construct the
 filename of each WAL segment using the correct timeline id. Another approach
 would be to do readdir() on pg_xlog, and include all WAL files, regardless
 of timeline IDs, that fall in the right XLogRecPtr range. The latter seems
 easier to backpatch.

The latter sounds good to me. But how are all the WAL files with different
timelines shipped in pg_basebackup -X stream mode?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Review of Row Level Security

2012-12-23 Thread Simon Riggs
On 22 December 2012 20:13, Kevin Grittner kgri...@mail.com wrote:

 I apologize again for coming in so late with strong opinions, but I
 thought I knew what row level security meant, and it was just a
 question of how to do it, but I can't reconcile what I thought the
 feature was about with the patch I'm seeing; perhaps it's just a
 lack of the high-level context that's making it difficult.

Agreed, I think we're all feeling that. I'll do my best to accommodate
all viewpoints.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [WIP] pg_ping utility

2012-12-23 Thread Michael Paquier
On Sat, Dec 22, 2012 at 4:07 AM, Phil Sorber p...@omniti.com wrote:


 Added new version with default verbose and quiet option. Also updated
 docs to reflect changes.

Thanks for the updated patches.

Here is the status of the binary patch:
- Code compiles without any warnings.
- After testing the patch, it behaves as expected: the default output is now
verbose, and it can be hidden using -q or --quiet.
However, I still have the following comments:
- in pg_isready.c, the function help needs to be static.
- the list of options passed to getopt_long should be sorted in
alphabetical order (the option q is not at the correct position):
d:h:p:U:qV => d:h:p:qU:V

Then, about the doc patch:
- docs compile correctly (I did a manual check)
- I am not a native English speaker, but the docs look correct to me. There
are enough examples and the description is sufficient. No problems either
with the sgml format.

Once the 2 small things I noticed are fixed, this patch can be marked as
ready for committer.
Thanks,
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [WIP] pg_ping utility

2012-12-23 Thread Phil Sorber
On Sun, Dec 23, 2012 at 9:29 AM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Sat, Dec 22, 2012 at 4:07 AM, Phil Sorber p...@omniti.com wrote:


 Added new version with default verbose and quiet option. Also updated
 docs to reflect changes.

 Thanks for the updated patches.

 Here is the status of the binary patch:
 - Code compiles without any warnings.
 - After testing the patch, it behaves as expected: the default output is now
 verbose, and it can be hidden using -q or --quiet.
 However, I still have the following comments:
 - in pg_isready.c, the function help needs to be static.

I have no objection to making this static, but I'm curious what your
reason for it is here?

 - the list of options passed to getopt_long should be sorted in
 alphabetical order (the option q is not at the correct position):
 d:h:p:U:qV => d:h:p:qU:V

 Then, about the doc patch:
 - docs compile correctly (I did a manual check)
 - I am not a native English speaker, but the docs look correct to me. There
 are enough examples and the description is sufficient. No problems either
 with the sgml format.

 Once the 2 small things I noticed are fixed, this patch can be marked as
 ready for committer.

Ok, I will get an updated version later today.

 Thanks,
 --
 Michael Paquier
 http://michael.otacoo.com




Re: [HACKERS] Switching timeline over streaming replication

2012-12-23 Thread Fujii Masao
On Fri, Dec 21, 2012 at 1:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Dec 8, 2012 at 12:51 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 06.12.2012 15:39, Amit Kapila wrote:

 On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:

 On 05.12.2012 14:32, Amit Kapila wrote:

 On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:

 After some diversions to fix bugs and refactor existing code, I've
 committed a couple of small parts of this patch, which just add some
 sanity checks to notice incorrect PITR scenarios. Here's a new
 version of the main patch based on current HEAD.


 After testing with the new patch, the following problems are observed.

 Defect - 1:

   1. start primary A
   2. start standby B following A
   3. start cascade standby C following B.
   4. start another standby D following C.
   5. Promote standby B.
  6. After successful timeline switch in cascade standbys C & D,

 stop D.

   7. Restart D, Startup is successful and connecting to standby C.
   8. Stop C.
   9. Restart C, startup is failing.


 Ok, the error I get in that scenario is:

 C 2012-12-05 19:55:43.840 EET 9283 FATAL:  requested timeline 2 does not
 contain minimum recovery point 0/3023F08 on timeline 1 C 2012-12-05
 19:55:43.841 EET 9282 LOG:  startup process (PID 9283) exited with exit
 code 1 C 2012-12-05 19:55:43.841 EET 9282 LOG:  aborting startup due to
 startup process failure



 That mismatch causes the error. I'd like to fix this by always treating
 the checkpoint record as part of the new timeline. That feels more
 correct. The most straightforward way to implement that would be to peek
 at the xlog record before updating replayEndRecPtr and replayEndTLI. If
 it's a checkpoint record that changes TLI, set replayEndTLI to the new
 timeline before calling the redo-function. But it's a bit of a
 modularity violation to peek into the record like that.

 Or we could just revert the sanity check at beginning of recovery that
 throws the "requested timeline 2 does not contain minimum recovery point
 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
 record that says "unexpected timeline ID %u in checkpoint record, before
 reaching minimum recovery point %X/%X on timeline %u" checks basically
 the same thing, but at a later stage. However, the way
 minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
 fix that.

 I'm thinking of something like the attached (with some more comments
 before committing). Thoughts?


 This has fixed the problem reported.
 However, I am not able to think of any problem if we remove the check
 "requested timeline 2 does not contain minimum recovery point
 0/3023F08 on timeline 1" at the beginning of recovery and just update
 replayEndTLI with ThisTimeLineID?


 Well, it seems wrong for the control file to contain a situation like this:

 pg_control version number:   932
 Catalog version number:   201211281
 Database system identifier:   5819228770976387006
 Database cluster state:   shut down in recovery
 pg_control last modified: pe  7. joulukuuta 2012 17.39.57
 Latest checkpoint location:   0/3023EA8
 Prior checkpoint location:   0/260
 Latest checkpoint's REDO location:0/3023EA8
 Latest checkpoint's REDO WAL file:00020003
 Latest checkpoint's TimeLineID:   2
 ...
 Time of latest checkpoint:pe  7. joulukuuta 2012 17.39.49
 Min recovery ending location: 0/3023F08
 Min recovery ending loc's timeline:   1

 Note the latest checkpoint location and its TimelineID, and compare them
 with the min recovery ending location. The min recovery ending location is
 ahead of the latest checkpoint's location; the min recovery ending location
 actually points to the end of the checkpoint record. But how come the min
 recovery ending location's timeline is 1, while the checkpoint record's
 timeline is 2?

 Now maybe that would happen to work if we remove the sanity check, but it
 still seems horribly confusing. I'm afraid that discrepancy will come back
 to haunt us later if we leave it like that. So I'd like to fix it.

 Mulling over this some more, I propose the attached patch. With the
 patch, we peek into the checkpoint record, and actually perform the timeline
 switch (by changing ThisTimeLineID) before replaying it. That way the
 checkpoint record is really considered to be on the new timeline for all
 purposes. At the moment, the only difference that makes in practice is that
 we set replayEndTLI, and thus minRecoveryPointTLI, to the new TLI, but it
 feels logically more correct to do it that way.

 This patch has already been included in HEAD. Right?

 I found another "requested timeline does not contain minimum recovery point"
 error scenario in HEAD:

 1. Set up the master 'M', 

Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2012-12-23 Thread Simon Riggs
On 20 December 2012 14:56, Amit Kapila amit.kap...@huawei.com wrote:

  1. There is no performance change for columns that have all valid
  values (non-NULLs).

I don't see any tests (at all) that measure this.

I'm particularly interested in lower numbers of columns, so we can
show no regression for the common case.


  2. There is a visible performance increase when the number of columns
  containing NULLs is more than 60~70%, in tables with a large number of
  columns.
 
  3. There are visible space savings when the number of columns containing
  NULLs is more than 60~70%, in tables with a large number of columns.

Agreed.

I would call that quite disappointing though and was expecting better.
Are we sure the patch works and the tests are correct?

The lack of any space saving for lower % values is strange and
somewhat worrying. There should be a 36(?)-byte saving for 300 null
columns in an 800-column table - how does that not show up at all?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [WIP] pg_ping utility

2012-12-23 Thread Erik Rijkers
On Sun, December 23, 2012 15:29, Michael Paquier wrote:

 Once the 2 small things I noticed are fixed, this patch can be marked as
 ready for committer.

I wasn't going to complain about it, but if we're going for small things 
anyway...

The output is now capitalised:

/tmp:6543 - Accepting Connections
/tmp:6000 - No Response

which looks unusual to me, could we please make it all lower-case?


Thanks,

Erik Rijkers







Re: [HACKERS] [WIP] pg_ping utility

2012-12-23 Thread Phil Sorber
On Sun, Dec 23, 2012 at 9:57 AM, Erik Rijkers e...@xs4all.nl wrote:
 On Sun, December 23, 2012 15:29, Michael Paquier wrote:

 Once the 2 small things I noticed are fixed, this patch can be marked as
 ready for committer.

 I wasn't going to complain about it, but if we're going for small things 
 anyway...

 The output is now capitalised:

 /tmp:6543 - Accepting Connections
 /tmp:6000 - No Response

 which looks unusual to me, could we please make it all lower-case?

I'm not opposed to it in principle. Is that how it is done elsewhere
in the code? If there are no objections from anyone else I will roll
that change in with the others.



 Thanks,

 Erik Rijkers









Re: [HACKERS] Feature Request: pg_replication_master()

2012-12-23 Thread Fujii Masao
On Sat, Dec 22, 2012 at 5:14 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 21.12.2012 21:43, Simon Riggs wrote:

 On 21 December 2012 19:35, Bruce Momjian br...@momjian.us wrote:

 It's not too complex. You just want that to be true. The original
 developer has actually literally gone away, but not because of this.


 Well, Robert and I remember it differently.

 Anyway, I will ask for a vote now.


 And what will you ask for a vote on? Why not spend that effort on
 solving the problem? Why is it OK to waste so much time?

 Having already explained how to do this, I'll add backwards
 compatibility within 1 day of the commit of the patch you claim was
 blocked by this. I think it will take me about an hour and not be very
 invasive, just to prove what a load of hot air is being produced here.


 I haven't been following this... Could you two post a link to the patch we're
 talking about, and the explanation of how to add backwards compatibility to
 it?

The latest patch is the following. Of course, this cannot be applied
cleanly in HEAD.
http://archives.postgresql.org/message-id/CAHGQGwHB==cjehme6jiuy-knutrx-k3ywqzieg1gpnb3ck6...@mail.gmail.com

 Just by looking at the last few posts, this seems like a no brainer. The
 impression I get is that there's a patch that's otherwise ready to be
 applied, but Simon and some others want to have backwards-compatibility added
 to it first. And Simon has a plan on how to do it, and can do it in one day.
 The obvious solution is that Simon posts the patch, with the
 backwards-compatibility added. We can then discuss that, and assuming no
 surprises there, commit it. And everyone lives happily ever after.

Sounds reasonable.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [WIP] pg_ping utility

2012-12-23 Thread Phil Sorber
On Sun, Dec 23, 2012 at 10:07 AM, Phil Sorber p...@omniti.com wrote:
 On Sun, Dec 23, 2012 at 9:57 AM, Erik Rijkers e...@xs4all.nl wrote:
 On Sun, December 23, 2012 15:29, Michael Paquier wrote:

 Once the 2 small things I noticed are fixed, this patch can be marked as
 ready for committer.

 I wasn't going to complain about it, but if we're going for small things 
 anyway...

 The output is now capitalised:

 /tmp:6543 - Accepting Connections
 /tmp:6000 - No Response

 which looks unusual to me, could we please make it all lower-case?

 I'm not opposed to it in principle. Is that how it is done elsewhere
 in the code? If there are no objections from anyone else I will roll
 that change in with the others.



 Thanks,

 Erik Rijkers






Updated patch attached.


pg_isready_bin_v9.diff
Description: Binary data


pg_isready_docs_v9.diff
Description: Binary data



Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2012-12-23 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 The lack of any space saving for lower % values is strange and
 somewhat worrying. There should be a 36? byte saving for 300 null
 columns in an 800 column table - how does that not show up at all?

You could only fit about 4 such rows in an 8K page (assuming the columns
are all int4s).  Unless the savings is enough to allow 5 rows to fit in
a page, the effective savings will be zilch.

This may well mean that the whole thing is a waste of time in most
scenarios --- the more likely it is to save anything, the more likely
that the savings will be lost anyway due to page alignment
considerations, because wider rows inherently pack less efficiently.
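
To put rough numbers on that (my arithmetic, assuming a row with 500
leading non-null int4 columns followed by 300 trailing nulls):

    header + 800-bit bitmap:  23 + 100 = 123, MAXALIGNed to 128 bytes
    header + 500-bit bitmap:  23 +  63 =  86, MAXALIGNed to  88 bytes
    data either way:          500 * 4 = 2000 bytes

    untruncated row: ~2128 bytes; truncated row: ~2088 bytes (~40 saved)
    rows per 8K page: floor(8168 / (2128 + 4)) = 3 = floor(8168 / (2088 + 4))

Same number of rows per page either way, so the saving vanishes at the
page level.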

regards, tom lane




Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2012-12-23 Thread Simon Riggs
On 23 December 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The lack of any space saving for lower % values is strange and
 somewhat worrying. There should be a 36(?)-byte saving for 300 null
 columns in an 800-column table - how does that not show up at all?

 You could only fit about 4 such rows in an 8K page (assuming the columns
 are all int4s).  Unless the savings is enough to allow 5 rows to fit in
 a page, the effective savings will be zilch.

If that's the case, the use case is tiny, especially considering how
sensitive the saving is to the exact location of the NULLs.

 This may well mean that the whole thing is a waste of time in most
 scenarios --- the more likely it is to save anything, the more likely
 that the savings will be lost anyway due to page alignment
 considerations, because wider rows inherently pack less efficiently.

ISTM that we'd get a better gain and a wider use case by compressing
the whole block, with some bits masked out to allow updates/deletes.
The string of zeroes in the null bitmap would compress easily, but so
would other aspects also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 This is my first test like this against 9.3 development though, so the cause
 could be an earlier commit.  I'm just starting with the most recent work as
 the first suspect.  Next I think I'll try autovacuum=off and see if the
 crash goes away.  Other ideas are welcome.

 Something like the (untested) debug message below might be helpful:

It might also be interesting to know if there is more than one
still-pinned buffer --- that is, if you're going to hack the code, fix
it to elog(LOG) each pinned buffer and then panic after completing the
loop.

regards, tom lane




Re: [HACKERS] Review of Row Level Security

2012-12-23 Thread Kohei KaiGai
2012/12/22 Kevin Grittner kgri...@mail.com:
 Kohei KaiGai wrote:

 RLS entry of wiki has not been updated for long time, I'll try to
 update the entry for high-level design in a couple of days.

 Thanks, I think that is essential for a productive discussion of
 the issue.

I tried to update http://wiki.postgresql.org/wiki/RLS

I went back to the definition of a feature for information security; it
requires ensuring confidentiality, integrity and availability (C.I.A.)
of the information assets managed by the system.
Access control contributes to the first two elements.
So, I'm inclined to have the RLS feature eventually support both a
reader side and a writer side, to prevent unprivileged rows from being
read or written.

If I could put the core concept in one statement, it would be:
Overall, RLS prevents users from reading and writing rows that do not
satisfy the row-security policy configured on the table by
the table owner. The reader side ensures confidentiality of data,
and the writer side ensures integrity of data.
Also note that I believe this criterion never rules out having multiple
(asymmetric) row-security policies for each command type, as long
as we handle the problematic scenarios properly.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] Review of Row Level Security

2012-12-23 Thread Simon Riggs
On 21 December 2012 16:51, Kevin Grittner kgri...@mail.com wrote:

 It would be easy enough to include read/write variable clauses by
 using a function similar to TG_OP for triggers, e.g. StatementType.
 That would make the security clause that applied only to reads into
 something like this (StatementType('INSERT, UPDATE') OR other-quals).

 In the case where the logic in encapsulated into a function, what
 are the logical differences from a BEFORE EACH ROW trigger?

Logically it is broadly similar to a CHECK constraint, which is also
similar to a trigger in a few ways.

Implemented as a BEFORE EACH ROW trigger it would need to be a new
class of trigger that always executed after all other BEFORE EACH ROW
triggers had executed, in the same way that security barrier views act
last. The "act last" bit seems to be the most important thing here,
just as it was for security barriers, and by analogy a strong enough
reason to add specific syntax to support this case. (Syntax as yet
undecided...)

 If
 none, and this is strictly an optimization, what are the benchmarks
 showing?

AFAIK it's well known that a check constraint is much faster than a
trigger. The objection to using triggers is partly for that reason
and partly because of the admin overhead, as I already said.

Adding a trigger could be done automatically, just as the current
patch does with the check constraint approach. So if anyone has
benchmarks that show triggers are actually faster, then we could add
that automatically instead, I guess.

Anyway, I hope you can make the call on the 28th so we can discuss this
and agree on a way forward you're happy with.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review of Row Level Security

2012-12-23 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 21 December 2012 16:51, Kevin Grittner kgri...@mail.com wrote:
 If none, and this is strictly an optimization, what are the benchmarks
 showing?

 AFAIK it's well known that a check constraint is much faster than a
 trigger.

I don't believe that that's well known at all, at least not for
apples-to-apples comparison cases.  A C-coded BEFORE trigger doesn't
have very much overhead; I suspect it's probably comparable to
expression evaluation setup overhead.  I think if you want to argue
for this on performance grounds, you need to actually prove there's
a significant performance advantage, not just assume there will be.

regards, tom lane




Re: [HACKERS] Review of Row Level Security

2012-12-23 Thread Simon Riggs
On 23 December 2012 19:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 21 December 2012 16:51, Kevin Grittner kgri...@mail.com wrote:
 If none, and this is strictly an optimization, what are the benchmarks
 showing?

 AFAIK it's well known that a check constraint is much faster than a
 trigger.

 I don't believe that that's well known at all, at least not for
 apples-to-apples comparison cases.  A C-coded BEFORE trigger doesn't
 have very much overhead; I suspect it's probably comparable to
 expression evaluation setup overhead.  I think if you want to argue
 for this on performance grounds, you need to actually prove there's
 a significant performance advantage, not just assume there will be.

If you want to see some tests, I'm sure those can be arranged, no
problem. But honestly, if it's low enough, then which is the fastest
will likely be moot in comparison with the cost of a non-C-coded
role-based security check. So I think our attention is best spent on
providing a few likely C-coded security checks, so we're able to
address the whole performance concern, not just the constraint/trigger
debate.

That still leaves the points about ensuring the trigger/checks are
executed last and also that they are added automatically, rather than
requiring them to be added manually. As KaiGai points out, if they are
added automatically, it doesn't really matter which we pick.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith

On 12/23/12 1:10 PM, Tom Lane wrote:

 It might also be interesting to know if there is more than one
 still-pinned buffer --- that is, if you're going to hack the code, fix
 it to elog(LOG) each pinned buffer and then panic after completing the
 loop.

Easy enough; I kept it so the actual source of the panic is still an
assertion failure:


diff --git a/src/backend/storage/buffer/bufmgr.c
b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..df43643 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1697,11 +1697,21 @@ AtEOXact_Buffers(bool isCommit)
 	if (assert_enabled)
 	{
 		int			i;
+		int			RefCountErrors = 0;
 
 		for (i = 0; i < NBuffers; i++)
 		{
-			Assert(PrivateRefCount[i] == 0);
+
+			if (PrivateRefCount[i] != 0)
+			{
+				BufferDesc *bufHdr = &BufferDescriptors[i];
+				elog(LOG, "refcount of %s is %u should be 0, globally: %u",
+					 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
+					 PrivateRefCount[i], bufHdr->refcount);
+				RefCountErrors++;
+			}
 		}
+		Assert(RefCountErrors == 0);
 	}
 #endif




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-23 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not thrilled with the suggestion to do RAND_cleanup() after forking
 though, as that seems like it'll guarantee that each session starts out
 with only a minimal amount of entropy.

 No, that's actually the right fix - it forces OpenSSL to do a new reseed
 with system randomness, thus making the backend (including SSL_accept)
 maximally disconnected from the static pool in the postmaster.

I don't think that maximal disconnectedness is the deciding factor
here, or even a very important factor.  If we do RAND_cleanup() then
each new session will be forced to suck entropy from /dev/urandom
(immediately, if an SSL connection is attempted).  This opens the door
to quasi-denial-of-service attacks that bleed all the entropy out of the
system, forcing long delays for new PRNG-using processes, Postgres or
otherwise.  And long delays are the *best case* scenario --- worst case,
if the system lacks decent /dev/random support, is that new processes
are starting up with highly correlated PRNG states.

If such an approach were a good thing, the OpenSSL guys would have
implemented it internally --- it'd be easy enough to do, just by forcing
RAND_cleanup anytime getpid() reads differently than it did when the
pool was set up.
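
(A sketch of the check being described -- hypothetical, not actual
OpenSSL code; it would need <openssl/rand.h> and <unistd.h>:

    static pid_t rand_pool_pid = 0;     /* pid recorded at pool setup */

    static void
    reseed_if_forked(void)
    {
        if (rand_pool_pid != getpid())
        {
            RAND_cleanup();             /* force reseed from system randomness */
            rand_pool_pid = getpid();
        }
    }

i.e. remember the pid when the pool is set up and reseed whenever it
changes.)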

gettimeofday() might not be the very best way of changing the inherited
PRNG state from child to child, but I think it's a more attractive
solution than RAND_cleanup.

 This also makes the behaviour equal to the current ssl=off and exec-backend
 modes, which already do initial seeding in the backend.

No, it's not equal, because when ssl=off seeding attempts won't happen
immediately after fork and thus an attacker doesn't have such an easy
way of draining entropy.  If we do what you're suggesting, the attacker
doesn't even need a valid database login to drain entropy --- he can
just fire-and-forget NEGOTIATE_SSL startup packets.  (The exec-backend
case will have such issues anyway, but who thought Windows was a secure
OS?)

 If you decide against RAND_cleanup in the backend and instead
 do workarounds in the backend or postmaster, then please
 also do a periodic RAND_cleanup in the postmaster.  This should
 make it harder to exploit weaknesses in the reused, slowly moving state.

We could consider that, but it's not apparent to me that it has any
real value.

regards, tom lane




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-23 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
 I believe that we'd be better off doing something in postmaster.c to
 positively ensure that each session has a distinct seed value.  Notice
 that BackendRun() already takes measures to ensure that's the case for
 the regular libc random() function; it seems like a reasonable extension
 to also worry about OpenSSL's PRNG.

 #ifdef USE_SSL
 if (EnableSSL)
 {
  struct timeval tv;
 
  gettimeofday(&tv, NULL);
  RAND_add(&tv, sizeof(tv), 0);
 }
 #endif

 Take the caution one step further and make it independent of EnableSSL.  In a
 stock installation, a !EnableSSL postmaster will never seed its PRNG, and
 there's no vulnerability.  Add a shared_preload_libraries module that uses the
 OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.

Meh.  In a postmaster that wasn't built with SSL support at all, such
a module is still dangerous (and I'm not convinced anybody would build
such a module anyway).  I think we should confine our ambitions to
preventing security issues caused by our own code.

regards, tom lane




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2012-12-22 14:20:56 -0500, Tom Lane wrote:
 I believe that we'd be better off doing something in postmaster.c to
 positively ensure that each session has a distinct seed value.

 I am entirely unconvinced that adding in a gettimeofday() provides more
 entropy than what openssl gathers internally after a
 RAND_cleanup().

No, it doesn't provide more entropy, but what it does do is avoid
draining entropy from the kernel (see my reply to Marko just now).

 gettimeofday() will yield the same result in close
 enough forks on a significant number of systems whereas openssl should
 use stuff like /dev/urandom to initialize its PRNG after a cleanup.

Well, in the first place, that doesn't matter much because the PRNG
state will still change from repeated mix-ins of gettimeofday, even if
we obtain the same reading successive times.  In the second place, if we
add it where I suggest, it would be easy to also include the child
process PID in the added entropy -- we could xor it into the tv_sec
value for instance -- thus further reducing the chance of identical
values getting added in.
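
Concretely, the earlier snippet would become something like this (a
sketch only; it assumes the forked child's pid is at hand as pid in
BackendStartup()):

	#ifdef USE_SSL
	if (EnableSSL)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		tv.tv_sec ^= pid;	/* fold in child PID so identical readings differ */
		RAND_add(&tv, sizeof(tv), 0);
	}
	#endif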

regards, tom lane




Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 19:42, Greg Smith g...@2ndquadrant.com wrote:

 diff --git a/src/backend/storage/buffer/bufmgr.c
 b/src/backend/storage/buffer/bufmgr.c
 index dddb6c0..df43643 100644
 --- a/src/backend/storage/buffer/bufmgr.c
 +++ b/src/backend/storage/buffer/bufmgr.c
 @@ -1697,11 +1697,21 @@ AtEOXact_Buffers(bool isCommit)
  	if (assert_enabled)
  	{
  		int			i;
 +		int			RefCountErrors = 0;
 
  		for (i = 0; i < NBuffers; i++)
  		{
 -			Assert(PrivateRefCount[i] == 0);
 +
 +			if (PrivateRefCount[i] != 0)
 +			{
 +				BufferDesc *bufHdr = &BufferDescriptors[i];
 +				elog(LOG, "refcount of %s is %u should be 0, globally: %u",
 +					 relpathbackend(bufHdr->tag.rnode, InvalidBackendId, bufHdr->tag.forkNum),
 +					 PrivateRefCount[i], bufHdr->refcount);
 +				RefCountErrors++;
 +			}
  		}
 +		Assert(RefCountErrors == 0);
  	}
  #endif


We already have PrintBufferLeakWarning() for this, which might be a bit neater.
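
For instance (a sketch, untested; PrintBufferLeakWarning() takes the
1-based buffer number):

	if (PrivateRefCount[i] != 0)
	{
		PrintBufferLeakWarning(i + 1);
		RefCountErrors++;
	}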

If that last change was the cause, then it's caused within VACUUM. I'm
running a thrash test with autovacuums set to run much more frequently but
nothing yet.

Are you by any chance running with unlogged tables? There was a change
there recently, something around checkpoints IIRC. Can you also make
checkpoints more frequent?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making view dump/restore safe at the column-alias level

2012-12-23 Thread Robert Haas
On Fri, Dec 21, 2012 at 9:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm having a hard time following this.  Can you provide a concrete example?

 regression=# create table t1 (x int, y int);
 CREATE TABLE
 regression=# create table t2 (x int, z int);
 CREATE TABLE
 regression=# create view v1 as select * from t1 join t2 using (x);
 CREATE VIEW
 regression=# \d+ v1
View public.v1
  Column |  Type   | Modifiers | Storage | Description
 +-+---+-+-
  x  | integer |   | plain   |
  y  | integer |   | plain   |
  z  | integer |   | plain   |
 View definition:
  SELECT t1.x, t1.y, t2.z
FROM t1
JOIN t2 USING (x);
 regression=# alter table t2 rename column x to q;
 ALTER TABLE
 regression=# \d+ v1
View public.v1
  Column |  Type   | Modifiers | Storage | Description
 +-+---+-+-
  x  | integer |   | plain   |
  y  | integer |   | plain   |
  z  | integer |   | plain   |
 View definition:
  SELECT t1.x, t1.y, t2.z
FROM t1
JOIN t2 USING (x);

 At this point the dumped view definition is wrong: if you try to execute
 it you get

 regression=# SELECT t1.x, t1.y, t2.z
 regression-#FROM t1
 regression-#JOIN t2 USING (x);
 ERROR:  column "x" specified in USING clause does not exist in right table

 I'm suggesting that we could fix this by emitting something that forces
 the right alias to be assigned to t2.q:

 SELECT t1.x, t1.y, t2.z
FROM t1
JOIN t2 AS t2(x,z)
USING (x);

Sneaky.  I didn't know that would even work, but it seems like a
sensible approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith

On 12/23/12 3:17 PM, Simon Riggs wrote:

 If that last change was the cause, then it's caused within VACUUM. I'm
 running a thrash test with autovacuums set to run much more frequently but
 nothing yet.


I am not very suspicious of that VACUUM change; I just pointed it out for
completeness' sake.



 Are you by any chance running with unlogged tables? There was a change
 there recently, something around checkpoints IIRC. Can you also make
 checkpoints more frequent?


This is straight pgbench, modifying only scale, client count, workers, 
and duration.  No unlogged tables involved.


The crash is so intermittent that I'm trying not to change anything 
until I get more productive output out of one.  The last run I kicked 
off has been chugging for 14 hours without a hiccup yet.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 21:52, Greg Smith g...@2ndquadrant.com wrote:
 On 12/23/12 3:17 PM, Simon Riggs wrote:

 If that last change was the cause, then it's caused within VACUUM. I'm
 running a thrash test with autovacuums set to run much more frequently but
 nothing yet.


 I am not very suspicious of that VACUUM change; just pointed it out for
 completeness sake.

Nothing seen here either after 100 minutes of thrash with ~1 VACUUM
per sec, at SF1.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Event Triggers: adding information

2012-12-23 Thread Robert Haas
On Fri, Dec 21, 2012 at 11:35 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 It's hard to devise exactly what kind of refactoring we need to do
 before trying to write a patch that benefits from it, in my experience.
 In some cases the refactoring will make things more complex, not less.

Perhaps.  I have a feeling there's some more simplification that can
be done here, somehow.

 That's a good idea. I only started quite late in that patch to work on
 the ObjectID piece of information, that's why I didn't split it before
 doing so.

That's fine.  I've committed that part.  I think the remainder of the
patch is really several different features that ought to be split
apart and considered independently.  We may want some but not others,
or some may be ready to go in but not others.

 Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
 statements, so my current patch is still some bricks shy of a load… (I
 added ObjectID only in the commands I added rewrite support for, apart
 from DROP).

I shall rely on you to provide those bricks which are still missing.

 change to gram.y that affects any of the supported commands will
 require a compensating change to ddl_rewrite.c.  That is a whole lot
 of work and it is almost guaranteed that future patch authors will
 sometimes fail to do it correctly.   At an absolute bare minimum I
 think we would need some kind of comprehensive test suite for this
 functionality, as opposed to the less than 60 lines of new test cases
 this patch adds which completely fail to test this code in any way at
 all, but really I think we should just not have it.  It WILL get

 I had a more complete test suite in earlier rounds and you opposed it
 for good reasons. I'm ok to revisit that now, and I think the test case
 should look like this:

   - install an auditing command trigger that will insert any DDL command
 string into a table with a sequence ordering

   - select * from audit

   - \d… of all objects created

   - drop schema cascade

   - DO $$ loop for sql in select command from audit do execute sql …

   - \d… of all objects created again

 Then any whacking of the grammar should alter the output here and any
 case that's not supported will fail too.

 We might be able to have a better way of testing the feature here, but I
 tried to stay within the realms of what I know pg_regress is able to do.

I was thinking that we might need to go beyond what pg_regress can do
in this case.  For example, one idea would be to install an audit
trigger that records all the DDL that happens during the regression
tests.  Then, afterwards, replay that DDL into a new database.  Then
do a schema-only dump of the old and new databases and diff the dump
files.  That's a little wacky by current community standards but FWIW
EDB has a bunch of internal tests that we run to check our proprietary
stuff; some of them work along these lines and it's pretty effective
at shaking out bugs.

In this particular case, it would significantly reduce the maintenance
burden of putting a proper test framework in place, because we can
hope that the regression tests create (and leave around) at least one
object of any given type, and that any new DDL features will be
accompanied by similar regression tests.  We already rely on the
regression database for pg_upgrade testing, so if it's not complete,
it impacts pg_upgrade test coverage as well.

 broken (if it isn't already) and it WILL slow down future development
 of SQL features.  It also WILL have edge cases where it doesn't work
 properly, such as where the decision of the actual index name to use
 is only decided at execution time and cannot be inferred from the
 parse tree.  I know that you feel that all of these are tolerable

 The way to solve that, I think, as Tom said, is to only rewrite the
 command string when the objects exist in the catalogs. That's why we now
 have ddl_command_trace which is an alias to either start or end
 depending on whether we're doing a DROP or a CREATE or ALTER operation.

Hmm.  I have to study that more, maybe.  I certainly agree that if you
can look at the catalogs, you should be able to reliably reconstruct
what happened - which isn't possible just looking at the parse tree.
However, it feels weird to me to do something that's partly based on
the parse tree and partly based on the catalog contents.  Moreover,
the current pre-create hook isn't the right place to gather
information from the catalogs anyway as it happens before locks have
been taken.

Now, there's another thing that is bothering me here, too.  How
complete is the support you've implemented in this patch?  Does it
cover ALL CREATE/ALTER/DROP commands, including all options to and all
variants of those commands?  Because while I'm not very sanguine about
doing this at all, it somehow feels like doing it only halfway would
be even worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] initdb and share/postgresql.conf.sample

2012-12-23 Thread Jeff Janes
In some of my git branches I have
editorialized src/backend/utils/misc/postgresql.conf.sample to contain my
configuration preferences for whatever that branch is meant to test.
Then this gets copied to share/postgresql.conf.sample during
install and from there to data/postgresql.conf during initdb, and I don't
need to remember to go make the necessary changes.

Am I insane to be doing this?  Is there a better way to handle this
branch-specific configuration needs?

Anyway, I was recently astonished to discover that the contents
of share/postgresql.conf.sample during initdb affect the performance
of the server, even when the conf file is replaced with something else
before the server is started up.  To make a very long story short,
if share/postgresql.conf.sample is set up for archiving, then somewhere in
the initdb process some bootstrap process pre-creates a bunch of extra xlog
files.

Is this alarming?  It looks like initdb takes some pains, at least in one
place, to make an empty config file rather than using
postgresql.conf.sample, but it seems like a sub-process is not doing that.

Those extra log files then give the newly started server a boost (whether
it is started in archive mode or not) because it doesn't have to create
them itself.  It isn't so much a boost as the absence of a new-server
penalty.  I want to remove that penalty to get better numbers from
benchmarking.  What I am doing now is this, between the initdb and the
pg_ctl start:

for g in `perl -e 'printf("0000000100000000000000%02X\n",$_) foreach
2..120'`; do cp /tmp/data/pg_xlog/000000010000000000000001
/tmp/data/pg_xlog/$g -i < /dev/null; done

The 120 comes from 2 * checkpoint_segments.  That's mighty ugly; is there
a better trick?

You could say that benchmarks should run long enough to average out such
changes, but needing to run a benchmark that long can make some kinds of
work (like git bisect) unrealistic rather than merely tedious.

Cheers,

Jeff


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-23 Thread Tomas Vondra
Hi!

On 22.12.2012 17:15, Alexander Korotkov wrote:
 I'm not saying this is a perfect benchmark, but the differences (of
 querying) are pretty huge. Not sure where this difference comes from,
 but it seems to be quite consistent (I usually get +-10% results, which
 is negligible considering the huge difference).
 
 Is this an expected behaviour that will be fixed by another patch?
 
  
 Other patches which significantly accelerate index search will be
 provided. This patch changes only GIN posting lists/trees storage.
 However, this patch wasn't expected to significantly change index
 scan speed in either direction.

That was exactly my expectation - probably not an improvement, but
definitely not worse performance.

 
 The database contains ~680k messages from the mailing list archives,
 i.e. about 900 MB of data (in the table), and the GIN index on tsvector
 is about 900MB too. So the whole dataset nicely fits into memory (8GB
 RAM), and it seems to be completely CPU bound (no I/O activity at all).
 
 The configuration was exactly the same in both cases
 
 shared buffers = 1GB
 work mem = 64 MB
 maintenance work mem = 256 MB
 
 I can either upload the database somewhere, or provide the benchmarking
 script if needed.
 
 
 Unfortunately, I can't reproduce such a huge slowdown on my test cases.
 Could you share both the database and the benchmarking script?

It's strange, but no matter what I do I can't reproduce those results
(with the significant performance decrease). So either I made some
strange mistake when running those tests, or there was something wrong
with my system, or whatever :-(

But when running the benchmarks now (double-checked everything, properly
repeated the tests, ...), I've noticed a different behaviour. But first
some info about the scripts I use for testing.

All the scripts are available here:

  https://bitbucket.org/tvondra/archie

It's my hobby project implementing a fulltext mbox archive. It should be
usable, but it's still a bit WIP, so let me know in case of any issues.

The README should give you all the instructions on how to set up and load
the database. I'm using ~1700 mbox files downloaded from
http://archives.postgresql.org/ for these lists (until 2012/11):

   pgadmin-hackers
   pgsql-advocacy
   pgsql-announce
   pgsql-bugs
   pgsql-general
   pgsql-hackers
   pgsql-jdbc
   pgsql-novice
   pgsql-odbc
   pgsql-patches
   pgsql-sql

which in the end gives ~677k rows in the 'messages' table, occupying
~5.5GB disk space (including all the indexes etc).

Once you have the data loaded, you need to warm up the database and then
start benchmarking it - I'm using the warmup.py script for both things.
The script is quite simple (more on what it does below).

To warm up the DB, just run this

./warmup.py --db archie --duration 300

until the %util drops near 0 (assuming you have enough RAM to fit the
whole database into memory). Then I usually run this as the benchmark

./warmup.py --db archie --duration 300 --no-hash \
--no-thread --words 1

./warmup.py --db archie --duration 300 --no-hash \
--no-thread --words 2

which runs the tests and outputs one line per worker (by default equal
to the number of CPUs).

The script itself is very simple: it fetches a random message and uses
the tsvector column as a source of words for the actual benchmark. It
takes N words from the tsvector, splits them into groups, and performs a
simple fulltext query using plainto_tsquery('word1 word2 ...'). At the
end it prints info including the number of queries per second.

I've run the tests on the current master with and without the v3 patch.
I've tested it with 1GB or 2GB shared buffers, and 32MB or 64MB work mem.

The tests were run for 1, 2, 3, 4 and 5 words, and I've repeated it five
times for each configuration. The duration of each run was 5 minutes.

These are the averages (from the 5 runs) of queries per second for each
combination of parameters:

                        1      2      3      4      5

 master  1GB/32MB      19    179    165    127     99
 patched 1GB/32MB      19    175    163    124     96

 master  1GB/64MB      20    181    165    127     99
 patched 1GB/64MB      19    174    159    120     95

 master  2GB/32MB      27    181    165    127     98
 patched 2GB/32MB      25    176    156    120     93

 master  2GB/64MB      27    180    166    128    102
 patched 2GB/64MB      40    402    364    245    176

There's no significant difference in performance, except for the
2GB/64MB combination. And in that case it's actually in the opposite
direction from what I reported before - i.e. this time it's up to 100%
faster than the unpatched master. The results are pretty consistent
(very small variance across the repeated runs), so I'm not sure about
the previous results.

Any idea what might cause such behavior? Why should it 

Re: [HACKERS] initdb and share/postgresql.conf.sample

2012-12-23 Thread Greg Stark
On Sun, Dec 23, 2012 at 11:11 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 You could say that benchmarks should run long enough to average out such
 changes

I would say any benchmark needs to be run long enough to reach a
steady state before the measurements are taken. The usual practice is
to run a series of groups and observe the aggregate measurements for each
group. For instance, run 10 runs, with each run consisting of 1000
repetitions of the transaction. Then you can observe at which point
the averages for individual groups begin to behave consistently. If
the first three are outliers but the remaining 7 are stable, then
discard the first three and take the average (or often the median) of the
remaining 7.

If you include the early runs which are affected by non-steady-state
conditions such as cache effects or file fragmentation then it can
take a very long time for those effects to be erased by averaging with
later results. Worse, it's very difficult to tell whether you've
waited long enough.
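
To make that concrete, here is a minimal sketch of the procedure
(run_transaction() is a stand-in for the workload under test, and the
5% stability threshold is an arbitrary choice):

# Run the benchmark in fixed-size groups, drop leading groups until the
# remaining group means agree with their median, then report that median.
import statistics
import time

def run_transaction():
    time.sleep(0.001)        # placeholder workload

def run_groups(ngroups=10, reps=1000):
    """Run ngroups groups of reps transactions; return per-group mean latency."""
    means = []
    for _ in range(ngroups):
        start = time.time()
        for _ in range(reps):
            run_transaction()
        means.append((time.time() - start) / reps)
    return means

def steady_state_median(means, tolerance=0.05):
    """Discard leading groups until the rest agree within tolerance of their median."""
    for skip in range(len(means) - 1):
        rest = means[skip:]
        med = statistics.median(rest)
        if all(abs(m - med) <= tolerance * med for m in rest):
            return med, skip
    return statistics.median(means), 0

means = run_groups()
med, skipped = steady_state_median(means)
print("discarded %d warm-up group(s); median latency %.6f s" % (skipped, med))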


-- 
greg




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-23 Thread Noah Misch
On Sun, Dec 23, 2012 at 02:49:08PM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
  #ifdef USE_SSL
      if (EnableSSL)
      {
          struct timeval tv;
  
          gettimeofday(&tv, NULL);
          RAND_add(&tv, sizeof(tv), 0);
      }
  #endif
 
  Take the caution one step further and make it independent of EnableSSL.  In a
  stock installation, a !EnableSSL postmaster will never seed its PRNG, and
  there's no vulnerability.  Add a shared_preload_libraries module that uses the
  OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.
 
 Meh.  In a postmaster that wasn't built with SSL support at all, such
 a module is still dangerous (and I'm not convinced anybody would build
 such a module anyway).  I think we should confine our ambitions to
 preventing security issues caused by our own code.

You're adding lines of code to prematurely micro-optimize the backend fork
cycle.  If code introduced into the postmaster, by us or others, ever violates
the assumption behind that micro-optimization, certain users get a precipitous
loss of security with no clear alarm bells.  I don't like that trade.

nm




Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-23 Thread Tomas Vondra
On 20.12.2012 12:33, Andres Freund wrote:
 On 2012-12-20 02:54:48 +0100, Tomas Vondra wrote:
 On 19.12.2012 02:18, Andres Freund wrote:
 On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:

 I think except for the temp buffer issue mentioned below it's ready.

 -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
 +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
  {
 -  int i;
 +  int i, j;
 +
 +  /* sort the list of rnodes */
 +  pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);

/* If it's a local relation, it's localbuf.c's problem. */
 -  if (RelFileNodeBackendIsTemp(rnode))
 +  for (i = 0; i < nnodes; i++)
{
 -  if (rnode.backend == MyBackendId)
 -  DropRelFileNodeAllLocalBuffers(rnode.node);
 -  return;
 +  if (RelFileNodeBackendIsTemp(rnodes[i]))
 +  {
 +  if (rnodes[i].backend == MyBackendId)
 +  DropRelFileNodeAllLocalBuffers(rnodes[i].node);
 +  }
}

 While you deal with local buffers here, you don't anymore in the big loop
 over shared buffers. That wasn't needed earlier since we just returned
 after noticing we have a local relation, but that's not the case anymore.

 Hmm, but that would require us to handle the temp relations explicitly
 wherever we call DropRelFileNodeAllBuffers. Currently there are two such
 places - smgrdounlink() and smgrdounlinkall().

 By placing it in DropRelFileNodeAllBuffers() the code is shared, and I
 think that's a good thing.

 But that does not mean the code is perfect - it was based on the
 assumption that if there's a mix of temp and regular relations, the temp
 relations will be handled in the first part and the rest in the second one.

 Maybe it'd be better to improve it so that the temp relations are
 removed from the array after the first part (and skip the second one if
 there are no remaining relations).
 
 The problem is that AFAIK, without the backend-local part, a temporary
 relation could match the same relfilenode as a regular relation, which
 would have pretty bad consequences.

Attached is a patch with fixed handling of temporary relations. I've
chosen to keep the logic in DropRelFileNodeAllBuffers and instead make a
local copy of the array without the local relations.
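
In pseudocode, the intended control flow looks roughly like this (a
sketch with made-up names, not the actual C code from the patch):

# Temp relations go through the backend-local path; the shared-buffers
# pass then operates on a local copy of the array without them.
from collections import namedtuple

RNode = namedtuple('RNode', ['relfilenode', 'backend'])

MY_BACKEND_ID = 1
INVALID_BACKEND_ID = -1      # marks a regular (non-temp) relation

def drop_local_buffers(rnode):
    print("dropping local buffers for", rnode.relfilenode)

def drop_shared_buffers(rnodes):
    print("dropping shared buffers for", [r.relfilenode for r in rnodes])

def drop_rel_file_node_all_buffers(rnodes):
    # first part: temp relations belonging to this backend
    for rn in rnodes:
        if rn.backend != INVALID_BACKEND_ID and rn.backend == MY_BACKEND_ID:
            drop_local_buffers(rn)
    # second part: local copy without the temp relations
    shared = [rn for rn in rnodes if rn.backend == INVALID_BACKEND_ID]
    if shared:
        drop_shared_buffers(sorted(shared))

drop_rel_file_node_all_buffers([RNode(16384, INVALID_BACKEND_ID),
                                RNode(16390, MY_BACKEND_ID)])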

regards
Tomas
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 8dc622a..8c12a43 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,11 @@ smgrDoPendingDeletes(bool isCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+
+	SMgrRelation   *srels = palloc(sizeof(SMgrRelation));
+	int			nrels = 0,
+				i = 0,
+				maxrels = 1;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +340,31 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 
 				srel = smgropen(pending->relnode, pending->backend);
-				smgrdounlink(srel, false);
-				smgrclose(srel);
+
+				/* extend the array if needed (double the size) */
+				if (maxrels <= nrels) {
+					maxrels *= 2;
+					srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+				}
+
+				srels[nrels++] = srel;
 			}
 			/* must explicitly free the list entry */
 			pfree(pending);
 			/* prev does not change */
 		}
 	}
+
+	if (nrels > 0)
+	{
+		smgrdounlinkall(srels, nrels, false);
+
+		for (i = 0; i < nrels; i++)
+			smgrclose(srels[i]);
+	}
+
+	pfree(srels);
+
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..8a7190b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -62,6 +62,7 @@
 #define BUF_WRITTEN		0x01
 #define BUF_REUSABLE		0x02
 
+#define BSEARCH_LIMIT		10
 
 /* GUC variables */
 bool	zero_damaged_pages = false;
@@ -108,6 +109,7 @@ static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
 static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
 static void AtProcExit_Buffers(int code, Datum arg);
 
+static int rnode_comparator(const void * p1, const void * p2);
 
 /*
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
@@ -2094,35 +2096,85 @@
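
The gist of the (truncated) DropRelFileNodeAllBuffers hunk, judging from
the BSEARCH_LIMIT constant and the pg_qsort() call above: scan the
buffer pool once and test each buffer's relfilenode against the sorted
array of dropped relations - linearly while the array is small, by
binary search once it grows past BSEARCH_LIMIT entries. A rough sketch
in Python (made-up names, not the patch code):

# Linear scan for few relations, binary search for many, over one pass
# through all buffers.
from bisect import bisect_left

BSEARCH_LIMIT = 10      # same threshold as in the patch

def invalidate_buffers(buffer_tags, dropped_rnodes):
    """Return the buffer tags that survive dropping the given relations."""
    dropped_rnodes = sorted(dropped_rnodes)        # the pg_qsort() step
    survivors = []
    for tag in buffer_tags:                        # one pass over all buffers
        if len(dropped_rnodes) <= BSEARCH_LIMIT:
            hit = tag in dropped_rnodes            # few relations: linear scan
        else:                                      # many relations: bsearch
            i = bisect_left(dropped_rnodes, tag)
            hit = i < len(dropped_rnodes) and dropped_rnodes[i] == tag
        if not hit:
            survivors.append(tag)
    return survivors

print(invalidate_buffers([1, 2, 3, 4, 5, 2, 3], [2, 3, 5]))   # -> [1, 4]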

[HACKERS] compilation error in 9.3 devel

2012-12-23 Thread Hari Babu
 

The following commit is causing some compilation errors. 

c504513f83a9ee8dce4a719746ca73102cae9f13 



Error: 
../../../src/include/catalog/pg_aggregate.h:246: error: previous declaration
of AggregateCreate was here 
make: *** [pg_aggregate.o] Error 1 




Author: Robert Haas rh...@postgresql.org  2012-12-24 04:55:03 
Committer: Robert Haas rh...@postgresql.org  2012-12-24 05:07:58 
Parent: 31bc839724439440b2e94ea616b28ce5be94e19c (Prevent failure when
RowExpr or XmlExpr is parse-analyzed twice.) 
Child:   (Local uncommitted changes,
not checked in to index) 
Branches: Truncate-trailing-null-columns, master 
Follows: REL9_2_BETA2 
Precedes: 

Adjust many backend functions to return OID rather than void. 

Extracted from a larger patch by Dimitri Fontaine.  It is hoped that 
this will provide infrastructure for enriching the new event trigger 
functionality, but it seems possibly useful for other purposes as 
well. 


Regards, 
Hari babu.