Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-08 Thread Simon Riggs
On 7 December 2012 23:51, Stephen Frost sfr...@snowman.net wrote:
 Jeff,

 * Jeff Davis (pg...@j-davis.com) wrote:
 Most of your concerns seem to be related to freezing, and I'm mostly
 interested in HEAP_XMIN_COMMITTED optimizations. So I think we're
 talking past each other.

 My concern is about this patch/capability as a whole.  I agree that the
 hint bit setting may be fine by itself, I'm not terribly concerned with
 that.  Now, if you (and others) aren't concerned about the overall
 behavior that is being introduced here, that's fine, but it was my
 understanding from previous discussions that making tuples visible to
 all transactions, even those started before the committing transaction
 which are set more strictly than 'read-committed', wasn't 'ok'.

 Now, what I've honestly been hoping for on this thread was for someone
 to speak up and point out why I'm wrong about this concern and explain
 how this patch addresses that issue.  If that's been done, I've missed
 it..

Visibility of pre-hinted data is an issue and because of that we
previously agreed that it would be allowed only when explicitly
requested by the user, which has been implemented as the FREEZE option
on COPY. This then makes it identical to TRUNCATE, where a separate
command differentiates MVCC-compliant row removal (DELETE) from
non-MVCC row removal (TRUNCATE).

So the committed feature does address the visibility issue. Jeff has
been speaking about an extension to that behaviour, which would be to
allow HEAP_XMIN_COMMITTED to be set in some cases also. The committed
feature specifically does not do that.
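
For reference, the committed behaviour is opt-in per command; a minimal
sketch of its use (table and file names here are hypothetical):

    BEGIN;
    -- FREEZE is only accepted if the table was created or truncated
    -- in the current (sub)transaction.
    TRUNCATE mytab;
    COPY mytab FROM '/tmp/data.csv' WITH (FREEZE);
    COMMIT;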

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Michael Paquier
On Fri, Dec 7, 2012 at 10:33 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com
 wrote:
  - There is still a problem with toast indexes. If the concurrent reindex
 of
  a toast index fails for a reason or another, pg_relation will finish with
  invalid toast index entries. I am still wondering about how to clean up
  that. Any ideas?

 Build another toast index, rather than reindexing the existing one,
 then just use the new oid.

Hum? The patch already does that. It creates concurrently a new index which
is a duplicate of the existing one, then the old and new indexes are
swapped. Finally the old index is dropped concurrently.

The problem I still see is the following one:
If a toast index, or a relation having a toast index, is being reindexed
concurrently, and the server crashes during the process, there will be
invalid toast indexes in the server. If the crash happens before the swap,
the new toast index is invalid. If the crash happens after the swap, the
old toast index is invalid.
I am not sure the user is able to clean up such invalid toast indexes
manually as they are not visible to him.
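
For what it's worth, a catalog query along these lines should at least
locate such leftovers, assuming the broken index stays flagged as invalid
in pg_index:

    SELECT i.indexrelid::regclass
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE NOT i.indisvalid AND n.nspname = 'pg_toast';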
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Michael Paquier
On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
   On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com
 wrote:
   - There is still a problem with toast indexes. If the concurrent
 reindex of
   a toast index fails for a reason or another, pg_relation will finish
 with
   invalid toast index entries. I am still wondering about how to clean
 up
   that. Any ideas?
 
   Build another toast index, rather than reindexing the existing one,
   then just use the new oid.

 That's easier said than done in the first place. toast_save_datum()
 explicitly opens/modifies the one index it needs and updates it.

  Um, I don't think you can swap in a new toast index OID without taking
  exclusive lock on the parent table at some point.

 The whole swapping issue isn't solved satisfyingly as a whole yet :(.

 If we just swap the index relfilenodes in the pg_index entries itself,
 we wouldn't need to modify the main table's pg_class at all.

I think you are mistaken here; relfilenode is a column of pg_class and not
pg_index.
So whatever the method used for swapping: relfilenode switch or relname
switch, you need to modify the pg_class entry of the old and new indexes.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [WIP] pg_ping utility

2012-12-08 Thread Michael Paquier
On Fri, Dec 7, 2012 at 12:56 PM, Phil Sorber p...@omniti.com wrote:

 On Thu, Dec 6, 2012 at 8:54 PM, Michael Paquier
  OK. Let's do that and then mark this patch as ready for committer.
  Thanks,

 Those changes have been made.

Cool. Thanks.


 Something I was just thinking about while testing this again. I
 mentioned the issue before about someone meaning to put -v and putting
 -V instead and it being a potential source of problems. What about
 making verbose the default and removing -v and adding -q to make it
 quiet? This would also match other tools behavior. I want to get this
 wrapped up and I am fine with it as is, but just wanted to ask what
 others thought.

Bruce mentioned that pg_isready could be used directly by pg_ctl -w.
Defaulting to non-verbose would make sense. What are the other tools you
are thinking about? Some utilities in core?

Unless you want to change the default behavior, let's change this patch's
status to Ready for Committer.

Thanks,
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Andres Freund
On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
 On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund and...@2ndquadrant.com wrote:

  On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
   Simon Riggs si...@2ndquadrant.com writes:
On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com
  wrote:
- There is still a problem with toast indexes. If the concurrent
  reindex of
a toast index fails for a reason or another, pg_relation will finish
  with
invalid toast index entries. I am still wondering about how to clean
  up
that. Any ideas?
  
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
 
  That's easier said than done in the first place. toast_save_datum()
  explicitly opens/modifies the one index it needs and updates it.
 
   Um, I don't think you can swap in a new toast index OID without taking
   exclusive lock on the parent table at some point.
 
  The whole swapping issue isn't solved satisfyingly as a whole yet :(.
 
  If we just swap the index relfilenodes in the pg_index entries itself,
  we wouldn't need to modify the main table's pg_class at all.
 
 I think you are mistaken here; relfilenode is a column of pg_class and not
 pg_index.
 So whatever the method used for swapping: relfilenode switch or relname
 switch, you need to modify the pg_class entry of the old and new indexes.

The point is that with a relname switch the pg_class.oid of the index
changes. Which is a bad idea because it will possibly be referred to by
pg_depend entries. Relfilenodes - which certainly live in pg_class too,
that's not the point - aren't referred to externally though. So if
everything else in pg_class/pg_index stays the same a relfilenode switch
imo saves you a lot of trouble.
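
The distinction is visible in the catalog itself: pg_depend records the
oid, while the relfilenode merely names the on-disk file (the index name
below is hypothetical):

    SELECT oid, relname, relfilenode
    FROM pg_class
    WHERE relname = 'my_index';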

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [WIP] pg_ping utility

2012-12-08 Thread Phil Sorber
On Sat, Dec 8, 2012 at 7:50 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Dec 7, 2012 at 12:56 PM, Phil Sorber p...@omniti.com wrote:

 Something I was just thinking about while testing this again. I
 mentioned the issue before about someone meaning to put -v and putting
 -V instead and it being a potential source of problems. What about
 making verbose the default and removing -v and adding -q to make it
 quiet? This would also match other tools behavior. I want to get this
 wrapped up and I am fine with it as is, but just wanted to ask what
 others thought.

 Bruce mentioned that pg_isready could be used directly by pg_ctl -w.
 Defaulting to non-verbose would make sense. What are the other tools you
 are thinking about? Some utilities in core?

I think Bruce meant that PQping() is used by pg_ctl -w, not that he
would use pg_isready.
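
For context, a minimal sketch of that libpq call (the conninfo string here
is hypothetical):

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        /* PQping() reports whether the server is accepting connections,
         * without authenticating. */
        PGPing rc = PQping("host=localhost port=5432");

        printf("%s\n", rc == PQPING_OK ? "accepting connections" : "not ready");
        return rc == PQPING_OK ? 0 : 1;
    }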

I was just thinking that if someone is going to use it in a script
adding -q won't be a big deal. If someone wants to use it by hand
adding -v could become cumbersome.
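
For example, in a script the exit status alone is enough, so a quiet mode
(a sketch, assuming the proposed -q flag alongside the existing -h/-p
options) would look like:

    pg_isready -q -h localhost -p 5432 || echo "server not accepting connections"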


 Unless you want to change the default behavior, let's change this patch's
 status to Ready for Committer.

 Thanks,

 --
 Michael Paquier
 http://michael.otacoo.com




[HACKERS] CommitFest 2012-11 Progress

2012-12-08 Thread Noah Misch
We're entering the last planned week of the CF.  Apart from a 72-hour period
in mid-November, some CF has remained in-progress continuously for the last
176 days.  With 64 out of the 82 current patches unresolved, we're on track to
again see no gap between CF 2012-11 and CF 2013-01.  The process as practiced
today is a shell of its former self[1][2].  Authors, reviewers, committers:
has the process changed to something better, worthy of formally replacing its
predecessor?  Or, rather, do we need a fresh process correction?

Thanks,
nm

[1] http://wiki.postgresql.org/wiki/CommitFest
[2] http://wiki.postgresql.org/wiki/Running_a_CommitFest




Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-08 Thread Andres Freund
On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
 On 6.12.2012 05:47, Shigeru Hanada wrote:
  I've done a simple benchmark on my laptop with 2GB shared buffers, it's
  attached in the drop-test.py (it's a bit messy, but it works).
  [snip]
  With those parameters, I got these numbers on the laptop:
 
   creating 10000 tables
  all tables created in 3.298694 seconds
   dropping 10000 tables one by one ...
  all tables dropped in 32.692478 seconds
   creating 10000 tables
  all tables created in 3.458178 seconds
   dropping 10000 tables in batches by 100...
  all tables dropped in 3.28268 seconds
 
  So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we usually
  get larger differences, as we use larger shared buffers and the memory
  is significantly slower there.
 
  Do you have performance numbers for the same test on unpatched PG?
  This patch aims to improve performance of bulk DROP, so the 4th number in
  the result above should be compared to the 4th number of unpatched PG?

 I've re-run the tests with the current patch on my home workstation, and
 the results are these (again 10k tables, dropped either one-by-one or in
 batches of 100).

 1) unpatched

 dropping one-by-one:15.5 seconds
 dropping in batches of 100: 12.3 sec

 2) patched (v3.1)

 dropping one-by-one:32.8 seconds
 dropping in batches of 100:  3.0 sec

 The problem here is that when dropping the tables one-by-one, the
 bsearch overhead is tremendous and significantly increases the runtime.
 I've done a simple check (if dropping a single table, use the original
 simple comparison) and I got this:

 3) patched (v3.2)

 dropping one-by-one:16.0 seconds
 dropping in batches of 100:  3.3 sec

 i.e. the best of both. But it seems like an unnecessary complexity to me
 - if you need to drop a lot of tables you'll probably do that in a
 transaction anyway.


Imo that's still a pretty bad performance difference. And your
single-table optimization will probably fall short as soon as the table
has indexes and/or a toast table...

 +
 +/*
 + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
 + * that it's suitable for bsearch.
 + */
 +static int
 +rnode_comparator(const void * p1, const void * p2)
 +{
 + RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
 + RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
 +
 + if (n1.node.relNode < n2.node.relNode)
 + return -1;
 + else if (n1.node.relNode > n2.node.relNode)
 + return 1;
 +
 + if (n1.node.dbNode < n2.node.dbNode)
 + return -1;
 + else if (n1.node.dbNode > n2.node.dbNode)
 + return 1;
 +
 + if (n1.node.spcNode < n2.node.spcNode)
 + return -1;
 + else if (n1.node.spcNode > n2.node.spcNode)
 + return 1;
 + else
 + return 0;
 +}

ISTM that this whole comparator could be replaced by memcmp(). That
could quite possibly lower the overhead of the bsearch() in simple
cases. We already rely on the fact that RelFileNode's have no padding
atm (c.f. buf_internal.h), so a memcmp() seems fine to me.
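
A stand-alone sketch of that suggestion, with simplified stand-ins for the
PostgreSQL types (the real definitions live in relfilenode.h, and the real
fields are Oids):

    #include <stdlib.h>
    #include <string.h>

    typedef struct { unsigned spcNode, dbNode, relNode; } RelFileNode;
    typedef struct { RelFileNode node; int backend; } RelFileNodeBackend;

    /* memcmp() yields *a* total order, which is all qsort()/bsearch() need.
     * node is the first member, so comparing sizeof(RelFileNode) bytes from
     * the element start compares just the RelFileNode; this relies on the
     * struct having no padding. */
    static int
    rnode_comparator(const void *p1, const void *p2)
    {
        return memcmp(p1, p2, sizeof(RelFileNode));
    }

    /* Sort the pending-drop array once after assembling it... */
    void
    rnode_sort(RelFileNodeBackend *rels, size_t nrels)
    {
        qsort(rels, nrels, sizeof(RelFileNodeBackend), rnode_comparator);
    }

    /* ...then each membership probe is O(log n) instead of a linear scan. */
    RelFileNodeBackend *
    rnode_find(RelFileNodeBackend *rels, size_t nrels,
               const RelFileNodeBackend *key)
    {
        return bsearch(key, rels, nrels, sizeof(RelFileNodeBackend),
                       rnode_comparator);
    }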

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-12-08 Thread Andres Freund
On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote:
 On 2012-10-18 20:08, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Boszormenyi Zoltan wrote:
 this is the latest one, fixing a bug in the accounting
 of per-statement lock timeout handling and tweaking
 some comments.
 Tom, are you able to give this patch some more time on this commitfest?
 I'm still hoping to get to it, but I've been spending a lot of time on
 bug fixing rather than patch review lately :-(.  If you're hoping to
 close out the current CF soon, maybe we should just slip it to the next
 one.

 Fine by me. Thanks.

According to this, the current state of the patch should be "Ready for
Committer", not "Needs Review" - is that right? I changed the state for
now...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
 So whatever the method used for swapping: relfilenode switch or relname
 switch, you need to modify the pg_class entry of the old and new indexes.

 The point is that with a relname switch the pg_class.oid of the index
 changes. Which is a bad idea because it will possibly be referred to by
 pg_depend entries. Relfilenodes - which certainly live in pg_class too,
 that's not the point - aren't referred to externally though. So if
 everything else in pg_class/pg_index stays the same a relfilenode switch
 imo saves you a lot of trouble.

I do not believe that it is safe to modify an index's relfilenode *nor*
its OID without exclusive lock; both of those are going to be in use to
identify and access the index in concurrent sessions.  The only things
we could possibly safely swap in a REINDEX CONCURRENTLY are the index
relnames, which are not used for identification by the system itself.
(I think.  It's possible that even this breaks something.)

Even then, any such update of the pg_class rows is dependent on
switching to MVCC-style catalog access, which frankly is pie in the sky
at the moment; the last time pgsql-hackers talked seriously about that,
there seemed to be multiple hard problems besides mere performance.
If you want to wait for that, it's a safe bet that we won't see this
feature for a few years.

I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly.  Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.  Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified.  Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?

regards, tom lane




Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-12-08 Thread Andres Freund
Hi,

On 2012-11-21 14:52:18 -0800, Jeff Janes wrote:
 On Thu, Nov 15, 2012 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote:
 
  So what's next here?  Do you want to work on these issue some more?
  Or does Jeff?

 This has been rewritten enough that I no longer feel much ownership of it.

 I'd prefer to leave it to Peter or Greg S., if they are willing to do it.

Is anybody planning to work on this? There hasn't been any activity
since the beginning of the CF and it doesn't look like there is much
work left?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-08 Thread Tomas Vondra
On 8.12.2012 15:26, Andres Freund wrote:
 On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
 I've re-run the tests with the current patch on my home workstation, and
 the results are these (again 10k tables, dropped either one-by-one or in
 batches of 100).

 1) unpatched

 dropping one-by-one:15.5 seconds
 dropping in batches of 100: 12.3 sec

 2) patched (v3.1)

 dropping one-by-one:32.8 seconds
 dropping in batches of 100:  3.0 sec

 The problem here is that when dropping the tables one-by-one, the
 bsearch overhead is tremendous and significantly increases the runtime.
 I've done a simple check (if dropping a single table, use the original
 simple comparison) and I got this:

 3) patched (v3.2)

 dropping one-by-one:16.0 seconds
 dropping in batches of 100:  3.3 sec

 i.e. the best of both. But it seems like an unnecessary complexity to me
 - if you need to drop a lot of tables you'll probably do that in a
 transaction anyway.

 
 Imo that's still a pretty bad performance difference. And your
 single-table optimization will probably fall short as soon as the table
 has indexes and/or a toast table...

Why? I haven't checked the code but either those objects are dropped
one-by-one (which seems unlikely) or they're added to the pending list
and then the new code will kick in, making it actually faster.

Yes, the original code might be faster even for 2 or 3 objects, but I
can't think of a simple solution to fix this, given that it really
depends on CPU/RAM speed and shared_buffers size.

 +
 +/*
 + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
 + * that it's suitable for bsearch.
 + */
 +static int
 +rnode_comparator(const void * p1, const void * p2)
 +{
 +RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
 +RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
 +
 +if (n1.node.relNode < n2.node.relNode)
 +return -1;
 +else if (n1.node.relNode > n2.node.relNode)
 +return 1;
 +
 +if (n1.node.dbNode < n2.node.dbNode)
 +return -1;
 +else if (n1.node.dbNode > n2.node.dbNode)
 +return 1;
 +
 +if (n1.node.spcNode < n2.node.spcNode)
 +return -1;
 +else if (n1.node.spcNode > n2.node.spcNode)
 +return 1;
 +else
 +return 0;
 +}
 
 ISTM that this whole comparator could be replaced by memcmp(). That
 could quite possibly lower the overhead of the bsearch() in simple
 cases. We already rely on the fact that RelFileNode's have no padding
 atm (c.f. buf_internal.h), so a memcmp() seems fine to me.

Good point, I'll try that. The original code used a macro that simply
compared the fields and I copied that logic, but if we can remove the
code ...

Thanks for the review!

Tomas




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Andres Freund
On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
  So whatever the method used for swapping: relfilenode switch or relname
  switch, you need to modify the pg_class entry of the old and new indexes.

  The point is that with a relname switch the pg_class.oid of the index
  changes. Which is a bad idea because it will possibly be referred to by
  pg_depend entries. Relfilenodes - which certainly live in pg_class too,
  that's not the point - aren't referred to externally though. So if
  everything else in pg_class/pg_index stays the same a relfilenode switch
  imo saves you a lot of trouble.

 I do not believe that it is safe to modify an index's relfilenode *nor*
 its OID without exclusive lock; both of those are going to be in use to
 identify and access the index in concurrent sessions.  The only things
 we could possibly safely swap in a REINDEX CONCURRENTLY are the index
 relnames, which are not used for identification by the system itself.
 (I think.  It's possible that even this breaks something.)

Well, the patch currently *does* take an exclusive lock in an extra
transaction just for the swapping. In that case it should actually be
safe.
Although that obviously removes part of the usefulness of the feature.

 Even then, any such update of the pg_class rows is dependent on
 switching to MVCC-style catalog access, which frankly is pie in the sky
 at the moment; the last time pgsql-hackers talked seriously about that,
 there seemed to be multiple hard problems besides mere performance.
 If you want to wait for that, it's a safe bet that we won't see this
 feature for a few years.

Yea :(

 I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
 preserve the index name exactly.  Something like adding or removing
 trailing underscores would probably serve to generate a nonconflicting
 name that's not too unsightly.  Or just generate a new name using the
 same rules that CREATE INDEX would when no name is specified.  Yeah,
 it's a hack, but what about the CONCURRENTLY commands isn't a hack?

I have no problem with ending up with a new name or something like
that. If that is what it takes: fine, no problem.

The issue I raised above is just about keeping the pg_depend entries
pointing to something valid... And not changing the index's pg_class.oid
seems to be the easiest solution for that.

I have some vague schemes in my head for solving the swapping issue with
3 entries for the index in pg_class, but they only seem to come to me
while I don't have anything to write them down with, so they are
probably bogus.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] gistchoose vs. bloat

2012-12-08 Thread Andres Freund
Hi,

On 2012-11-02 12:54:33 +0400, Alexander Korotkov wrote:
 On Sun, Oct 21, 2012 at 11:03 AM, Jeff Davis pg...@j-davis.com wrote:

  On Thu, 2012-10-18 at 15:09 -0300, Alvaro Herrera wrote:
   Jeff, do you think we need more review of this patch?
 
  In the patch, it refers to rd_options without checking for NULL first,
  which needs to be fixed.
 
  There's actually still one place where it says 'id' rather than 'is'.
  Just a nitpick.
 
  Regarding my point 4 from the previous email, I mildly disagree with the
  style, but I don't see a correctness problem there.
 
  If the first two items are fixed, then the patch is fine with me.
 

 First two items are fixed in attached version of the patch.

So the patch is ready for committer now?

I notice there's no documentation about the new reloption at all?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-12-08 Thread Andres Freund
On 2012-11-05 22:10:50 -0800, Jeff Davis wrote:
 On Mon, 2012-11-05 at 11:12 -0300, Alvaro Herrera wrote:
  What's going on with this patch?  I haven't seen any activity in a
  while.  Should I just move this to the next commitfest?

 Sorry, I dropped the ball here. I will still review it, whether it makes
 this commitfest or not.

Sorry to nag, but it starts to look like it might fall off the end of the
next CF...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree

2012-12-08 Thread Andres Freund
Hi Alexander,

On 2012-11-04 11:41:48 -0800, Jeff Davis wrote:
 On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:

  Right version of patch is attached.
 
 * In bounds_adjacent, there's no reason to flip the labels back.
 * Comment should indicate more explicitly that bounds_adjacent is
 sensitive to the argument order.
 * In bounds_adjacent, it appears that bound1 corresponds to B in the
 comment above, and bound2 corresponds to A in the comment above. I
 would have guessed from reading the comment that bound1 corresponded to
 A. We should just use consistent names between the comment and the code
 (e.g. boundA and boundB).
 * I could be getting confused, but I think that line 645 of
 rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).
 * I think needPrevious should have an explanatory comment somewhere. It
 looks like you are using it to store some state as you descend the tree,
 but it doesn't look like it's used to reconstruct the value (because we
 already have the value anyway). Since it's being used for a purpose
 other than what's intended, that should be explained.

Do you have time to address these comments or should the patch be marked
as Returned with Feedback? AFAICS there hasn't been progress since
CF2...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-08 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The issue I raised above is just about keeping the pg_depend entries
 pointing to something valid... And not changing the index's pg_class.oid
 seems to be the easiest solution for that.

Yeah, we would have to update pg_depend, pg_constraint, maybe some other
places if we go with that.  I think that would be safe because we'd be
holding ShareRowExclusive lock on the parent table throughout, so nobody
else should be doing anything that's critically dependent on seeing such
rows.  But it'd be a lot of ugly code, for sure.

Maybe the best way is to admit that we need a short-term exclusive lock
for the swapping step.  Or we could wait for MVCC catalog access ...

regards, tom lane




Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]

2012-12-08 Thread Tom Lane
Continuing to work on this ... I found multiple things I didn't like
about the permission-field update code.  Attached are some heavily
commented extracts from the code as I've changed it.  Does anybody
object to either the code or the objectives given in the comments?

regards, tom lane


/*
 * adjust_view_column_set - map a set of column numbers according to targetlist
 *
 * This is used with simply-updatable views to map column-permissions sets for
 * the view columns onto the matching columns in the underlying base relation.
 * The targetlist is expected to be a list of plain Vars of the underlying
 * relation (as per the checks above in view_is_auto_updatable).
 */
static Bitmapset *
adjust_view_column_set(Bitmapset *cols, List *targetlist)
{
    Bitmapset  *result = NULL;
    Bitmapset  *tmpcols;
    AttrNumber  col;

    tmpcols = bms_copy(cols);
    while ((col = bms_first_member(tmpcols)) >= 0)
    {
        /* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
        AttrNumber  attno = col + FirstLowInvalidHeapAttributeNumber;

        if (attno == InvalidAttrNumber)
        {
            /*
             * There's a whole-row reference to the view.  For permissions
             * purposes, treat it as a reference to each column available from
             * the view.  (We should *not* convert this to a whole-row
             * reference to the base relation, since the view may not touch
             * all columns of the base relation.)
             */
            ListCell   *lc;

            foreach(lc, targetlist)
            {
                TargetEntry *tle = (TargetEntry *) lfirst(lc);
                Var        *var;

                if (tle->resjunk)
                    continue;
                var = (Var *) tle->expr;
                Assert(IsA(var, Var));
                result = bms_add_member(result,
                        var->varattno - FirstLowInvalidHeapAttributeNumber);
            }
        }
        else
        {
            /*
             * Views do not have system columns, so we do not expect to see
             * any other system attnos here.  If we do find one, the error
             * case will apply.
             */
            TargetEntry *tle = get_tle_by_resno(targetlist, attno);

            if (tle != NULL && IsA(tle->expr, Var))
            {
                Var        *var = (Var *) tle->expr;

                result = bms_add_member(result,
                        var->varattno - FirstLowInvalidHeapAttributeNumber);
            }
            else
                elog(ERROR, "attribute number %d not found in view targetlist",
                     attno);
        }
    }
    bms_free(tmpcols);

    return result;
}



    /*
     * Mark the new target RTE for the permissions checks that we want to
     * enforce against the view owner, as distinct from the query caller.  At
     * the relation level, require the same INSERT/UPDATE/DELETE permissions
     * that the query caller needs against the view.  We drop the ACL_SELECT
     * bit that is presumably in new_rte->requiredPerms initially.
     *
     * Note: the original view RTE remains in the query's rangetable list.
     * Although it will be unused in the query plan, we need it there so that
     * the executor still performs appropriate permissions checks for the
     * query caller's use of the view.
     */
    new_rte->checkAsUser = view->rd_rel->relowner;
    new_rte->requiredPerms = view_rte->requiredPerms;

    /*
     * Now for the per-column permissions bits.
     *
     * Initially, new_rte contains selectedCols permission check bits for all
     * base-rel columns referenced by the view, but since the view is a SELECT
     * query its modifiedCols is empty.  We set modifiedCols to include all
     * the columns the outer query is trying to modify, adjusting the column
     * numbers as needed.  But we leave selectedCols as-is, so the view owner
     * must have read permission for all columns used in the view definition,
     * even if some of them are not read by the upper query.  We could try to
     * limit selectedCols to only columns used in the transformed query, but
     * that does not correspond to what happens in ordinary SELECT usage of a

Re: [HACKERS] review: pgbench - aggregation of info written into log

2012-12-08 Thread Andres Freund
Hi Tomas,

On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote:
  attached is a v4 of the patch. There are not many changes, mostly some
  simple tidying up, except for handling the Windows.

After a quick look I am not sure what all the talk about Windows is
about? instr_time.h seems to provide all you need, even for Windows? The
only issue with gettimeofday() on Windows seems to be that it's not all
that fast and not too high precision, but that shouldn't be a problem in
this case?

Could you expand a bit on the problems?

  * I had a problem with doc

The current patch has conflict markers in the sgml source, there seems
to have been some unresolved merge. Maybe that's all that causes the
errors?

What's your problem with setting up the doc toolchain?

 issues:

 * empty lines with invisible chars (tabs) + and sometimes empty lines
 after and before {}

 * adjustment of start_time

 +   * the desired interval */
 +   while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
 +       agg->start_time = agg->start_time + agg_interval;

 can skip one interval - so when the transaction time is larger than or
 similar to agg_interval, the results can be strange. We have to know the
 real length of the interval

Could you post a patch that addresses these issues?


Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] parallel pg_dump

2012-12-08 Thread Andres Freund
Hi,

On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:

 On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
 
 On 09/17/2012 10:01 PM, Joachim Wieland wrote:
 On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de
 wrote:
 Attached is a rebased version of the parallel pg_dump patch.
 Attached is another rebased version for the current commitfest.
 
 These did not apply cleanly, but I have fixed them up. The combined diff
 against git tip is attached. It can also be pulled from my parallel_dump
 branch on https://github.com/adunstan/postgresql-dev.git This builds and
 runs OK on Linux, which is a start ...

 Well, you would also need this piece if you're applying the patch (sometimes
 I forget to do git add ...)

The patch is marked as Ready for Committer in the CF app, but at least
the whole Windows situation seems to be unresolved as of yet?

Is anybody working on this? I would *love* to get this...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]

2012-12-08 Thread Dean Rasheed
On 8 December 2012 15:21, Tom Lane t...@sss.pgh.pa.us wrote:
 Continuing to work on this ... I found multiple things I didn't like
 about the permission-field update code.  Attached are some heavily
 commented extracts from the code as I've changed it.  Does anybody
 object to either the code or the objectives given in the comments?

Thanks for looking at this. The new, adjusted code looks good to me.

Regards,
Dean




Re: [HACKERS] Failing SSL connection due to weird interaction with openssl

2012-12-08 Thread Andres Freund
On 2012-11-26 21:45:32 -0500, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  I gather that this is supposed to be back-patched to all supported
  branches.

 FWIW, I don't like that patch any better than Robert does.  It seems
 as likely to do harm as good.  If there are places where libpq itself
 is leaving entries on the error stack, we should fix them -- retail.
 But it's not our business to clobber global state because there might
 be bugs in some other part of an application.

As there hasn't been any new input since this comment I am marking the
patch as Rejected in the CF application.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-08 Thread Tomas Vondra
On 8.12.2012 15:49, Tomas Vondra wrote:
 On 8.12.2012 15:26, Andres Freund wrote:
 On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
 I've re-run the tests with the current patch on my home workstation, and
 the results are these (again 10k tables, dropped either one-by-one or in
 batches of 100).

 1) unpatched

 dropping one-by-one:15.5 seconds
 dropping in batches of 100: 12.3 sec

 2) patched (v3.1)

 dropping one-by-one:32.8 seconds
 dropping in batches of 100:  3.0 sec

 The problem here is that when dropping the tables one-by-one, the
 bsearch overhead is tremendous and significantly increases the runtime.
 I've done a simple check (if dropping a single table, use the original
 simple comparison) and I got this:

 3) patched (v3.2)

 dropping one-by-one:16.0 seconds
 dropping in batches of 100:  3.3 sec

 i.e. the best of both. But it seems like an unnecessary complexity to me
 - if you need to drop a lot of tables you'll probably do that in a
 transaction anyway.


 Imo that's still a pretty bad performance difference. And your
 single-table optimization will probably fall short as soon as the table
 has indexes and/or a toast table...
 
  Why? I haven't checked the code but either those objects are dropped
 one-by-one (which seems unlikely) or they're added to the pending list
 and then the new code will kick in, making it actually faster.
 
 Yes, the original code might be faster even for 2 or 3 objects, but I
 can't think of a simple solution to fix this, given that it really
 depends on CPU/RAM speed and shared_buffers size.

I've done some tests and yes - once there are other objects the
optimization falls short. For example for tables with one index, it
looks like this:

  1) unpatched

  one by one:  28.9 s
  100 batches: 23.9 s

  2) patched

  one by one:  44.1 s
  100 batches:  4.7 s

So the patched code is about 50% slower, but this difference quickly
disappears as the number of indexes / toast tables etc. grows.

I see this as an argument AGAINST such special-case optimization. My
reasoning is this:

* This difference is significant only if you're dropping a table with
  low number of indexes / toast tables. In reality this is not going to
  be very frequent.

* If you're dropping a single table, it really does not matter - the
  difference will be like 100 ms vs. 200 ms or something like that.

* If you're dropping a lot of tables, you should do than in a
  transaction anyway, or you should be aware that doing that in a
  transaction will be faster (we can add this info into DROP TABLE
  docs).

IMHO this is similar to doing a lot of INSERTs - doing all of them in a
single transaction is faster. The possibility that someone needs to drop
a lot of tables in separate transactions exists in theory, but I don't
know of a realistic real-world usage.

So I'd vote to ditch that special case optimization.
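
The advice for the docs amounts to ordinary statement batching (table
names hypothetical):

    BEGIN;
    DROP TABLE t_0001;
    DROP TABLE t_0002;
    -- ... remaining tables ...
    COMMIT;

All the drops are then processed together at commit, which is exactly the
case the patched code speeds up.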

 +
 +/*
 + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
 + * that it's suitable for bsearch.
 + */
 +static int
 +rnode_comparator(const void * p1, const void * p2)
 +{
 +   RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
 +   RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
 +
 +   if (n1.node.relNode < n2.node.relNode)
 +   return -1;
 +   else if (n1.node.relNode > n2.node.relNode)
 +   return 1;
 +
 +   if (n1.node.dbNode < n2.node.dbNode)
 +   return -1;
 +   else if (n1.node.dbNode > n2.node.dbNode)
 +   return 1;
 +
 +   if (n1.node.spcNode < n2.node.spcNode)
 +   return -1;
 +   else if (n1.node.spcNode > n2.node.spcNode)
 +   return 1;
 +   else
 +   return 0;
 +}

 ISTM that this whole comparator could be replaced by memcmp(). That
 could quite possibly lower the overhead of the bsearch() in simple
 cases. We already rely on the fact that RelFileNode's have no padding
 atm (c.f. buf_internal.h), so a memcmp() seems fine to me.
 
 Good point, I'll try that. The original code used a macro that simply
 compared the fields and I copied that logic, but if we can remove the
 code ...

Hmmm, I've replaced the comparator with this one:

static int
rnode_comparator(const void * p1, const void * p2)
{
return memcmp(p1, p2, sizeof(RelFileNode));
}

and while it's not significantly faster (actually it's often a bit
slower than the original comparator), it's much simpler code.

Tomas




Re: [HACKERS] parallel pg_dump

2012-12-08 Thread Andrew Dunstan


On 12/08/2012 11:01 AM, Andres Freund wrote:

Hi,

On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:

On 10/13/2012 10:46 PM, Andrew Dunstan wrote:

On 09/17/2012 10:01 PM, Joachim Wieland wrote:

On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de
wrote:

Attached is a rebased version of the parallel pg_dump patch.

Attached is another rebased version for the current commitfest.

These did not apply cleanly, but I have fixed them up. The combined diff
against git tip is attached. It can also be pulled from my parallel_dump
branch on https://github.com/adunstan/postgresql-dev.git This builds and
runs OK on Linux, which is a start ...

Well, you would also need this piece if you're applying the patch (sometimes
I forget to do git add ...)

The patch is marked as Ready for Committer in the CF app, but at least
the whole Windows situation seems to be unresolved as of yet?

Is anybody working on this? I would *love* to get this...





I am working on it when I get a chance, but keep getting hammered. I'd 
love somebody else to review it too.


cheers

andrew




Re: [HACKERS] review: pgbench - aggregation of info written into log

2012-12-08 Thread Tomas Vondra
On 8.12.2012 16:33, Andres Freund wrote:
 Hi Tomas,
 
 On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote:
 attached is a v4 of the patch. There are not many changes, mostly some
 simple tidying up, except for handling the Windows.
 
 After a quick look I am not sure what all the talk about Windows is
 about? instr_time.h seems to provide all you need, even for Windows? The
 only issue with gettimeofday() on Windows seems to be that it's not all
 that fast and not too high precision, but that shouldn't be a problem in
 this case?
 
 Could you expand a bit on the problems?

Well, in the current pgbench.c, there's this piece of code

#ifndef WIN32
    /* This is more than we really ought to know about instr_time */
    fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
            st->id, st->cnt, usec, st->use_file,
            (long) now.tv_sec, (long) now.tv_usec);
#else
    /* On Windows, instr_time doesn't provide a timestamp anyway */
    fprintf(logfile, "%d %d %.0f %d 0 0\n",
            st->id, st->cnt, usec, st->use_file);
#endif

and the Windows-related discussion in this patch is about the same issue.

As I understand it the problem seems to be that INSTR_TIME_SET_CURRENT
does not provide the timestamp at all.

I was thinking about improving this by combining gettimeofday and
INSTR_TIME, but I have no Windows box to test it on :-(

 
 * I had a problem with doc
 
 The current patch has conflict markers in the sgml source, there seems
 to have been some unresolved merge. Maybe that's all that causes the
 errors?

Ah, I see. Probably my fault, I'll fix that.

 What's your problem with setting up the doc toolchain?
 
 issues:

 * empty lines with invisible chars (tabs) + and sometimes empty lines
 after and before {}

 * adjustment of start_time

 +   * the desired interval */
 +   while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
 +       agg->start_time = agg->start_time + agg_interval;

 can skip one interval - so when the transaction time is larger than or
 similar to agg_interval, the results can be strange. We have to know the
 real length of the interval
 
 Could you post a patch that addresses these issues?

Yes, I'll work on it today/tomorrow and I'll send an improved patch.

Tomas




Re: [HACKERS] CommitFest 2012-11 Progress

2012-12-08 Thread Andres Freund
Hi Noah,

On 2012-12-08 09:06:01 -0500, Noah Misch wrote:
 We're entering the last planned week of the CF.  Apart from a 72-hour period
 in mid-November, some CF has remained in-progress continuously for the last
 176 days.  With 64 out of the 82 current patches unresolved, we're on track to
 again see no gap between CF 2012-11 and CF 2013-01.  The process as practiced
 today is a shell of its former self[1][2].  Authors, reviewers, committers:
 has the process changed to something better, worthy of formally replacing its
 predecessor?  Or, rather, do we need a fresh process correction?

I agree that lately there seems to be minimal benefit of the commitfest
process. I don't see many disadvantages either, but...

No idea how to fix that, it seems most people are simply swamped with
work and stuff.

I made a pass through most of the commitfest entries to update their
state to something sensible and, where it seemed necessary, inquired
about the newest status.
Not sure that's really welcome, but it was the only thing I could think
of to help make some progress.


Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-12-08 Thread Boszormenyi Zoltan

On 2012-12-08 15:30, Andres Freund wrote:

On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote:

On 2012-10-18 20:08, Tom Lane wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

Boszormenyi Zoltan wrote:

this is the latest one, fixing a bug in the accounting
of per-statement lock timeout handling and tweaking
some comments.

Tom, are you able to give this patch some more time on this commitfest?

I'm still hoping to get to it, but I've been spending a lot of time on
bug fixing rather than patch review lately :-(.  If you're hoping to
close out the current CF soon, maybe we should just slip it to the next
one.

Fine by me. Thanks.

According to this the current state of the patch should be Ready for
Committer not Needs Review is that right? I changed the state for
now...


Thanks. I just tried the patch on current GIT HEAD and it
gives some offset warnings but no rejects. Also, it compiles
without warnings and still works as it should.
Should I post a new patch that applies cleanly?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-12-08 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 Thanks. I just tried the patch on current GIT HEAD and it
 gives some offset warnings but no rejects. Also, it compiles
 without warnings and still works as it should.
 Should I post a new patch that applies cleanly?

Offsets are not a problem --- if you tried to keep them exact you'd just
be making a lot of unnecessary updates.  If there were fuzz warnings I'd
be more concerned.

regards, tom lane




[HACKERS] Sketch of a Hook into the Logging Collector

2012-12-08 Thread Daniel Farina
Hello all,

I have some needs that seem to justify changing Postgres slightly to
give user programs a lot more power over how to process logging output,
which neither the log collector nor the syslog output can satisfy well
as-is.

I am approaching this from the angle of increasing power by exposing
the log collector (syslogger) pipe protocol.

Included is a patch whose general aesthetic is to touch as little of
the logging code as possible while achieving maximum power in a
contrib and stand-alone sample program.  However, a more satisfying
treatment that may be slightly more invasive to the logging system is
very welcome.

The general idea here is that the logging collector pipe protocol is a
pretty reasonable one for non-Postgres programs to consume as-is
pretty easily.  The proof of concept consists of three things:

* A hook into the log collector.  It is small but a little weird
  compared to the other hooks in the system.

* A contrib, pg_logcollectdup.  This contrib lets one forward logs to
  a named pipe specified in postgresql.conf.

* A stand-alone program, pg_logcollect_sample, that renders protocol
  traffic newline separated and with its binary fields converted to
  readable text.

The patch can also be seen from https://github.com/fdr/postgres.git in
the branch 'logdup'.  I do rebase this at-will for the time being.

I have a few detailed dissatisfactions with this approach, but I'd
rather hear your own dissatisfactions.  I also don't like the name.

A demo of configuring all of the above and seeing some output follows.
I can paste this in one go on my machine.  It creates /tmp/pg-logdup
and /tmp/pgdata-logdup.  All spawned processes can be listed
afterwards with 'jobs'.

# Install postgres and pg_logcollectdup contrib
./configure --prefix=/tmp/pg-logdup
make -sj16 install
cd contrib/pg_logcollectdup/
make -s install
/tmp/pg-logdup/bin/initdb -D /tmp/pgdata-logdup
mkfifo /tmp/pgdata-logdup/log-pipe

# Configure postgresql.conf

echo "logging_collector = on" >> \
 /tmp/pgdata-logdup/postgresql.conf

echo "shared_preload_libraries = 'pg_logcollectdup'" >> \
 /tmp/pgdata-logdup/postgresql.conf

echo "logcollectdup.destination = '/tmp/pgdata-logdup/log-pipe'" >> \
 /tmp/pgdata-logdup/postgresql.conf

# Set up destination pipe
mkfifo /tmp/pgdata-logdup/log-pipe

# Build sample pipe formatter
make pg_logcollect_sample

# Run it in the background
./pg_logcollect_sample \
 < /tmp/pgdata-logdup/log-pipe \
 > /tmp/pgdata-logdup/duplogs.txt &

# Run Postgres with a non-default port for convenience
/tmp/pg-logdup/bin/postgres -D /tmp/pgdata-logdup --port=2345 &

# Prevent race against file creation, then look at the logs
sleep 1
cat /tmp/pgdata-logdup/duplogs.txt

--
fdr


log-collector-extension-v1.patch
Description: Binary data



Re: [HACKERS] Sketch of a Hook into the Logging Collector

2012-12-08 Thread Daniel Farina
On Sat, Dec 8, 2012 at 10:40 AM, Daniel Farina dan...@heroku.com wrote:
 * A contrib, pg_logcollectdup.  This contrib lets one forward logs to
   a named pipe specified in postgresql.conf.

I have revised this part in the attached patch.  It adds some error
handling for a case of user error; the previous demo script and
narrative precepts are still the same.

--
fdr


log-collector-extension-v2.patch
Description: Binary data



Re: [HACKERS] parallel pg_dump

2012-12-08 Thread Bruce Momjian
On Sat, Dec  8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
 
 On 12/08/2012 11:01 AM, Andres Freund wrote:
 Hi,
 
 On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote:
 On 10/13/2012 10:46 PM, Andrew Dunstan wrote:
 On 09/17/2012 10:01 PM, Joachim Wieland wrote:
 On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de
 wrote:
 Attached is a rebased version of the parallel pg_dump patch.
 Attached is another rebased version for the current commitfest.
 These did not apply cleanly, but I have fixed them up. The combined diff
 against git tip is attached. It can also be pulled from my parallel_dump
 branch on https://github.com/adunstan/postgresql-dev.git This builds and
 runs OK on Linux, which is a start ...
 Well, you would also need this piece if you're applying the patch (sometimes
 I forget to do git add ...)
 The patch is marked as Ready for Committer in the CF app, but at least
  the whole Windows situation seems to be unresolved as of yet?
 
 Is anybody working on this? I would *love* to get this...
 
 
 
 
 I am working on it when I get a chance, but keep getting hammered.
 I'd love somebody else to review it too.

FYI, I will be posting pg_upgrade performance numbers using Unix
processes.  I will try to get the Windows code working but will also
need help.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-08 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
 Visibility of pre-hinted data is an issue and because of that we
 previously agreed that it would be allowed only when explicitly
 requested by the user, which has been implemented as the FREEZE option
 on COPY. This then makes it identical to TRUNCATE, where a separate
 command differentiates MVCC-compliant row removal (DELETE) from
 non-MVCC row removal (TRUNCATE).

To be honest, I really don't buy off on this.  Yes, TRUNCATE (a
top-level, individually permissioned command) can violate MVCC and make
things which might have been visible to a given transaction no longer
visible to that transaction.  I find that very different from making
rows which should *not* be visible suddenly appear.

 So the committed feature does address the visibility issue. 

Not hardly.  It lets a user completely violate the basic rules of the
overall database.  The *correct* solution to this problem is to actually
*fix* it, by setting it up such that tables created after a particular
transaction starts aren't visible and similar.  Not by punting to the
user with very little control over it (or so I'm guessing).  Will we
accept a patch to do an INSERT or UPDATE FREEZE...?  I realize this
might be a bit of a stretch, but why not allow February 31st to be
accepted as a date also, should the user request it?  This is quite a
slippery slope, in my view.

 Jeff has
 been speaking about an extension to that behaviour, which would be to
 allow HEAP_XMIN_COMMITTED to be set in some cases also. The committed
 feature specifically does not do that.

It's not clear to me why you *wouldn't* just go ahead and set everything
you can at the same time...?  It hardly seems like it could be worse
than what is apparently going to happen with this command...

Thanks,

Stephen




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-08 Thread Jan Wieck

On 12/6/2012 12:45 PM, Robert Haas wrote:

On Wed, Dec 5, 2012 at 10:16 PM, Jan Wieck janwi...@yahoo.com wrote:

That sort of dynamic approach would indeed be interesting. But I fear that
it is going to be complex at best. The amount of time spent in scanning
heavily depends on the visibility map. The initial vacuum scan of a table
can take hours or more, but it does update the visibility map even if the
vacuum itself is aborted later. The next vacuum may scan that table in
almost no time at all, because it skips all blocks that are marked all
visible.


Well, if that's true, then there's little reason to worry about giving
up quickly, because the next autovacuum a minute later won't consume
many resources.


"Almost no time" is of course relative to what an actual scan and dead
tuple removal cost. Looking at a table with 3 GB of dead tuples at the
end, the initial vacuum scan takes hours. When vacuum comes back to this
table, cleaning up a couple megabytes of newly deceased tuples and then
skipping over the all-visible pages may take a minute.


Based on the discussion and what I feel is a consensus, I have created
an updated patch that has no GUC at all. The hard-coded parameters in
include/postmaster/autovacuum.h are:


AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL  20 /* ms */
AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL   50 /* ms */
AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */
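
To illustrate how the three knobs are meant to interact, here is a
standalone sketch of the timing logic. The lock and truncate helpers
below are made-up stand-ins, not PostgreSQL functions:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHECK_INTERVAL_MS	20		/* look for lock waiters this often */
#define WAIT_INTERVAL_MS	50		/* sleep between reacquire attempts */
#define LOCK_TIMEOUT_MS		5000	/* give up reacquiring after this long */

/* hypothetical stand-ins for the real lock machinery and the real work */
static bool try_exclusive_lock(void)	{ return rand() % 4 != 0; }
static bool someone_waiting(void)		{ return rand() % 10 == 0; }
static void release_lock(void)			{ }
static bool truncate_some_pages(void)	/* one CHECK_INTERVAL worth of work */
{
	usleep(CHECK_INTERVAL_MS * 1000);
	return rand() % 20 == 0;			/* true once the dead tail is gone */
}

static long
now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

int
main(void)
{
	bool		done = false;

	while (!done)
	{
		long		wait_start = now_ms();

		/* (re)acquire the exclusive lock, giving up after the timeout */
		while (!try_exclusive_lock())
		{
			if (now_ms() - wait_start > LOCK_TIMEOUT_MS)
			{
				puts("timeout: give up, autovacuum retries later");
				return 0;
			}
			usleep(WAIT_INTERVAL_MS * 1000);
		}

		/* truncate in small steps; yield as soon as someone wants the lock */
		do
			done = truncate_some_pages();
		while (!done && !someone_waiting());

		release_lock();			/* let the waiter in, then loop around */
	}
	puts("truncate complete");
	return 0;
}

With the values above, a single long-lived blocker makes the reacquire
loop give up after about five seconds, which is exactly what the third
job in the workload below forces.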

I gave that the worst workload I can think of: a pgbench-style
application that throws about 10 transactions per second at it, so that
there is a constant need to give up the lock due to conflicting lock
requests and then reacquire it. A cleanup process periodically moves
old tuples from the history table to an archive table, making history a
rolling-window table. And a third job, 2-3 times per minute, produces a
transaction lasting 10 seconds, forcing autovacuum to give up on the
lock reacquisition.


Even with that workload, autovacuum is slowly but steadily chopping
away at the table.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c9253a9..fd3e91e 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 52,62 
--- 52,64 
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
  #include "storage/lmgr.h"
+ #include "storage/proc.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/pg_rusage.h"
  #include "utils/timestamp.h"
  #include "utils/tqual.h"
+ #include "portability/instr_time.h"
  
  
  /*
*** typedef struct LVRelStats
*** 103,108 
--- 105,111 
  	ItemPointer dead_tuples;	/* array of ItemPointerData */
  	int			num_index_scans;
  	TransactionId latestRemovedXid;
+ 	bool		lock_waiter_detected;
  } LVRelStats;
  
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 193,198 
--- 196,203 
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
  	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
+ 	vacrelstats->pages_removed = 0;
+ 	vacrelstats->lock_waiter_detected = false;
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 259,268 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/* report results to the stats collector, too */
! 	pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
--- 264,281 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/*
! 	 * report results to the stats collector, too.
! 	 * An early terminated lazy_truncate_heap attempt
! 	 * suppresses the message and also cancels the
! 	 * execution of ANALYZE, if that was ordered.
! 	 */
! 	if (!vacrelstats->lock_waiter_detected)
! 		pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
+ 	else
+ 		vacstmt->options &= ~VACOPT_ANALYZE;
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1255,1334 
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
  	PGRUsage	ru0;
  
  	pg_rusage_init(ru0);
  
  	/*
! 	 * We need full exclusive lock on the relation in order to do truncation.
! 	 * If we can't get it, give up rather than waiting --- we don't want to
! 	 * block other backends, and we don't want to deadlock (which is quite
! 	 * possible considering we already hold a lower-grade lock).
! 	 */
! 	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
! 		return;
! 
! 	/*
! 	 * Now that 

Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]

2012-12-08 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 Attached is a rebased patch using new OIDs.

Applied after a fair amount of hacking.

regards, tom lane




[HACKERS] PGCon 2013 - call for papers

2012-12-08 Thread Dan Langille
PGCon 2013 will be on 23-24 May 2013 at University of Ottawa.

This year, we are planning to have an un-conference day around PGCon.
This is currently being scheduled.  More information on the
un-conference will be available within a few weeks.

NOTE: the un-conference day's content will be set on the day by those who
turn up. We expect heavy attendance by developers and users of PostgreSQL.

We are now accepting proposals for the main part of the conference (23-24 May).
Proposals can be quite simple. We do not require academic-style papers.

If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are some ideas to jump-start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- The latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software


The schedule is:

1 Dec 2012 Proposal acceptance begins
19 Jan 2013 Proposal acceptance ends
19 Feb 2013 Confirmation of accepted proposals

NOTE: the call for lightning talks will go out very close to the conference.
Do not submit lightning talk proposals until then.

See also http://www.pgcon.org/2013/papers.php

Instructions for submitting a proposal to PGCon 2013 are available
from: http://www.pgcon.org/2013/submissions.php

-- 
Dan Langille - http://langille.org





Re: [HACKERS] review: pgbench - aggregation of info written into log

2012-12-08 Thread Tomas Vondra
Hi,

attached is a v5 of this patch. Details below:

On 8.12.2012 16:33, Andres Freund wrote:
 Hi Tomas,
 
 On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote:
 attached is a v4 of the patch. There are not many changes, mostly some
 simple tidying up, except for the handling of Windows.
 
 After a quick look I am not sure what all the talk about Windows is
 about? instr_time.h seems to provide all you need, even for Windows? The
 only issue with gettimeofday() on Windows seems to be that it is not
 all that fast and not of very high precision, but that shouldn't be a
 problem in this case?
 
 Could you expand a bit on the problems?

As explained in the previous message, this is an existing problem with
the timestamp being unavailable there. I'm not very fond of adding
Linux-only features, but fixing that was not the goal of this patch.

 * I had a problem with doc
 
 The current patch has conflict markers in the sgml source, there seems
 to have been some unresolved merge. Maybe that's all that causes the
 errors?
 
 Whats your problem with setting up the doc toolchain?

Yeah, my fault - an incomplete merge. But the real culprit was a
missing refsect2. Fixed.

 
 issues:

 * empty lines with invisible chars (tabs), and sometimes empty lines
 after and before {}

Fixed (I've removed the lines etc.)


 * adjustment of start_time

 +	 * the desired interval */
 +	while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
 +		agg->start_time = agg->start_time + agg_interval;

 this can skip an interval - so when the transaction time is larger
 than or similar to agg_interval, the results can be strange. We have
 to know the real length of the interval
 
 Could you post a patch that addresses these issues?

So, in the end I've rewritten the section advancing the start_time. Now
it works so that when skipping an interval (because of a very long
transaction), it will print lines even for those empty intervals.

So for example with a transaction file containing a single query

   SELECT pg_sleep(1.5);

and an interval length of 1 second, you'll get something like this:

1355009677 0 0 0 0 0
1355009678 1 1501028 2253085056784 1501028 1501028
1355009679 0 0 0 0 0
1355009680 1 1500790 2252370624100 1500790 1500790
1355009681 1 1500723 2252169522729 1500723 1500723
1355009682 0 0 0 0 0
1355009683 1 1500719 2252157516961 1500719 1500719
1355009684 1 1500703 2252109494209 1500703 1500703
1355009685 0 0 0 0 0

which is IMHO the best way to deal with this.
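
A self-contained sketch of that logic (simplified AggVals, not the
exact patch code - just the idea of emitting one line per elapsed
interval, including the empty ones):

#include <stdio.h>

typedef struct
{
	long		start_time;
	int			cnt;
	double		sum;
	double		sum2;
	double		min_duration;
	double		max_duration;
} AggVals;

/* emit one log line per elapsed interval, empty intervals included */
static void
flush_intervals(AggVals *agg, int interval, long now, FILE *out)
{
	while (agg->start_time + interval <= now)
	{
		fprintf(out, "%ld %d %.0f %.0f %.0f %.0f\n",
				agg->start_time, agg->cnt, agg->sum, agg->sum2,
				agg->min_duration, agg->max_duration);

		/* advance by exactly one interval and reset the counters */
		agg->start_time += interval;
		agg->cnt = 0;
		agg->sum = agg->sum2 = 0.0;
		agg->min_duration = agg->max_duration = 0.0;
	}
}

int
main(void)
{
	AggVals		agg = {1355009677, 1, 1501028, 2253085056784.0,
					   1501028, 1501028};

	/* two intervals have elapsed: the first has data, the second is empty */
	flush_intervals(&agg, 1, 1355009679, stdout);
	return 0;
}

Run as-is this prints a data line for 1355009677 and an all-zero line
for 1355009678, mirroring the output format above.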

I've fixed several minor issues, added a few comments.

regards
Tomas
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..5a03796 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -150,6 +150,7 @@ char   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+int			agg_interval;		/* log aggregates instead of individual
								 * transactions */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -245,6 +246,18 @@ typedef struct
char   *argv[MAX_ARGS]; /* command word list */
 } Command;
 
+typedef struct
+{
+	long		start_time;		/* when does the interval start */
+	int			cnt;			/* number of transactions */
+	double		min_duration;	/* min/max durations */
+	double		max_duration;
+	double		sum;			/* sum(duration), sum(duration^2) - for
+								 * estimates */
+	double		sum2;
+
+} AggVals;
+
 static Command **sql_files[MAX_FILES]; /* SQL script files */
 static int num_files;  /* number of script files */
 static int num_commands = 0;   /* total number of Command structs */
@@ -377,6 +390,10 @@ usage(void)
 		   "  -l           write transaction times to log file\n"
 		   "  --sampling-rate NUM\n"
 		   "               fraction of transactions to log (e.g. 0.01 for 1%% sample)\n"
+#ifndef WIN32
+		   "  --aggregate-interval NUM\n"
+		   "               aggregate data over NUM seconds\n"
+#endif
 		   "  -M simple|extended|prepared\n"
 		   "               protocol for submitting queries to server (default: simple)\n"
 		   "  -n           do not run VACUUM before tests\n"
@@ -830,9 +847,25 @@ clientDone(CState *st, bool ok)
return false;   /* always false */
 }
 
+static
+void agg_vals_init(AggVals * aggs, instr_time start)
+{
+	/* basic counters */
+	aggs->cnt = 0;				/* number of transactions */
+	aggs->sum = 

Re: [HACKERS] too much pgbench init output

2012-12-08 Thread Tomas Vondra
On 20.11.2012 08:22, Jeevan Chalke wrote:
 Hi,
 
 
 On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
 On 19.11.2012 11:59, Jeevan Chalke wrote:
  Hi,
 
  I went through the discussion for this patch and here is my review:
 
  The main aim of this patch is to reduce the number of log lines. It was
  also suggested to use an option to provide the interval, but a few of
  us did not quite agree on that. So the final discussion ended at keeping
  a 5-second interval between log lines.
 
  However, I see there are two types of users here:
  1. Those who like these log lines, so that they can troubleshoot
  slowness and the like
  2. Those who do not like these log lines.
 
 Who likes these lines / needs them for something useful?
 
 
 No idea. I fall into the second category.
 
 But from the discussion, I believe some people may need detailed (or a
 lot more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

  (1) it's rather pointless to print a message every 100k rows, as it
  usually fills the console with garbage

  (2) it's handy to have regular updates of the progress

I don't think there are people (in the thread) who require keeping the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
-q) to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

  So keeping these in mind, I would rather go for an option which will
  control this. People falling in category one can set this option very
  low, whereas users falling in the second category can keep it high.
 
 So what option(s) would you expect? Something that tunes the interval
 length or something else?
 
 
 Interval length.

Well, I can surely imagine something like --interval N.

 A switch that'd choose between the old and new behavior might be a good
 idea, but I'd strongly vote against automagic heuristics. It makes the
 behavior very difficult to predict and I really don't want to force the
 users to wonder whether the long delay is due to general slowness of the
 machine or some clever logic that causes long delays between log
 messages.
 
 That's why I choose a very simple approach with constant time interval.
 It does what I was aiming for (less logs) and it's easy to predict.
 Sure, we could choose different interval (or make it an option).
 
 
 I would prefer an option for choosing an interval, say from 1 second to
 10 seconds.

Um, why not allow an arbitrary integer? Why limit it to 1 to 10 seconds?

 BTW, what if we put one log message every 10% (or 5%), with the time
 taken (for the last 10% (or 5%), and cumulative), over 5 seconds?
 This will give only 10 (or 20) lines per pgbench initialisation.
 And since we are showing the time taken for each block, if any slowness
 happens, one can easily find the block by looking at the timings and
 troubleshoot it.
 Though 10% or 5% is again a debatable number, keeping it constant
 will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not really excited about it.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

Tomas
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..334ce4c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -135,6 +135,11 @@ int			unlogged_tables = 0;
 double sample_rate = 0.0;
 
 /*
+ * logging steps (seconds between log messages)
+ */
+int			log_step_seconds = 5;
+
+/*
  * tablespace selection
  */
 char  *tablespace = NULL;
@@ -1362,6 +1367,11 @@ init(bool is_no_vacuum)
	char		sql[256];
int i;
 
+   /* used to track elapsed time and estimate of the remaining time */
+   instr_time  start, diff;
+   double  elapsed_sec, remaining_sec;
+   int log_interval = 1;
+
if ((con = doConnect()) == NULL)
exit(1);
 
@@ -1430,6 +1440,8 @@ init(bool is_no_vacuum)
}
PQclear(res);
 
+   INSTR_TIME_SET_CURRENT(start);
+
	for (i = 0; i < naccounts * scale; i++)
{
int j = i + 1;
@@ -1441,10 +1453,27 @@ init(bool is_no_vacuum)
exit(1);
}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-
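
(The diff is cut off here in the archive. The idea it implements -
replacing the per-100k-rows message with at most one progress line
every log_step_seconds, including elapsed time and a linear estimate
of the remaining time - looks roughly like this standalone sketch,
not the actual patch code:)

#include <stdio.h>
#include <time.h>

#define LOG_STEP_SECONDS 5		/* seconds between progress lines */

int
main(void)
{
	const long	total = 5000000;	/* rows to "load" */
	time_t		start = time(NULL);
	time_t		last_log = start;
	long		i;

	for (i = 0; i < total; i++)
	{
		time_t		now;

		/* ... generate and insert row i here ... */

		now = time(NULL);
		if (now - last_log >= LOG_STEP_SECONDS)
		{
			double		elapsed = (double) (now - start);
			double		remaining = elapsed * (total - i - 1) / (i + 1);

			fprintf(stderr, "%ld of %ld tuples (%d%%) done "
					"(elapsed %.1f s, remaining %.1f s)\n",
					i + 1, total, (int) ((i + 1) * 100 / total),
					elapsed, remaining);
			last_log = now;
		}
	}
	return 0;
}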

Re: [HACKERS] parallel pg_dump

2012-12-08 Thread Joachim Wieland
On Sat, Dec 8, 2012 at 3:05 PM, Bruce Momjian br...@momjian.us wrote:

 On Sat, Dec  8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote:
  I am working on it when I get a chance, but keep getting hammered.
  I'd love somebody else to review it too.

 FYI, I will be posting pg_upgrade performance numbers using Unix
 processes.  I will try to get the Windows code working but will also
 need help.


Just let me know if there's anything I can help you guys with.


Joachim


Re: [HACKERS] Review: Patch to compute Max LSN of Data Pages

2012-12-08 Thread Amit kapila
On Saturday, December 08, 2012 9:44 AM Tom Lane wrote:
Amit kapila amit.kap...@huawei.com writes:
 On Friday, December 07, 2012 7:43 PM Muhammad Usama wrote:
 Although I am wondering why you are disallowing an absolute path for
 the file. Any particular reason?

 The reason for disallowing absolute paths was that we need to test on
 multiple platforms and wanted to keep the scope a little smaller.

This argument seems to me to be completely nuts.  What's wrong with an
absolute path?  Wouldn't you have to go out of your way to disallow it?

There's nothing wrong with an absolute path. I have updated the patch
to work with absolute paths as well.
The updated patch is attached to this mail.


With Regards,
Amit Kapila.

Attachment: pg_computemaxlsn_v4.patch



Re: [HACKERS] Review of Row Level Security

2012-12-08 Thread Kohei KaiGai
2012/12/7 Simon Riggs si...@2ndquadrant.com:
 On 5 December 2012 11:16, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 * TRUNCATE works, and allows you to remove all rows of a table, even
 ones you can't see to run a DELETE on. Er...

 It was my oversight. My preference is to rewrite the TRUNCATE command
 into a DELETE statement when a row-security policy is active on
 the target table.
 In this case, a NOTICE message may be helpful so that users do not
 assume the table is always empty after the command.

 I think the default must be to throw an ERROR, since part of the
 contract with TRUNCATE is that it is fast and removes storage.

OK. Does the default imply you are suggesting configurable
behavior using a GUC or something?
I think both behaviors are reasonable from a security point of
view, as long as users cannot remove rows they are not privileged to see.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] Review of Row Level Security

2012-12-08 Thread Kohei KaiGai
2012/12/7 Simon Riggs si...@2ndquadrant.com:
 On 5 December 2012 11:16, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Oracle defaults to putting VPD on all event types: INSERT, UPDATE,
 DELETE, SELECT. ISTM we should be doing the same, not just say we can
 add an INSERT trigger if you want.

 Adding a trigger just begs the question as to why we are bothering in
 the first place, since this functionality could already be added by
 INSERT, UPDATE or DELETE triggers, if they are a full replacement for
 this feature. The only answer is ease of use.

 We can easily add syntax like this

 [ROW SECURITY CHECK (  ) [ON [ ALL | INSERT, UPDATE, DELETE, SELECT 
 [..,

 with the default being ALL

 I think it is a flaw of Oracle. :-)

 Agreed

 When a user can define a leaky function, it enables leaking the
 contents of invisible rows at the moment the executor fetches the
 rows, prior to the modification stage, even if we allow configuring
 individual row-security policies for SELECT
 and DELETE or UPDATE commands.
 My preference is one policy on a particular table for all the commands.

 Yes, only one security policy allowed.

 Question is, should we offer the option to enforce it on a subset of
 command types.

 That isn't anything I can see a need for myself.

From an implementation perspective, it is not hard to support not
applying the security policy to particular command types.
So my preference is to support only the behavior corresponding
to the ALL option above, then add a per-command basis when
we see strong demand.
What do you think?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

