Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-04-14 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 14 Apr 2015 12:10:35 +0900, Etsuro Fujita fujita.ets...@lab.ntt.co.jp 
wrote in 552c852b.2050...@lab.ntt.co.jp
 On 2015/04/13 23:25, Jim Nasby wrote:
  On 4/13/15 4:58 AM, Etsuro Fujita wrote:
  On 2015/04/10 21:40, Etsuro Fujita wrote:
  On 2015/04/09 12:07, Etsuro Fujita wrote:
  I'll update the patch, which will contain only an infrastructure for
  this in the PG core, and will not contain any postgres_fdw change.
 
  I updated the patch based on your comments.  Updated patch attached.  In
  the patch the following FDW APIs have been proposed:
 
  + RowMarkType
  + GetForeignRowMarkType (LockClauseStrength strength);
 
  + bool
  + LockForeignRow (EState *estate,
  + ExecRowMark *erm,
  + ItemPointer tupleid);
 
  + HeapTuple
  + FetchForeignRow (EState *estate,
  +  ExecRowMark *erm,
  +  ItemPointer tupleid);
 
  I think that these APIs allow an FDW that has TIDs to use rowmark
  options such as ROW_MARK_REFERENCE, ROW_MARK_SHARE and
  ROW_MARK_EXCLUSIVE for its foreign tables so as to match the local
  semantics exactly, for example.
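 
(To make that concrete: a minimal sketch of what a GetForeignRowMarkType
implementation could look like under the proposed API.  The mapping below is
an illustrative assumption, not part of the patch; a real FDW would choose
based on what its remote side can actually lock.)

    static RowMarkType
    exampleGetForeignRowMarkType(LockClauseStrength strength)
    {
        switch (strength)
        {
            case LCS_FORKEYSHARE:
            case LCS_FORSHARE:
                return ROW_MARK_SHARE;      /* shared row lock on the remote side */
            case LCS_FORNOKEYUPDATE:
            case LCS_FORUPDATE:
                return ROW_MARK_EXCLUSIVE;  /* exclusive row lock on the remote side */
            default:
                return ROW_MARK_REFERENCE;  /* no lock; re-fetch the row by TID */
        }
    }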
 
  As you mentioned, it would be better to add helper functions to see
  whether the foreign table is referenced by any ExecRowMarks.  ISTM that
  an easy way to do that is to modify ExecFindRowMark() so that it allows
  for the missing case.  I didn't include such functions in the patch,
  though.
 
  I added that function and modified docs a bit.  Please find attached an
  updated version of the patch.
  
  Why aren't we allowing SELECT FOR KEY SHARE?
 
 I omitted that mode (and the FOR NO KEY UPDATE mode) for simplicity, but
 both modes are actually allowed.  However, I'm not sure those modes are
 useful, because we don't have information about the keys of a remote table.

If I understand this correctly, the two lock modes have no relation
to the key column definitions, and are provided as weaker locks in
exchange for some risk, like advisory locks.  We can use FOR NO KEY
UPDATE in transactions that eventually update key columns, but we then
have to accept the unexpected result.  In other words, the key in the
context of FOR NO KEY UPDATE and FOR KEY SHARE exists only in the
programmers' minds.

Feasibility aside, I see no reason to omit them if this is correct.

As an example, the following operations cause an unexpected
result.

Ex. 1
Session A=# create table t (a int primary key, b int);
Session A=# insert into t (select a, 1 from generate_series(0, 99) a);
Session A=# begin;
Session A=# select * from t where a = 1 for no key update;

Session B=# begin;
Session B=# select * from t where a = 1 for key share;
Session B=# update t set a = -a where a = 1;
session B is blocked

Ex. 2
Session A=# create table t (a int, b int); -- a is the key in mind.
Session A=# insert into t (select a, 1 from generate_series(0, 99) a);
Session A=# begin;
Session A=# select * from t where a = 1 for no key update;

Session B=# begin;
Session B=# select * from t where a = 1 for key share;

Session A=# update t set a = -a where a = 1;
UPDATE 1
Session A=# commit;

Session B=# select * from t where a = 1;
(0 rows)  -- Woops.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-04-14 Thread Shigeru HANADA
KaiGai-san,

On 2015/04/14 14:04, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
 * Fix typos
 
 Please review the v11 patch, and mark it as “ready for committer” if it’s ok.
 
 It's OK for me; it now needs review from other people to get committed.
 

Thanks!

 In addition to essential features, I tried to implement relation listing
 in EXPLAIN output.
 
 The attached explain_forein_join.patch adds the capability to show the join
 combination of a ForeignScan in EXPLAIN output as an additional item,
 “Relations”.  I thought that using an array to list relations would be a
 good way too, but I chose a single string value because users would also
 like to know the order and type of the joins.
 
 A bit different from my expectation... I expected it to display the names
 of the local foreign tables (and their aliases), not the remote ones,
 because all the local join logic displays local foreign table names.
 It should be easy to adjust, shouldn't it?  Probably all you need to do is
 put the local relation name in the text buffer (at deparseSelectSql)
 instead of the deparsed remote relation name.

Oops, that’s right.  Attached is the revised version.  I chose the fully
qualified name, schema.relname [alias], for the output.  It wastes some
cycles during planning when the query is not being EXPLAINed, but it seems
difficult to get the list of relation names in the ExplainForeignScan()
phase, because the planning information is gone by that time.
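
For readers following the thread, the resulting EXPLAIN output would look
roughly like this (a hypothetical session; the table names, aliases and the
deparsed SQL are made up for illustration):

    EXPLAIN (VERBOSE, COSTS OFF)
      SELECT * FROM ft1 JOIN ft2 ON ft1.c1 = ft2.c1;
                             QUERY PLAN
    --------------------------------------------------------------
     Foreign Scan
       Output: ft1.c1, ft2.c1
       Relations: (public.ft1 ft1) INNER JOIN (public.ft2 ft2)
       Remote SQL: SELECT r1.c1, r2.c1 FROM (t1 r1 INNER JOIN t2 r2 ON ((r1.c1 = r2.c1)))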



--
Shigeru HANADA
shigeru.han...@gmail.com


explain_foreign_join_v2.patch
Description: Binary data



[HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Michael Paquier
Hi all,

I noticed that src/bin/initdb/t/001_initdb.pl invokes rm directly via a
system() call like this:
system_or_bail "rm -rf '$tempdir'/*";

This is not portable, particularly on platforms that do not have rm,
like... Windows, where the equivalent is del.  We could instead use
remove_tree with its keep_root option to get the same effect in pure Perl,
as mentioned here:
http://perldoc.perl.org/File/Path.html
With this formulation:
remove_tree($tempdir, {keep_root => 1});

Attached is a patch doing that.
Regards,
-- 
Michael
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index d12be84..85db9ff 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -1,6 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
+use File::Path qw(remove_tree);
 use Test::More tests => 19;
 
 my $tempdir = TestLib::tempdir;
@@ -18,27 +19,27 @@ command_fails([ 'initdb', '-S', "$tempdir/data3" ],
 mkdir "$tempdir/data4" or BAIL_OUT($!);
 command_ok([ 'initdb', "$tempdir/data4" ], 'existing empty data directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+remove_tree($tempdir, {keep_root => 1});
 
 command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'separate xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+remove_tree($tempdir, {keep_root => 1});
 command_fails(
 	[ 'initdb', "$tempdir/data", '-X', 'pgxlog' ],
 	'relative xlog directory not allowed');
 
-system_or_bail "rm -rf '$tempdir'/*";
+remove_tree($tempdir, {keep_root => 1});
 mkdir "$tempdir/pgxlog";
 command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'existing empty xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+remove_tree($tempdir, {keep_root => 1});
 mkdir "$tempdir/pgxlog";
 mkdir "$tempdir/pgxlog/lost+found";
 command_fails([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'existing nonempty xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+remove_tree($tempdir, {keep_root => 1});
 command_ok([ 'initdb', '-T', 'german', "$tempdir/data" ],
 	'select default dictionary');



Re: [HACKERS] inherit support for foreign tables

2015-04-14 Thread Etsuro Fujita
On 2015/03/23 2:57, Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 [ fdw-inh-8.patch ]
 
 I've committed this with some substantial rearrangements, notably:
 
 * I thought that if we were doing this at all, we should go all the way
 and allow foreign tables to be both inheritance parents and children.

I found that when setting a foreign table to be the parent of an
inheritance set that only contains foreign tables, SELECT FOR UPDATE on
the inheritance parent fails with a can't-happen error condition.  Here
is an example:

$ createdb mydb
$ psql mydb
psql (9.5devel)
Type "help" for help.

mydb=# create table t1 (c1 int);
CREATE TABLE
mydb=# create table t2 (c1 int);
CREATE TABLE

$ psql postgres
psql (9.5devel)
Type "help" for help.

postgres=# create extension postgres_fdw;
CREATE EXTENSION
postgres=# create server myserver foreign data wrapper postgres_fdw
options (dbname 'mydb');
CREATE SERVER
postgres=# create user mapping for current_user server myserver;
CREATE USER MAPPING
postgres=# create foreign table ft1 (c1 int) server myserver options
(table_name 't1');
CREATE FOREIGN TABLE
postgres=# create foreign table ft2 (c1 int) server myserver options
(table_name 't2');
CREATE FOREIGN TABLE
postgres=# alter foreign table ft2 inherit ft1;
ALTER FOREIGN TABLE
postgres=# select * from ft1 for update;
ERROR:  could not find junk tableoid1 column

I think this is a bug.  Attached is a patch fixing this issue.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 3193,3218  select * from bar where f1 in (select f1 from foo) for update;
QUERY PLAN  
  --
   LockRows
!    Output: bar.f1, bar.f2, bar.ctid, bar.tableoid, bar.*, foo.ctid, foo.tableoid, foo.*
     ->  Hash Join
!          Output: bar.f1, bar.f2, bar.ctid, bar.tableoid, bar.*, foo.ctid, foo.tableoid, foo.*
           Hash Cond: (bar.f1 = foo.f1)
           ->  Append
                 ->  Seq Scan on public.bar
!                      Output: bar.f1, bar.f2, bar.ctid, bar.tableoid, bar.*
                 ->  Foreign Scan on public.bar2
!                      Output: bar2.f1, bar2.f2, bar2.ctid, bar2.tableoid, bar2.*
                       Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
           ->  Hash
!                Output: foo.ctid, foo.tableoid, foo.*, foo.f1
                 ->  HashAggregate
!                      Output: foo.ctid, foo.tableoid, foo.*, foo.f1
                       Group Key: foo.f1
                       ->  Append
                             ->  Seq Scan on public.foo
!                                  Output: foo.ctid, foo.tableoid, foo.*, foo.f1
                             ->  Foreign Scan on public.foo2
!                                  Output: foo2.ctid, foo2.tableoid, foo2.*, foo2.f1
                                   Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
  (22 rows)
  
--- 3193,3218 
QUERY PLAN  
  --
   LockRows
!    Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid, foo.ctid, foo.*, foo.tableoid
     ->  Hash Join
!          Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid, foo.ctid, foo.*, foo.tableoid
           Hash Cond: (bar.f1 = foo.f1)
           ->  Append
                 ->  Seq Scan on public.bar
!                      Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid
                 ->  Foreign Scan on public.bar2
!                      Output: bar2.f1, bar2.f2, bar2.ctid, bar2.*, bar2.tableoid
                       Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
           ->  Hash
!                Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                 ->  HashAggregate
!                      Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                       Group Key: foo.f1
                       ->  Append
                             ->  Seq Scan on public.foo
!                                  Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                             ->  Foreign Scan on public.foo2
!                                  Output: foo2.ctid, foo2.*, foo2.tableoid, foo2.f1
                                   Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
  (22 rows)
  
***
*** 3230,3255  select * from bar where f1 in (select f1 from foo) for share;
QUERY PLAN  
  --
   LockRows
!Output: bar.f1, bar.f2, bar.ctid, bar.tableoid, bar.*, foo.ctid, foo.tableoid, foo.*

Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Magnus Hagander
On Tue, Apr 14, 2015 at 4:50 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/14/2015 05:42 AM, Robert Haas wrote:

 On Sun, Apr 12, 2015 at 8:38 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 As to RLS - yeah, that's where I think a lot of the possible covert

 channel attacks are.  But it doesn't have to be RLS per se.  It can
 be, for example, a table some of whose contents you expose via a
 security barrier view.  It can be a security-definer function that you
 call and it returns some data that you don't have rights to view
 directly.  Basically, it can be any table to which you have some
 access, but not complete access.  Then you can use timing attacks,
 scrutinize EXPLAIN plans for clues, look at CTIDs, and so on.

 Basically, I'm worried that it's going to be completely impractical to
 prevent a certain amount of information leakage when you have partial
 access to a table, and that in a largely-futile effort to try to
 prevent it, we'll end up making a whole bunch of things (like the WAL
 insert position) super-user only, and that this will in fact be a net
 reduction in security because it'll encourage people to use the
 superuser account more.

 Perhaps that concern is ill-founded, but that's what I'm worried about.


 I'm not a big fan of locking down WAL position information either. If we
 have to treat the current WAL position as security-sensitive information,
 we're doing something wrong.

 But I don't want to just give up either. One option is to make this
 controllable on a per-table basis, as Amit suggested. Then we could always
 disable it for pg_authid, add a suitable warning to the docs, and let the
 DBA enable/disable it for other tables. It's a bit of a cop-out, but would
 fix the immediate problem.


I think it's a fairly narrow attack vector. As long as we document it
clearly, and make it easy enough to turn it off, I think that's definitely
enough. Most people will not care at all, I'm sure - but we need to cater
to those that do.

I think we could probably even get away with just documenting the risk and
having people turn off the compression *completely* if they care about it,
but if we can do it at a table level, that's obviously a lot better.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Tue, Apr 14, 2015 at 4:50 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
  I'm not a big fan of locking down WAL position information either. If we
  have to treat the current WAL position as security-sensitive information,
  we're doing something wrong.

I really don't see the downside to limiting who can see that
information.  I meant to mention it in my email last night with the
patch allowing the GRANT system to work for other functions: I added
the functions suggested by Michael to the list of ones which don't have
the default ACL of 'execute to PUBLIC'.

  But I don't want to just give up either. One option is to make this
  controllable on a per-table basis, as Amit suggested. Then we could always
  disable it for pg_authid, add a suitable warning to the docs, and let the
  DBA enable/disable it for other tables. It's a bit of a cop-out, but would
  fix the immediate problem.
 
 I think it's a fairly narrow attack vector. As long as we document it
 clearly, and make it easy enough to turn it off, I think that's definitely
 enough. Most people will not care at all, I'm sure - but we need to cater
 to those that do.

I agree that it's something to document, but I don't think it's sensible
to have the default be at risk.  That said, as long as there's a way
to restrict access to these functions (without having to fight with the
system on upgrades, etc, to have that be preserved..) then I'm not going
to push too hard about the default.  We need a proper hardening
guide anyway, this would just need to be included in that documentation.

 I think we could probably even get away with just documenting the risk and
 having people turn off the compression *completely* if they care about it,
 but if we can do it at a table level, that's obviously a lot better.

I *really* don't like the idea that the fix for such an information
disclosure risk is to completely disable an extremely useful feature.
In fact, I find the suggestion of it rather ridiculous.

I do like the idea of being able to control this on a per-table basis
as that's a useful feature, but that's completely independent from this
concern.

Thanks!

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Use Intel SSE 4.2 CRC instructions where available.

2015-04-14 Thread Simon Riggs
On 14 April 2015 at 11:34, Heikki Linnakangas hlinn...@iki.fi wrote:

 Or we can punt and always build the version with
 the runtime check, unless overridden manually at configure command line.

That seems safer as the default, which is what the original patch did.

We can optimise that for known good builds later if desired.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread David E. Wheeler
On Apr 14, 2015, at 9:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 http://perldoc.perl.org/File/Path.html
 With this formulation:
 remove_tree($tempdir, {keep_root => 1});
 
 Does Perl 5.8 have this?

Yes, it does.

  http://cpansearch.perl.org/src/NWCLARK/perl-5.8.9/lib/File/Path.pm

Best,

David



Re: [HACKERS] deparsing utility commands

2015-04-14 Thread David Steele
On 4/9/15 12:14 PM, Alvaro Herrera wrote:
 Alvaro Herrera wrote:
 Executive summary:

 There is now a CommandDeparse_hook;
 deparse_utility_command is provided as an extension, intended for 9.6;
 rest of patch would be pushed to 9.5.
 
 Actually here's another approach I like better: use a new pseudotype,
 pg_ddl_command, which internally is just a pointer to a CollectedCommand
 struct.  We cannot give access to the pointer directly in SQL, so much
 like type internal or pg_node_tree the in/out functions should just
 error out.  But the type can be returned as a column in
 pg_event_trigger_ddl_command.  An extension can create a function that
 receives that type as argument, and return a different representation of
 the command; the third patch creates a function ddl_deparse_to_json()
 which does that.

I've tested this approach and it returns the same results as the
previous approach for all calls to pg_event_trigger_ddl_command() made
by the pg_audit regression tests.  For my testing I was generally only
applying patches 0001 and 0002 (although 0001 on its own also worked for
my case).  I applied patch 0003 only to be sure it did not break
anything, and did not specifically test the functionality of 0002 or
0003.

However, I've tested 0001 over a very wide range of commands and have
not found any anomalous behaviors.

I've reviewed the 0001 and 0002 patches and don't have anything to add
beyond what has been mentioned in previous reviews.  You've applied your
pattern consistently across a wide variety of utility commands which is
something of a feat.

Since 0003 will not be committed to core, I was more interested in the
mechanism used to connect 0003 to 0001 and 0002.  I like the new
approach better than the old one in that there is no connecting hook
now.  As you say, it provides more flexibility in terms of using the
command data in as many ways as you like.  Even better, the user has
access to three different representations and can choose which one is
best for them.  And, of course, they can write code for their own
representation using pg_ddl_command, JSON, or SQL.

I also like the way the patch has been broken up.  The core
functionality will finally make event triggers truly useful in a
high-level language like pl/pgsql.  I can see many uses for them now
that pg_event_trigger_ddl_command() is present.
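
As a sketch of what that looks like from pl/pgsql (hypothetical code; the
"command" column name and the ddl_deparse_to_json() signature are
assumptions based on the patches described above):

    CREATE FUNCTION audit_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        cmd record;
    BEGIN
        FOR cmd IN SELECT * FROM pg_event_trigger_ddl_command() LOOP
            -- cmd.command holds the pg_ddl_command value; deparse it to JSON
            RAISE NOTICE 'ddl: %', ddl_deparse_to_json(cmd.command);
        END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER audit_ddl_end ON ddl_command_end
        EXECUTE PROCEDURE audit_ddl();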

From my review and testing, I see no barriers to committing patches 0001
and 0002.  I'd love to see this functionality in 9.5.

Regards,
-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] BRIN range operator class

2015-04-14 Thread Emre Hasegeli
 Judging from a quick look, I think patches 1 and 5 can be committed
 quickly; they imply no changes to other parts of BRIN.  (Not sure why 1
 and 5 are separate.  Any reason for this?)  Also patch 2.

Not much reason, except that 1 includes only functions while 5 includes operators.

 Patch 4 looks like a simple bugfix (or maybe a generalization) of BRIN
 framework code; should also be committable right away.  Needs a closer
 look of course.

 Patch 3 is a problem.  That code is there because the union proc is only
 used in a corner case in Minmax, so if we remove it, user-written Union
 procs are very likely to remain buggy for a long time.  If you have a better
 idea to test Union in Minmax, or some other way to turn that stuff off
 for the range stuff, I'm all ears.  Just let's make sure the support
 procs are tested to avoid stupid bugs.  Before I introduced that, my
 Minmax Union proc was all wrong.

I removed this test because I don't see a way to support it.  I
believe any other implementation that is more complicated than minmax
will fail there.  It is better to catch such bugs with the regression
tests, so I tried to improve them.  GiST, SP-GiST and GIN don't have
similar checks, but they have more complicated user-defined functions.

 Patch 7 I don't understand.  Will have to look closer.  Are you saying
 Minmax will depend on Btree opclasses?  I remember thinking about doing it
 that way at some point, but wasn't convinced for some reason.

No, there isn't any additional dependency.  It makes minmax operator
classes use the procedures from pg_amop instead of adding them to
pg_amproc.

It also makes the operator class safer for cross-data-type usage.
Actually, I just checked and found out that we get wrong answers from
the index on the current master without this patch.  You can reproduce
it with this query on the regression database:

select * from brintest where timestampcol = '1979-01-29 11:05:09'::timestamptz;

The inclusion-opclasses patch makes it possible to add cross-type BRIN
regression tests.  I will add more of them in the next version.

 Patch 6 seems the real meat of your own stuff.  I think there should be
 a patch 8 also but it's not attached ... ??

I had another commit that was not intended to be sent.  Sorry about that.




[HACKERS] Re: [COMMITTERS] pgsql: Use Intel SSE 4.2 CRC instructions where available.

2015-04-14 Thread Heikki Linnakangas

On 04/14/2015 07:01 PM, Simon Riggs wrote:

On 14 April 2015 at 11:34, Heikki Linnakangas hlinn...@iki.fi wrote:


Or we can punt and always build the version with
the runtime check, unless overridden manually at configure command line.


That seems safer as the default, which is what the original patch did.

We can optimise that for known good builds later if desired.


I committed the __SSE4_2__ test for now, so we still build the SSE-only
version if the target allows that. We'll see how that works.


- Heikki





Re: [HACKERS] inherit support for foreign tables

2015-04-14 Thread Jim Nasby
On 4/14/15 5:49 AM, Etsuro Fujita wrote:
 postgres=# create foreign table ft1 (c1 int) server myserver options
 (table_name 't1');
 CREATE FOREIGN TABLE
 postgres=# create foreign table ft2 (c1 int) server myserver options
 (table_name 't2');
 CREATE FOREIGN TABLE
 postgres=# alter foreign table ft2 inherit ft1;
 ALTER FOREIGN TABLE
 postgres=# select * from ft1 for update;
 ERROR:  could not find junk tableoid1 column
 
 I think this is a bug.  Attached is a patch fixing this issue.

What happens when the foreign side breaks the inheritance? Does the FDW
somehow know to check that fact for each query?

What do you gain from having the local table have inheritance?

Basically, I think we have to be very careful about implementing
features that imply the local side knows something about the persisted
state of the remote side (in this case, whether there's inheritance).
Anything like that sets us up for synchronization problems.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-04-14 Thread Jim Nasby

On 4/14/15 1:05 AM, Kyotaro HORIGUCHI wrote:

As an example, the following operations cause an unexpected
result.

Ex. 1
Session A=# create table t (a int primary key, b int);
Session A=# insert into t (select a, 1 from generate_series(0, 99) a);
Session A=# begin;
Session A=# select * from t where a = 1 for no key update;

Session B=# begin;
Session B=# select * from t where a = 1 for key share;
Session B=# update t set a = -a where a = 1;
session B is blocked

Ex. 2
Session A=# create table t (a int, b int); -- a is the key in mind.
Session A=# insert into t (select a, 1 from generate_series(0, 99) a);
Session A=# begin;
Session A=# select * from t where a = 1 for no key update;

Session B=# begin;
Session B=# select * from t where a = 1 for key share;

Session A=# update t set a = -a where a = 1;
UPDATE 1
Session A=# commit;

Session B=# select * from t where a = 1;
(0 rows)  -- Woops.


Those results are indeed surprising, but since we allow it in a direct 
connection I don't see why we wouldn't allow it in the Postgres FDW...


As for the FDW not knowing about keys, why would it need to? If you try 
to do something illegal it's the remote side that should throw the 
error, not the FDW.


Of course, if you try to do a locking operation on an FDW that doesn't 
support it, the FDW should throw an error... but that's different.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Alvaro Herrera
Michael Paquier wrote:
 Hi all,
 
 I noticed that src/bin/initdb/t/001_initdb.pl invokes rm directly via a
 system() call like this:
 system_or_bail "rm -rf '$tempdir'/*";
 
 This is not portable, particularly on platforms that do not have rm,
 like... Windows, where the equivalent is del.  We could instead use
 remove_tree with its keep_root option to get the same effect in pure
 Perl, as mentioned here:
 http://perldoc.perl.org/File/Path.html
 With this formulation:
 remove_tree($tempdir, {keep_root => 1});

Does Perl 5.8 have this?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Heikki Linnakangas

On 04/14/2015 05:42 AM, Robert Haas wrote:

On Sun, Apr 12, 2015 at 8:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

Care to name some? This is certainly quite cumbersome to exploit, but it's
doable.

We've talked a lot about covert channels and timing attacks on RLS, but this
makes me more worried because you can attack passwords stored in pg_authid.


Well, one thing to think about is that, if we're going to take this
kind of attack seriously, it could be used on user data, too.  I mean,
you've just got to be able to get a row into the same block as the row
you want to find out something about, and there's no reason that need
be true only for pg_authid.


Sure.


And, even if you ban access to information on the pg_xlog insert
location, what's to prevent someone from gleaning similar information
via a timing attack on the compression itself?


Yep, that's another vector. Even more fiddly than the record size, but 
nevertheless.



You can even get some information from figuring
out, by trial and error, how much you need to expand a row to get it
to spill over into another block.  If there happens to be enough
free-space on the page where the row is located to allow a HOT update,
you can repeatedly update it until it gets moved to some unrelated
page.  Granted, knowing the length of an unseen row isn't as good as
knowing the contents, but it's still potentially useful information.


True as well. That's not an issue for the MD5 hashes stored in
pg_authid, since they all have the same length, but it can be for
application data.



As to RLS - yeah, that's where I think a lot of the possible covert
channel attacks are.  But it doesn't have to be RLS per se.  It can
be, for example, a table some of whose contents you expose via a
security barrier view.  It can be a security-definer function that you
call and it returns some data that you don't have rights to view
directly.  Basically, it can be any table to which you have some
access, but not complete access.  Then you can use timing attacks,
scrutinize EXPLAIN plans for clues, look at CTIDs, and so on.

Basically, I'm worried that it's going to be completely impractical to
prevent a certain amount of information leakage when you have partial
access to a table, and that in a largely-futile effort to try to
prevent it, we'll end up making a whole bunch of things (like the WAL
insert position) super-user only, and that this will in fact be a net
reduction in security because it'll encourage people to use the
superuser account more.

Perhaps that concern is ill-founded, but that's what I'm worried about.


I'm not a big fan of locking down WAL position information either. If we 
have to treat the current WAL position is security-sensitive 
information, we're doing something wrong.


But I don't want to just give up either. One option is to make this 
controllable on a per-table basis, as Amit suggested. Then we could 
always disable it for pg_authid, add a suitable warning to the docs, and 
let the DBA enable/disable it for other tables. It's a bit of a cop-out, 
but would fix the immediate problem.


- Heikki





[HACKERS] Re: [COMMITTERS] pgsql: Use Intel SSE 4.2 CRC instructions where available.

2015-04-14 Thread Heikki Linnakangas

On 04/14/2015 06:28 PM, Simon Riggs wrote:

On 14 April 2015 at 10:09, Heikki Linnakangas heikki.linnakan...@iki.fi wrote:


Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.


Did the heavy modifications have any effect on the patch behaviour, or
was this just related to where you would like to put the code?


Didn't affect behaviour.

Hmm, the buildfarm animals using Intel C compiler didn't like this 
patch. The problem seems to be that unlike on gcc and clang, icc always 
has the SSE 4.2 intrinsics (_mm_crc32_u64, _mm_crc32_u8 etc.), even when 
the target CPU architecture is not SSE 4.2. On gcc/clang, those 
intrinsics are not defined unless you build with -msse4.2.


I'll try to find a fix. I think we could use the __SSE4_2__ define to 
check whether SSE4.2 is targeted. Or we can punt and always build the 
version with the runtime check, unless overridden manually at configure 
command line.


- Heikki





Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread David Steele
On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
 This patch does not apply cleanly due to the moving of pgbench (patch
 to filelist.sgml failed).

Thank you for pointing that out!

Ironic that it was the commit directly after the one I was testing with
that broke the patch.  It appears the end of the last CF is a very bad
time to be behind HEAD.

Fixed in attached v8 patch.

-- 
- David Steele
da...@pgmasters.net
diff --git a/contrib/Makefile b/contrib/Makefile
index d63e441..ed9cf6a 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -28,6 +28,7 @@ SUBDIRS = \
oid2name\
pageinspect \
passwordcheck   \
+   pg_audit\
pg_buffercache  \
pg_freespacemap \
pg_prewarm  \
diff --git a/contrib/pg_audit/.gitignore b/contrib/pg_audit/.gitignore
new file mode 100644
index 0000000..a5267cf
--- /dev/null
+++ b/contrib/pg_audit/.gitignore
@@ -0,0 +1,5 @@
+log/
+results/
+tmp_check/
+regression.diffs
+regression.out
diff --git a/contrib/pg_audit/Makefile b/contrib/pg_audit/Makefile
new file mode 100644
index 0000000..7b36011
--- /dev/null
+++ b/contrib/pg_audit/Makefile
@@ -0,0 +1,21 @@
+# pg_audit/Makefile
+
+MODULE = pg_audit
+MODULE_big = pg_audit
+OBJS = pg_audit.o
+
+EXTENSION = pg_audit
+REGRESS = pg_audit
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/pg_audit/pg_audit.conf
+DATA = pg_audit--1.0.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_audit
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_audit/expected/pg_audit-deparse.out 
b/contrib/pg_audit/expected/pg_audit-deparse.out
new file mode 100644
index 0000000..66fd20d
--- /dev/null
+++ b/contrib/pg_audit/expected/pg_audit-deparse.out
@@ -0,0 +1,897 @@
+-- Load pg_audit module
+create extension pg_audit;
+--
+-- Create a superuser role that we know the name of for testing
+CREATE USER super SUPERUSER;
+\connect contrib_regression super;
+--
+-- Create auditor role
+CREATE ROLE auditor;
+--
+-- Create first test user
+CREATE USER user1;
+ALTER ROLE user1 SET pg_audit.log = 'ddl, ROLE';
+ALTER ROLE user1 SET pg_audit.log_notice = on;
+--
+-- Create, select, drop (select will not be audited)
+\connect contrib_regression user1
+CREATE TABLE public.test (id INT);
+NOTICE:  AUDIT: SESSION,1,1,DDL,CREATE TABLE,TABLE,public.test,CREATE TABLE 
public.test (id INT);
+SELECT * FROM test;
+ id 
+
+(0 rows)
+
+DROP TABLE test;
+NOTICE:  AUDIT: SESSION,2,1,DDL,DROP TABLE,TABLE,public.test,DROP TABLE test;
+--
+-- Create second test user
+\connect contrib_regression super
+CREATE USER user2;
+ALTER ROLE user2 SET pg_audit.log = 'Read, writE';
+ALTER ROLE user2 SET pg_audit.log_notice = on;
+ALTER ROLE user2 SET pg_audit.role = auditor;
+\connect contrib_regression user2
+CREATE TABLE test2 (id INT);
+GRANT SELECT ON TABLE public.test2 TO auditor;
+--
+-- Role-based tests
+CREATE TABLE test3
+(
+   id INT
+);
+SELECT count(*)
+  FROM
+(
+   SELECT relname
+ FROM pg_class
+ LIMIT 1
+) SUBQUERY;
+ count 
+---
+ 1
+(1 row)
+
+SELECT *
+  FROM test3, test2;
+NOTICE:  AUDIT: SESSION,1,1,READ,SELECT,,,SELECT *
+  FROM test3, test2;
+NOTICE:  AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.test2,SELECT *
+  FROM test3, test2;
+ id | id 
++
+(0 rows)
+
+GRANT INSERT
+   ON TABLE public.test3
+   TO auditor;
+--
+-- Object logged because of:
+-- insert on test3
+-- select on test2
+WITH CTE AS
+(
+   SELECT id
+ FROM test2
+)
+INSERT INTO test3
+SELECT id
+  FROM cte;
+NOTICE:  AUDIT: SESSION,2,1,WRITE,INSERT,,,WITH CTE AS
+(
+   SELECT id
+ FROM test2
+)
+INSERT INTO test3
+SELECT id
+  FROM cte;
+NOTICE:  AUDIT: OBJECT,2,1,WRITE,INSERT,TABLE,public.test3,WITH CTE AS
+(
+   SELECT id
+ FROM test2
+)
+INSERT INTO test3
+SELECT id
+  FROM cte;
+NOTICE:  AUDIT: OBJECT,2,1,READ,SELECT,TABLE,public.test2,WITH CTE AS
+(
+   SELECT id
+ FROM test2
+)
+INSERT INTO test3
+SELECT id
+  FROM cte;
+--
+-- Object logged because of:
+-- insert on test3
+WITH CTE AS
+(
+   INSERT INTO test3 VALUES (1)
+  RETURNING id
+)
+INSERT INTO test2
+SELECT id
+  FROM cte;
+NOTICE:  AUDIT: SESSION,3,1,WRITE,INSERT,,,WITH CTE AS
+(
+   INSERT INTO test3 VALUES (1)
+  RETURNING id
+)
+INSERT INTO test2
+SELECT id
+  FROM cte;
+NOTICE:  AUDIT: OBJECT,3,1,WRITE,INSERT,TABLE,public.test3,WITH CTE AS
+(
+   INSERT INTO test3 VALUES (1)
+  RETURNING id
+)
+INSERT INTO test2
+SELECT id
+  FROM cte;
+GRANT UPDATE ON TABLE public.test2 TO auditor;
+--
+-- Object logged because of:
+-- insert on test3
+-- update on test2
+WITH CTE AS
+(
+   UPDATE test2
+  SET id = 1
+   RETURNING id
+)
+INSERT INTO test3
+SELECT id
+  FROM cte;
+NOTICE:  

Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Michael Paquier
On Wed, Apr 15, 2015 at 5:29 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 David E. Wheeler wrote:
 On Apr 14, 2015, at 1:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

  Castoroides has 5.8.4.  Oops.

 WUT.

 Yeah, eh?  Anyway I don't think it matters much: just don't enable TAP
 tests on machines with obsolete Perl.  I think this is fine since 5.8's
 latest release is supported.

And castoroides is not using --enable-tap-tests in the buildfarm btw.
-- 
Michael




Re: [HACKERS] inherit support for foreign tables

2015-04-14 Thread Kyotaro HORIGUCHI
Hi,

Before suppressing the symptom, I doubt the necessity and/or
validity of giving foreign tables the ability to be a parent.  Is
there any reasonable use for that ability?

I think we should inhibit foreign tables from becoming parents,
rather than leave it allowed and then take measures against the
consequent symptoms.


regards,

At Tue, 14 Apr 2015 15:52:18 -0300, Alvaro Herrera alvhe...@2ndquadrant.com 
wrote in 20150414185218.gx4...@alvh.no-ip.org
 Jim Nasby wrote:
  On 4/14/15 5:49 AM, Etsuro Fujita wrote:
   postgres=# create foreign table ft1 (c1 int) server myserver options
   (table_name 't1');
   CREATE FOREIGN TABLE
   postgres=# create foreign table ft2 (c1 int) server myserver options
   (table_name 't2');
   CREATE FOREIGN TABLE
   postgres=# alter foreign table ft2 inherit ft1;
   ALTER FOREIGN TABLE
   postgres=# select * from ft1 for update;
   ERROR:  could not find junk tableoid1 column
   
   I think this is a bug.  Attached is a patch fixing this issue.
  
  What happens when the foreign side breaks the inheritance? Does the FDW
  somehow know to check that fact for each query?
 
 This is a meaningless question.  The remote tables don't have to have an
 inheritance relationship already; only the local side sees them as
 connected.
 
 I think the real question is whether we're now (I mean after this patch)
 emitting useless tableoid columns that we didn't previously have.  I
 think the answer is yes, and if so I think we need a smaller comb to fix
 the problem.  This one seems rather large.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread Tatsuo Ishii
 Thank you for pointing that out!
 
 Ironic that it was the commit directly after the one I was testing with
 that broke the patch.  It appears the end of the last CF is a very bad
 time to be behind HEAD.
 
 Fixed in attached v8 patch.

Thank you for your quick response.

BTW, in my understanding pg_audit allows tracking access to a table even
when it is used through a view.  I think this is a nice feature; it would
be better if it were explicitly stated in the documentation, and a test
case for it included in the regression tests.

Here is a sample session:

CREATE TABLE test2 (id INT);
CREATE VIEW vtest2 AS SELECT * FROM test2;
GRANT SELECT ON TABLE public.test2 TO auditor;
GRANT SELECT ON TABLE public.vtest2 TO auditor;
SELECT * FROM vtest2;
NOTICE:  AUDIT: SESSION,1,1,READ,SELECT,,,SELECT * FROM vtest2;
NOTICE:  AUDIT: OBJECT,1,1,READ,SELECT,VIEW,public.vtest2,SELECT * FROM vtest2;
NOTICE:  AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.test2,SELECT * FROM vtest2;

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Michael Paquier
On Wed, Apr 15, 2015 at 11:10 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 12:00 AM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Apr 14, 2015 at 4:50 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/14/2015 05:42 AM, Robert Haas wrote:

 On Sun, Apr 12, 2015 at 8:38 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 As to RLS - yeah, that's where I think a lot of the possible covert

 channel attacks are.  But it doesn't have to be RLS per se.  It can
 be, for example, a table some of whose contents you expose via a
 security barrier view.  It can be a security-definer function that you
 call and it returns some data that you don't have rights to view
 directly.  Basically, it can be any table to which you have some
 access, but not complete access.  Then you can use timing attacks,
 scrutinize EXPLAIN plans for clues, look at CTIDs, and so on.

 Basically, I'm worried that it's going to be completely impractical to
 prevent a certain amount of information leakage when you have partial
 access to a table, and that in a largely-futile effort to try to
 prevent it, we'll end up making a whole bunch of things (like the WAL
 insert position) super-user only, and that this will in fact be a net
 reduction in security because it'll encourage people to use the
 superuser account more.

 Perhaps that concern is ill-founded, but that's what I'm worried about.


 I'm not a big fan of locking down WAL position information either. If we
 have to treat the current WAL position as security-sensitive information,
 we're doing something wrong.

 But I don't want to just give up either. One option is to make this
 controllable on a per-table basis, as Amit suggested. Then we could always
 disable it for pg_authid, add a suitable warning to the docs, and let the
 DBA enable/disable it for other tables. It's a bit of a cop-out, but would
 fix the immediate problem.


 I think it's a fairly narrow attack vector. As long as we document it
 clearly, and make it easy enough to turn it off, I think that's definitely
 enough. Most people will not care at all, I'm sure - but we need to cater to
 those that do.

 I think we could probably even get away with just documenting the risk and
 having people turn off the compression *completely* if they care about it,
 but if we can do it at a table level, that's obviously a lot better.

 +1

OK. I am fine with implementing anything required here, meaning
the following:
1) A doc patch to mention that compression can give hints to
attackers when working on sensitive fields that have a
non-fixed size.
2) A switch at relation level to control wal_compression. This needs to
change XLogRecordAssemble by adding some new logic to check whether a
relation enforces WAL compression or not. As a reloption, there
are three possible values: true, false, and fall back to the system
default. Also, I think that we should simply extend XLogRegisterBuffer()
and pass it the reloption flag, which is then registered in
registered_buffer, and XLogRecordAssemble() decides with this flag
whether a block is compressed or not. Do we want to add this reloption
switch to indexes as well, or only to tables? For indexes things will
get heavier, as we would need to add a parameter for all the index types.
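
To illustrate the intent, usage could look something like this (hypothetical
syntax; the reloption name and value spellings are assumptions, with
'default' meaning "fall back to the wal_compression GUC"):

    ALTER TABLE sensitive_table SET (wal_compression = 'false');
    ALTER TABLE bulk_load_table SET (wal_compression = 'true');
    ALTER TABLE sensitive_table SET (wal_compression = 'default');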
Regards,
-- 
Michael




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Michael Paquier
On Wed, Apr 15, 2015 at 12:15 AM, Stephen Frost wrote:
 We need a proper hardening
 guide anyway, this would just need to be included in that documentation.

+1. I am sure that many users would like a hardening manual in the
official documentation.
-- 
Michael




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Fujii Masao
On Wed, Apr 15, 2015 at 12:00 AM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Apr 14, 2015 at 4:50 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/14/2015 05:42 AM, Robert Haas wrote:

 On Sun, Apr 12, 2015 at 8:38 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 As to RLS - yeah, that's where I think a lot of the possible covert

 channel attacks are.  But it doesn't have to be RLS per se.  It can
 be, for example, a table some of whose contents you expose via a
 security barrier view.  It can be a security-definer function that you
 call and it returns some data that you don't have rights to view
 directly.  Basically, it can be any table to which you have some
 access, but not complete access.  Then you can use timing attacks,
 scrutinize EXPLAIN plans for clues, look at CTIDs, and so on.

 Basically, I'm worried that it's going to be completely impractical to
 prevent a certain amount of information leakage when you have partial
 access to a table, and that in a largely-futile effort to try to
 prevent it, we'll end up making a whole bunch of things (like the WAL
 insert position) super-user only, and that this will in fact be a net
 reduction in security because it'll encourage people to use the
 superuser account more.

 Perhaps that concern is ill-founded, but that's what I'm worried about.


 I'm not a big fan of locking down WAL position information either. If we
 have to treat the current WAL position as security-sensitive information,
 we're doing something wrong.

 But I don't want to just give up either. One option is to make this
 controllable on a per-table basis, as Amit suggested. Then we could always
 disable it for pg_authid, add a suitable warning to the docs, and let the
 DBA enable/disable it for other tables. It's a bit of a cop-out, but would
 fix the immediate problem.


 I think it's a fairly narrow attack vector. As long as we document it
 clearly, and make it easy enough to turn it off, I think that's definitely
 enough. Most people will not care at all, I'm sure - but we need to cater to
 those that do.

 I think we could probably even get away with just documenting the risk and
 having people turn off the compression *completely* if they care about it,
 but if we can do it at a table level, that's obviously a lot better.

+1

Regards,

-- 
Fujii Masao




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Michael Paquier
On Wed, Apr 15, 2015 at 11:55 AM, Michael Paquier wrote:
 OK. I am fine with implementing anything required here, meaning
 the following:
 1) A doc patch to mention that compression can give hints to
 attackers when working on sensitive fields that have a
 non-fixed size.
 2) A switch at relation level to control wal_compression. This needs to
 change XLogRecordAssemble by adding some new logic to check whether a
 relation enforces WAL compression or not. As a reloption, there
 are three possible values: true, false, and fall back to the system
 default. Also, I think that we should simply extend XLogRegisterBuffer()
 and pass it the reloption flag, which is then registered in
 registered_buffer, and XLogRecordAssemble() decides with this flag
 whether a block is compressed or not. Do we want to add this reloption
 switch to indexes as well, or only to tables? For indexes things will
 get heavier, as we would need to add a parameter for all the index types.

After looking at reloptions.c, what we are also going to need is a
new relopt_type, something like RELOPT_TYPE_ENUM, to be able to
define the 'default' value. Alternatively, we could handle it with
RELOPT_TYPE_STRING. Any input on this matter is welcome.
-- 
Michael




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread Sawada Masahiko
On Wed, Apr 15, 2015 at 8:57 AM, David Steele da...@pgmasters.net wrote:
 On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
 This patch does not apply cleanly due to the moving of pgbench (patch
 to filelist.sgml failed).

 Thank you for pointing that out!

 Ironic that it was the commit directly after the one I was testing with
 that broke the patch.  It appears the end of the last CF is a very bad
 time to be behind HEAD.

 Fixed in attached v8 patch.

Thank you for updating the patch!

I applied the patch and compiled it successfully, without warnings.

I tested the v8 patch with the CURSOR case I mentioned before, and got
a segmentation fault again.
Here are the log messages from my environment:

=# select test();
 LOG:  server process (PID 29730) was terminated by signal 11:
Segmentation fault
DETAIL:  Failed process was running: select test();
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
FATAL:  the database system is in recovery mode

I hope these messages help you address this problem.
I will also try to address it.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-14 Thread Robert Haas
On Tue, Apr 14, 2015 at 6:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 11 March 2015 at 20:55, Peter Eisentraut pete...@gmx.net wrote:
 I don't know how to move forward.  We could give users a knob: This
 might make your queries faster or not -- good luck.  But of course
 nobody will like that either.

 What is clear is that large SELECT queries are doing the work VACUUM
 should do. We should not be doing large background tasks (block
 cleanup) during long running foreground tasks. But there is no need
 for changing behaviour during small SELECTs. So the setting of 4 gives
 current behaviour for small SELECTs and new behaviour for larger
 SELECTs.

Peter commented previously that README.HOT should get an update.  The
relevant section seems to be When can/should we prune or
defragment?.

I wonder if it would be a useful heuristic to still prune pages if
those pages are already dirty.
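
For concreteness, that heuristic might reduce to something like this sketch
(hypothetical code; page_is_already_dirty() is an assumed helper, and
heap_page_prune_opt() stands in for the existing opportunistic-prune entry
point):

    /* Prune opportunistically, but skip the work when it would newly
     * dirty a clean page during a large scan. */
    if (page_is_already_dirty(buffer) || scan_is_small)
        heap_page_prune_opt(relation, buffer);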

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Robert Haas
On Tue, Apr 14, 2015 at 6:22 PM, Peter Geoghegan p...@heroku.com wrote:
 Why is that good?

We did discuss this before.  I've recapped some of what I believe to
be the most salient points below.

 I think that people were all too quick to dismiss the idea of a wall
 time interval playing some role here (at least as a defense against
 correlated references, as a correlated reference period). I suppose
 that that's because it doesn't fit with an intuition that says that
 that kind of interval ought to be derived algebraically - magic delay
 settings are considered suspect.

Yep, Tom gave that reason here:

http://www.postgresql.org/message-id/11258.1397673...@sss.pgh.pa.us

But there was also this point from Andres - gettimeofday is not free:

http://www.postgresql.org/message-id/20140416075307.gc3...@awork2.anarazel.de

And this point from me - this can degrade to random eviction under
high pressure:

http://www.postgresql.org/message-id/ca+tgmoayuxr55zueapp6d2xbyicjwacc9myyn5at4tindsj...@mail.gmail.com

You'll notice that my proposal avoids all three of those objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Amit Kapila
On Wed, Apr 15, 2015 at 2:55 AM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Apr 16, 2014 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Merlin Moncure mmonc...@gmail.com writes:
  Anyways, I'm still curious if you can post similar numbers basing the
  throttling on gross allocation counts instead of time.  Meaning: some
  number of buffer allocations has to have occurred before you consider
  eviction.  Besides being faster I think it's a better implementation:
  an intermittently loaded server will give more consistent behavior.
 
  Yeah --- I think wall-clock-based throttling is fundamentally the wrong
  thing anyway.  Are we going to start needing a CPU speed measurement to
  tune the algorithm with?  Not the place to be going.  But driving it off
  the number of allocations that've been done could be sensible.  (OTOH,
  that means you need a central counter, which itself would be a
  bottleneck.)

 So, I was thinking about this a little bit more today, prodded by my
 coworker John Gorman.  I'm wondering if we could drive this off of the
 clock sweep; that is, every time the clock sweep touches a buffer, its
 usage count goes down by one, but we also set two flag bits.  Every
 time the buffer gets touched thereafter, we check whether any flag
 bits are set; if so, we clear one and increase the usage count, else
 we do nothing.  So the usage count can increase at most twice per
 clock sweep.  The advantage of that is that, as with Peter's approach,
 it is harder for the usage count of a buffer to max out - to get
 there, you need sustained access over a longer period of time.  But
 it's not time-dependent, so replaying the same workload at twice the
 speed or half the speed on faster or slower hardware doesn't change
 the choice of which buffer to evict, which seems good.  And it will
 cause us to prefer buffers which are accessed repeatedly over a period
 of time rather than buffers that are accessed a bunch of times in
 quick succession and then not touched again for a while, which seems
 like a good bet.
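
Reduced to its core, the proposed bookkeeping might look like this
(a hypothetical sketch, not actual bufmgr code; the names are made up):

    #define TOUCH_CREDITS_PER_SWEEP 2

    typedef struct
    {
        int usage_count;
        int touch_credits;      /* the proposed "two flag bits" */
    } SketchBuffer;

    static void
    sweep_pass(SketchBuffer *buf)
    {
        if (buf->usage_count > 0)
            buf->usage_count--;             /* decay, as today */
        buf->touch_credits = TOUCH_CREDITS_PER_SWEEP;
    }

    static void
    buffer_touched(SketchBuffer *buf)
    {
        if (buf->touch_credits > 0)
        {
            buf->touch_credits--;
            buf->usage_count++;             /* at most twice per sweep cycle */
        }
    }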


IIUC, this will allow us to increase the usage count only after the buffer
has been touched by the clock sweep to decrement its usage count.
I think such a solution will be good for cases where many evictions
need to be performed to satisfy the workload.  OTOH, when not many
evictions need to be done, some of the buffers that are accessed much
more will have the same probability of being evicted as buffers which
are accessed less.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Noah Misch
On Tue, Apr 14, 2015 at 05:29:36PM -0300, Alvaro Herrera wrote:
 David E. Wheeler wrote:
  On Apr 14, 2015, at 1:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
  wrote:
  
   Castoroides has 5.8.4.  Oops.
  
  WUT.
 
 Yeah, eh?  Anyway I don't think it matters much: just don't enable TAP
 tests on machines with obsolete Perl.  I think this is fine since 5.8's
 latest release is supported.

Solaris 10 ships Perl 5.8.4, and RHEL 5.11 ships Perl 5.8.8.  Therefore, Perl
installations lacking this File::Path feature will receive vendor support for
years to come.  Replacing the use of keep_root with rmtree+mkdir will add 2-10
lines of code, right?  Let's do that and not raise the system requirements.
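
Presumably something like this (a sketch of the rmtree+mkdir replacement;
rmtree has been in File::Path since long before 5.8, unlike keep_root):

    use File::Path qw(rmtree);

    # Same effect as remove_tree($tempdir, {keep_root => 1}):
    # empty the directory, then recreate the bare root.
    rmtree($tempdir);
    mkdir $tempdir or BAIL_OUT($!);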

Thanks,
nm




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Robert Haas
On Tue, Apr 14, 2015 at 7:10 PM, Greg Stark st...@mit.edu wrote:
 The way the clock sweep algorithm is meant to be thought about is that it's
 an approximate LRU. Each usage count corresponds to an ntile of the LRU. So
 we don't know which buffer is least recently used, but it must be in the set
 of buffers with usage count 0, and that should be 1/nth of all the buffers.

Agreed.

 In order for that property to be maintained though the usage count for some
 buffer should be getting decremented every time we touch a buffer. That is,
 every time we promote one buffer to the most recently used ntile we should
 be demoting some other buffer.

Agreed.  It's not easy to get this behavior exactly, though, because
the buffer you kick out necessarily has a usage count of 0 and the one
you bring in probably shouldn't.  And we don't want to have to run the
clock sweep every time somebody touches a non-maximal usage count.
But I think it is still true that this is, to some degree, what our
algorithm is trying to approximate, and I also think it's pretty clear
that our current approximation isn't that great.

 The way our cache works we promote when a buffer is accessed but we only
 demote when a buffer is flushed. We flush a lot less often than we touch
 buffers so it's not surprising that the cache ends up full of buffers that
 are all in the most recently used section.

This isn't really correct.  We promote when it's accessed, but we
demote it when the clock sweep hand passes over it, which happens each
time we consider it for eviction.  It does not have to do with
flushing dirty data to disk, and it does not happen only when the
buffer is actually evicted.

 Now it's complicated by the fact that we aren't promoting buffers directly
 to the most recently used ntile. We're incrementing the usage count by one.
 That makes it more of a least frequently used list rather than an LRU. I
 think that's a mistake but I recall some debate about that when it first
 went in.

Note that the discussions of 2Q, LRU(k), and perhaps others ask not
only how recently the page was used, but how frequently it was used.
The recency of the next-to-last access is often used as a proxy for
frequency.  Consider two buffers. One gets touched 5 times in a row,
once a day.  The other gets touched 5 times per day, at equal
intervals.  In general, the second buffer is a better choice to retain
than the first, even if it has been touched less recently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] FPW compression leaks information

2015-04-14 Thread Fujii Masao
On Wed, Apr 15, 2015 at 11:55 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 11:10 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 12:00 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 On Tue, Apr 14, 2015 at 4:50 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/14/2015 05:42 AM, Robert Haas wrote:

 On Sun, Apr 12, 2015 at 8:38 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 As to RLS - yeah, that's where I think a lot of the possible covert
 channel attacks are.  But it doesn't have to be RLS per se.  It can
 be, for example, a table some of whose contents you expose via a
 security barrier view.  It can be a security-definer function that you
 call and it returns some data that you don't have rights to view
 directly.  Basically, it can be any table to which you have some
 access, but not complete access.  Then you can use timing attacks,
 scrutinize EXPLAIN plans for clues, look at CTIDs, and so on.

 Basically, I'm worried that it's going to be completely impractical to
 prevent a certain amount of information leakage when you have partial
 access to a table, and that in a largely-futile effort to try to
 prevent it, we'll end up making a whole bunch of things (like the WAL
 insert position) super-user only, and that this will in fact be a net
 reduction in security because it'll encourage people to use the
 superuser account more.

 Perhaps that concern is ill-founded, but that's what I'm worried about.


 I'm not a big fan of locking down WAL position information either. If we
 have to treat the current WAL position as security-sensitive information,
 we're doing something wrong.

 But I don't want to just give up either. One option is to make this
 controllable on a per-table basis, as Amit suggested. Then we could always
 disable it for pg_authid, add a suitable warning to the docs, and let the
 DBA enable/disable it for other tables. It's a bit of a cop-out, but would
 fix the immediate problem.


 I think it's a fairly narrow attack vector. As long as we document it
 clearly, and make it easy enough to turn it off, I think that's definitely
 enough. Most people will not care at all, I'm sure - but we need to cater to
 those that do.

 I think we could probably even get away with just documenting the risk and
 having people turn off the compression *completely* if they care about it,
 but if we can do it at a table level, that's obviously a lot better.

 +1

 OK. I am fine with implementing anything required here, meaning
 the following:
 1) Doc patch to mention that it is possible that compression can give
 hints to attackers when working on sensitive fields that have a
 non-fixed size.

I think that this patch is enough as the first step.
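
To make the risk concrete, here is a minimal standalone sketch of the
length side channel.  It uses zlib rather than pglz and has nothing to
do with the actual FPW code path; all it shows is that compressed size
alone can betray whether attacker-controlled bytes match adjacent
secret data:

    /* Standalone illustration of the compression length side channel.
     * Build with: cc leak.c -lz.  Uses zlib, not pglz; illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    /* Compress "secret" and "guess" as if they sat on the same page,
     * and return the compressed length. */
    static unsigned long
    compressed_len(const char *secret, const char *guess)
    {
        char    page[256];
        Bytef   out[512];
        uLongf  outlen = sizeof(out);

        snprintf(page, sizeof(page), "secret=%s guess=%s", secret, guess);
        compress(out, &outlen, (const Bytef *) page, strlen(page));
        return (unsigned long) outlen;
    }

    int
    main(void)
    {
        /* A matching guess compresses better than a non-matching one,
         * so the length alone hints at the secret's value. */
        printf("matching guess:     %lu bytes\n",
               compressed_len("s3kr1t", "s3kr1t"));
        printf("non-matching guess: %lu bytes\n",
               compressed_len("s3kr1t", "zzzzzz"));
        return 0;
    }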

 2) Switch at relation level to control wal_compression.

ALTER TABLE SET is not allowed on system catalogs like pg_authid. So should we
change that so that a user can set the flag even on a system catalog? I'm
afraid that such a change might cause another problem, though. Probably we can
disable the compression on every system catalog by default. But I can imagine
that someone might want to enable the compression even on a system catalog.
For example, pg_largeobject may cause lots of FPWs.

 This needs to
 change XLogRecordAssemble by adding some new logic to check if a
 relation is enforcing WAL compression or not. As a reloption, there
 are three possible values: true, false and fallback to system default.
 Also, I think that we should simply extend XLogRegisterBuffer() and
 pass to it the reloption flag that is then registered in
 registered_buffer, and XLogRecordAssemble() decides with this flag whether
 the block is compressed or not. Do we want to add this reloption switch to
 indexes as well?

Maybe.
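
As a rough sketch of the plumbing being discussed (every name below is
an invented stand-in, not the actual xloginsert.c definitions), the
per-buffer flag could look something like this:

    /* Toy model of a per-buffer compression flag; names are invented. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum
    {
        COMPRESS_DEFAULT,           /* fall back to the wal_compression GUC */
        COMPRESS_ON,                /* reloption forces compression on */
        COMPRESS_OFF                /* reloption forces compression off */
    } CompressMode;

    typedef struct registered_buffer_model
    {
        CompressMode compress_mode; /* would be passed via XLogRegisterBuffer() */
    } registered_buffer_model;

    static bool wal_compression = true;     /* stands in for the GUC */

    /* What XLogRecordAssemble() would consult when deciding whether to
     * compress this block's full-page image. */
    static bool
    block_needs_compression(const registered_buffer_model *regbuf)
    {
        switch (regbuf->compress_mode)
        {
            case COMPRESS_ON:
                return true;
            case COMPRESS_OFF:
                return false;
            default:
                return wal_compression;
        }
    }

    int
    main(void)
    {
        registered_buffer_model regbuf = {COMPRESS_OFF};

        /* e.g. pg_authid's FPWs stay uncompressed even with the GUC on */
        printf("compress? %d\n", block_needs_compression(&regbuf));
        return 0;
    }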

Regards,

-- 
Fujii Masao




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Robert Haas
On Wed, Apr 15, 2015 at 12:15 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 IIUC, this will allow us to increase the usage count only when the buffer
 is touched by the clock sweep to decrement the usage count.

Yes.

 I think such a solution will be good for cases where many evictions
 need to be performed to satisfy the workload.  OTOH, when not too many
 evictions need to be done, some of the buffers that are accessed much
 more will have the same probability of getting evicted as buffers that
 are accessed less.

Possibly, but I think it's even worse under the current algorithm.
Under this proposal, if we go for a long time without any buffer
evictions, every buffer's usage count will top out at 2 more than
wherever it was after the last clock sweep.  In the worst case, every
buffer (or most of them) could end up with the same usage count.  But
under the status quo, they'll all go to 5, which is an even bigger
loss of information, and which will make the first eviction much more
expensive than if they are all pegged at 2 or 3.

There could be ways to improve things further, such as by slowly
advancing the clock sweep even when no eviction is required.  But
that's a bit tricky to do, too.  You'd like (perhaps) to advance it
one step for every buffer allocation, but you don't want a centralized
counter, or unnecessary contention on the clock sweep when no eviction
is necessary in the first place.  There's probably some way to make it
work, though, if we put our mind to it.  I'm inclined to try the
simpler approach first and see what it buys.
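
One conceivable shape for that, purely as a sketch: let each backend
count allocations locally and only touch the shared hand once per
batch, so the atomic is hit a fraction as often.  The batch size and
all names below are invented, and this ignores the real buffer-strategy
locking entirely.

    /* Sketch of amortized clock-sweep advancement; illustrative only. */
    #include <stdatomic.h>
    #include <stdio.h>

    #define NBUFFERS    16384
    #define SWEEP_BATCH 32      /* local allocations per shared update */

    static atomic_uint shared_sweep_hand;       /* shared clock hand */
    static _Thread_local unsigned local_allocs; /* per-backend counter */

    /* Called on every buffer allocation; advances the shared hand in
     * batches so contention on the atomic is reduced. */
    static void
    note_buffer_allocation(void)
    {
        if (++local_allocs >= SWEEP_BATCH)
        {
            atomic_fetch_add(&shared_sweep_hand, local_allocs);
            local_allocs = 0;
        }
    }

    int
    main(void)
    {
        for (int i = 0; i < 1000; i++)
            note_buffer_allocation();
        printf("hand advanced %u steps (buffer %u)\n",
               atomic_load(&shared_sweep_hand),
               atomic_load(&shared_sweep_hand) % NBUFFERS);
        return 0;
    }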

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What exactly is our CRC algorithm?

2015-04-14 Thread Heikki Linnakangas

On 04/03/2015 05:28 AM, Abhijit Menon-Sen wrote:

At 2015-04-03 00:33:10 +0300, hlinn...@iki.fi wrote:


I came up with the attached.


I like it very much.

src/port/Makefile has this (note the src vs. srv mix-up in the comment):

 +# pg_crc32c_sse42.o and its _src.o version need CFLAGS_SSE42
 +pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42)
 +pg_crc32c_sse42_srv.o: CFLAGS+=$(CFLAGS_SSE42)

Other than that, this looks great. Thank you.


BTW, we might want to move the traditional and legacy crc32
implementations out of src/common. They are only used in backend
code now.


I agree.


Committed this now, after some further cleanup and reorganizing the 
autoconf stuff.


- Heikki





Re: [HACKERS] logical column ordering

2015-04-14 Thread Robert Haas
On Tue, Apr 14, 2015 at 2:38 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 As I said, I'm still writing the first pieces of this so I'm not sure
 what other ramifications it will have.  If there are any thoughts, I
 would appreciate them.  (Particularly useful input on whether it is
 acceptable to omit lognums/physnums from _outRangeTblEntry.)

I think the general rule is that an RTE should not contain any
structural information about the underlying relation that can
potentially change: the OID is OK because it's immutable for any given
relation.  The relkind is not quite immutable because you can create an
ON SELECT rule on a table and turn it into a view; I'm not sure how we
handle that, but it's a fairly minor exception anyhow.  Everything
else in the RTE, with the new and perhaps-unfortunate exception of
security quals, is stuff derived from what's in the query, not the
table.  I think it would be good for this to work the same way: the
structural information about the table should be found in the
relcache, not the RTE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical column ordering

2015-04-14 Thread Alvaro Herrera
I've been looking at this again.  It has become apparent to me that what
we're doing in parse analysis is wrong, and the underlying node
representation is wrong too.  Here's a different approach, which I hope
will bear better fruit.  I'm still working on implementing the ideas
here (and figuring out what the fallout is).

Currently, the patch stores RangeTblEntry->eref->colnames in logical
order; and it also adds a map from logical colnums to attnum (called
lognums).  Now this is problematic for two reasons:

1. the lognums map becomes part of the stored representation of a rule;
any time you modified the logical ordering of a table underlying some
view, the view's _RETURN rule would have to be modified as well.  Not
good.

2. RTE->eref->colnames is in attlognum order and thus can only be sanely
interpreted if RTE->lognums is available, so not only lognums would have
to be modified, but colnames as well.

I think the solution to both these issues is to store colnames in attnum
ordering not logical, and *not* output RTE->lognums as part of
_outRangeTblEntry.  This means that every time we read the RTE for the
table we need to obtain lognums from its tupledesc.  RTE->eref->colnames
can then be sorted appropriately at plan time.
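
In outline, the mapping rebuilt from the tupledesc could look like the
following toy model.  Here attlognum stands for the patch's proposed
per-attribute logical position; everything else is invented for
illustration and is not the real catalog or RTE representation.

    /* Toy model: derive a logical-order column mapping from a tupledesc. */
    #include <stdio.h>

    #define NATTS 3

    typedef struct
    {
        const char *attname;
        int         attnum;     /* storage identity, 1-based */
        int         attlognum;  /* logical (display) position, 1-based */
    } ToyAttribute;

    int
    main(void)
    {
        /* stored in attnum order, as colnames would be */
        ToyAttribute tupdesc[NATTS] = {
            {"id", 1, 2},
            {"name", 2, 1},
            {"price", 3, 3},
        };
        int lognums[NATTS + 1]; /* lognum -> attnum, rebuilt on demand */

        for (int i = 0; i < NATTS; i++)
            lognums[tupdesc[i].attlognum] = tupdesc[i].attnum;

        /* expansion in logical order, e.g. for SELECT * at plan time */
        for (int lognum = 1; lognum <= NATTS; lognum++)
            printf("logical %d -> attnum %d (%s)\n",
                   lognum, lognums[lognum],
                   tupdesc[lognums[lognum] - 1].attname);
        return 0;
    }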

At RTE creation time (addRangeTableEntry and siblings) we can obtain
lognums and physnums.  Both these arrays are available for later
application in setrefs.c, avoiding the need of the obviously misplaced
relation_open() call we currently have there.

There is one gotcha, which is that expandTupleDesc (and, really,
everything from expandRTE downwards) will need to be split in two
somehow: one part needs to fill in the colnames array in attnum order,
and the other part needs to expand the attribute array into Var nodes in
logical order.

(If you recall, we need attphysnums at setrefs.c time so that we can
fix-up any TupleDesc created from a targetlist so that it contains the
proper attphysnum values.  The attphysnum values for each attribute do
not propagate properly there, and I believe this is the mechanism to do
so.)

As I said, I'm still writing the first pieces of this so I'm not sure
what other ramifications it will have.  If there are any thoughts, I
would appreciate them.  (Particularly useful input on whether it is
acceptable to omit lognums/physnums from _outRangeTblEntry.)

An alternative idea would be to add lognums and physnums to RelOptInfo
instead of RangeTblEntry (we would do so during get_relation_info).  I'm
not sure how this works for setrefs.c though, if at all; the advantage
is that RelOptInfo is not part of stored rules so we don't have to worry
about not saving them there.

-- 
Álvaro Herrera      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] inherit support for foreign tables

2015-04-14 Thread Alvaro Herrera
Jim Nasby wrote:
 On 4/14/15 5:49 AM, Etsuro Fujita wrote:
  postgres=# create foreign table ft1 (c1 int) server myserver options
  (table_name 't1');
  CREATE FOREIGN TABLE
  postgres=# create foreign table ft2 (c1 int) server myserver options
  (table_name 't2');
  CREATE FOREIGN TABLE
  postgres=# alter foreign table ft2 inherit ft1;
  ALTER FOREIGN TABLE
  postgres=# select * from ft1 for update;
  ERROR:  could not find junk tableoid1 column
  
  I think this is a bug.  Attached is a patch fixing this issue.
 
 What happens when the foreign side breaks the inheritance? Does the FDW
 somehow know to check that fact for each query?

This is a meaningless question.  The remote tables don't have to have an
inheritance relationship already; only the local side sees them as
connected.

I think the real question is whether we're now (I mean after this patch)
emitting useless tableoid columns that we didn't previously have.  I
think the answer is yes, and if so I think we need a smaller comb to fix
the problem.  This one seems rather large.

-- 
Álvaro Herrera      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread David Fetter
On Tue, Apr 14, 2015 at 09:25:33AM -0700, David E. Wheeler wrote:
 On Apr 14, 2015, at 9:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  http://perldoc.perl.org/File/Path.html
  With this formulation:
  remove_tree($tempdir, {keep_root => 1});
  
  Does Perl 5.8 have this?
 
 Yes, it does.
 
   http://cpansearch.perl.org/src/NWCLARK/perl-5.8.9/lib/File/Path.pm

One way to avoid a lot of these questions would be to use
Test::MinimumVersion or similar in a git hook.  If we're promising
5.8.mumble, we should be able to keep that promise always rather than
search case by case, at least up to bugs in the version-checking
software.

http://search.cpan.org/~rjbs/Test-MinimumVersion-0.101081/lib/Test/MinimumVersion.pm

What say?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Apr 14, 2015, at 9:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  http://perldoc.perl.org/File/Path.html
  With this formulation:
  remove_tree($tempdir, {keep_root => 1});
  
  Does Perl 5.8 have this?
 
 Yes, it does.
 
   http://cpansearch.perl.org/src/NWCLARK/perl-5.8.9/lib/File/Path.pm

Uh.  5.8.9 does, but 5.8.8 does not.  Not sure if this is a problem.
Ancient buildfarm member pademelon (Tom's HPUX stuff) has 5.8.9 -- but
that was probably installed by hand.  Brolga (really ancient Cygwin
stuff) has 5.10.1.  As far as I can tell (perldoc perldelta5101) it was
upgraded to File::Path 2.07_03 which should be fine.  Spoonbill has
5.12.2; 5.12.0 already included the fixes in 5.10.1 so it should be fine
also.  Friarbird has 5.12.4.

Castoroides has 5.8.4.  Oops.

-- 
Álvaro Herrera      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Apr 14, 2015, at 1:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  Castoroides has 5.8.4.  Oops.
 
 WUT.

Yeah, eh?  Anyway I don't think it matters much: just don't enable TAP
tests on machines with obsolete Perl.  I think this is fine since 5.8's
latest release is supported.

-- 
Álvaro Herrera      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Make more portable TAP tests of initdb

2015-04-14 Thread David E. Wheeler
On Apr 14, 2015, at 1:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Castoroides has 5.8.4.  Oops.

WUT.



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Jim Nasby

On 4/14/15 5:22 PM, Peter Geoghegan wrote:

As long as we're doing random brainstorming, I'd suggest looking at
making clocksweep actually approximate LRU-K/LRU-2 (which, again, to
be clear, my prototype did not do). The clocksweep could maintain
statistics about the recency of the second-to-last access across all
buffers, and discriminate against buffers according to what bucket of
the population they fit in to. Not sure how aggressively we'd penalize
those buffers that had very old penultimate references (or credit
those that had very recent penultimate references), or what the bucket
partitioning scheme is, but that's probably where I'd take it next.
For example, buffers with a penultimate reference that is more than a
standard deviation below the mean would be double penalized (and maybe
the opposite, for those buffers with penultimate accesses a stddev
above the mean). If that didn't work so well, then I'd look into an
ARC style recency and frequency list (while remembering things about
already evicted blocks, which LRU-K does not do, although that paper
is from the early 1990s).


Along the lines of brainstorming... why do we even allow usage_count >
1? Clocksweep was used pretty successfully by at least FreeBSD, but they
simply used a bit to indicate recently used. Anything that wasn't
recently used moved from the active pool to the inactive pool (which
tended to be far larger than the active pool with decent amounts of
memory), and a small number of buffers were kept on the 'free' list by
pulling them out of the inactive pool and writing them if they were
dirty. All of this was done on an LRU basis.
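
For reference, a toy sketch of that one-bit second chance scheme
(illustrative only, not FreeBSD's actual code):

    /* One reference bit per buffer instead of a 0..5 usage count. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NBUF 8

    static bool referenced[NBUF];   /* set on every access */
    static int  hand;               /* clock hand */

    /* Clear reference bits as the hand passes; the first buffer found
     * with its bit already clear is the victim. */
    static int
    evict(void)
    {
        for (;;)
        {
            int cur = hand;

            hand = (hand + 1) % NBUF;
            if (referenced[cur])
                referenced[cur] = false;    /* second chance */
            else
                return cur;                 /* untouched since last pass */
        }
    }

    int
    main(void)
    {
        referenced[0] = referenced[3] = true;   /* simulate some accesses */
        printf("victim: buffer %d\n", evict()); /* prints 1 */
        return 0;
    }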


Given how common it is for the vast bulk of shared_buffers in an install 
to be stuck at 5, I'd think the first thing we should try is a 
combination of greatly reducing the max for usage_count (maybe to 2 
instead of 1, to simulate 2 pools), and running the clock sweep a lot
more aggressively in a background process.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-14 Thread Simon Riggs
On 11 March 2015 at 20:55, Peter Eisentraut pete...@gmx.net wrote:

 I don't know how to move forward.  We could give users a knob: This
 might make your queries faster or not -- good luck.  But of course
 nobody will like that either.

What is clear is that large SELECT queries are doing the work VACUUM
should do. We should not be doing large background tasks (block
cleanup) during long running foreground tasks. But there is no need
for changing behaviour during small SELECTs. So the setting of 4 gives
current behaviour for small SELECTs and new behaviour for larger
SELECTs.

The OP said this...
<op>
We also make SELECT clean up blocks as it goes. That is useful in OLTP
workloads, but it means that large SQL queries and pg_dump effectively
do much the same work as VACUUM, generating huge amounts of I/O and
WAL on the master, the cost and annoyance of which is experienced
directly by the user. That is avoided on standbys.

Effects of that are that long running statements often run much longer
than we want, increasing bloat as a result. It also produces wildly
varying response times, depending upon extent of cleanup required.
</op>

This is not a performance patch. This is about one user doing the
cleanup work for another. People running large SELECTs should not be
penalised. The patch has been shown to avoid that and no further
discussion should be required.

I don't really care whether we have a parameter for this or not. As
long as we have the main feature.

It's trivial to add/remove a parameter to control this. Currently
there isn't one.

I'd like to commit this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[Attachment: hot_disable.v9.patch]



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Peter Geoghegan
On Tue, Apr 14, 2015 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
 So, I was thinking about this a little bit more today, prodded by my
 coworker John Gorman.  I'm wondering if we could drive this off of the
 clock sweep; that is, every time the clock sweep touches a buffer, its
 usage count goes down by one, but we also set two flag bits.  Every
 time the buffer gets touched thereafter, we check whether any flag
 bits are set; if so, we clear one and increase the usage count, else
 we do nothing.  So the usage count can increase at most twice per
 clock sweep.  The advantage of that is that, as with Peter's approach,
 it is harder for the usage count of a buffer to max out - to get
 there, you need sustained access over a longer period of time.  But
 it's not time-dependent, so replaying the same workload at twice the
 speed or half the speed on faster or slower hardware doesn't change
 the choice of which buffer to evict, which seems good.

Why is that good?

My little prototype basically implemented the LRU-K approach to
preventing correlated references (which tpc-b has some of, in a really
obvious way - not sure how significant that is). This would have
accidentally made it weight frequency better, but the prototype did
not pretend to actually implement something like LRU-K (which is not
the state of the art in any case), because it did not consider the
recency of the second-to-last access of the buffer at all.

The prototype's performance started off well, but regressed in later
pgbench-collector iterations (due to the influence of VACUUM, I
think). If I wanted to cheat, I could have only done one large 45
minute run, which would have made the prototype look much better
still.

 And it will
 cause us to prefer buffers which are accessed repeatedly over a period
 of time rather than buffers that are accessed a bunch of times in
 quick succession and then not touched again for a while, which seems
 like a good bet.

I think that people were all too quick to dismiss the idea of a wall
time interval playing some role here (at least as a defense against
correlated references, as a correlated reference period). I suppose
that's because it doesn't fit with an intuition that says that kind
of interval ought to be derived algebraically - magic delay settings
are considered suspect. However, the same can be said of the
Five-minute rule, which is a highly influential rule of thumb that
Jim Gray came up with. The Five-minute rule is now obsolete, but that
took a long time, and the fundamental observation still applies
(Wikipedia says it depends on what type of disks you have these days,
but the fact remains that rules of thumb like this can be more or less
robust).

I think that correlated reference type delays *are* used effectively
in real world systems without it being fragile/overly workload
dependent.

 I can't convince myself that this fully solves the (currently
 existing) problem of usage counts increasing too fast.  In theory,
 every time the clock sweep hand advances, it decrements the usage
 count of one buffer by one, but also makes it possible for the usage
 count of that buffer to increase by up to two (or by only one if the
 buffer previously had a usage count of five).  So with the right
 access pattern, it seems like you could still get to a place where a
 typical buffer eviction has to do a lot of scanning before it finds a
 victim buffer.  As with the present algorithm, we'd be most likely to
 have that problem when the buffer hit ratio is very high, so that
 there are many opportunities for usage counts to go up between each
 opportunity to push usage counts back down.  But it seems like it
 would at least limit the damage.

 I haven't tried this or anything, so this is just random brainstorming.

That's what it'll take, I think -- random brainstorming, and crude
prototypes to test theories.

As long as we're doing random brainstorming, I'd suggest looking at
making clocksweep actually approximate LRU-K/LRU-2 (which, again, to
be clear, my prototype did not do). The clocksweep could maintain
statistics about the recency of the second-to-last access across all
buffers, and discriminate against buffers according to what bucket of
the population they fit in to. Not sure how aggressively we'd penalize
those buffers that had very old penultimate references (or credit
those that had very recent penultimate references), or what the bucket
partitioning scheme is, but that's probably where I'd take it next.
For example, buffers with a penultimate reference that is more than a
standard deviation below the mean would be double penalized (and maybe
the opposite, for those buffers with penultimate accesses a stddev
above the mean). If that didn't work so well, then I'd look into an
ARC style recency and frequency list (while remembering things about
already evicted blocks, which LRU-K does not do, although that paper
is from the early 1990s).
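
A toy sketch of the LRU-2 bookkeeping that implies (all names and
structure invented; a real implementation would have to worry about
storage overhead and contention):

    /* Remember the last two access times per buffer and rank victims
     * by the second-to-last one. */
    #include <stdio.h>

    #define NBUF 2

    typedef struct
    {
        unsigned long last;         /* most recent access tick */
        unsigned long penultimate;  /* second-to-last access tick */
    } AccessHistory;

    static AccessHistory hist[NBUF];
    static unsigned long tick;

    static void
    touch(int buf)
    {
        hist[buf].penultimate = hist[buf].last;
        hist[buf].last = ++tick;
    }

    /* LRU-2 victim: oldest penultimate access.  A single recent access
     * (penultimate of 0, i.e. infinitely old) does not protect a buffer. */
    static int
    victim(void)
    {
        int best = 0;

        for (int i = 1; i < NBUF; i++)
            if (hist[i].penultimate < hist[best].penultimate)
                best = i;
        return best;
    }

    int
    main(void)
    {
        /* Buffer 1 is touched twice; buffer 0 once, most recently.
         * Plain LRU would evict buffer 1, but LRU-2 picks buffer 0. */
        touch(1);
        touch(1);
        touch(0);
        printf("victim: buffer %d\n", victim());    /* prints 0 */
        return 0;
    }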

There are approaches to relieving lock 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Robert Haas
On Wed, Apr 16, 2014 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 Anyways, I'm still curious if you can post similar numbers basing the
 throttling on gross allocation counts instead of time.  Meaning: some
 number of buffer allocations has to have occurred before you consider
 eviction.  Besides being faster I think it's a better implementation:
 an intermittently loaded server will give more consistent behavior.

 Yeah --- I think wall-clock-based throttling is fundamentally the wrong
 thing anyway.  Are we going to start needing a CPU speed measurement to
 tune the algorithm with?  Not the place to be going.  But driving it off
 the number of allocations that've been done could be sensible.  (OTOH,
 that means you need a central counter, which itself would be a
 bottleneck.)

So, I was thinking about this a little bit more today, prodded by my
coworker John Gorman.  I'm wondering if we could drive this off of the
clock sweep; that is, every time the clock sweep touches a buffer, its
usage count goes down by one, but we also set two flag bits.  Every
time the buffer gets touched thereafter, we check whether any flag
bits are set; if so, we clear one and increase the usage count, else
we do nothing.  So the usage count can increase at most twice per
clock sweep.  The advantage of that is that, as with Peter's approach,
it is harder for the usage count of a buffer to max out - to get
there, you need sustained access over a longer period of time.  But
it's not time-dependent, so replaying the same workload at twice the
speed or half the speed on faster or slower hardware doesn't change
the choice of which buffer to evict, which seems good.  And it will
cause us to prefer buffers which are accessed repeatedly over a period
of time rather than buffers that are accessed a bunch of times in
quick succession and then not touched again for a while, which seems
like a good bet.
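
A toy model of the scheme just described, with invented names (the two
"credits" stand in for the two flag bits):

    /* The sweep decrements the usage count and grants two touch credits;
     * later accesses spend a credit to bump the count, so it can rise at
     * most twice per sweep cycle. */
    #include <stdio.h>

    #define MAX_USAGE 5

    typedef struct
    {
        int usage_count;
        int credits;    /* the "two flag bits" */
    } BufferModel;

    /* clock sweep hand passing over the buffer */
    static void
    sweep_touch(BufferModel *buf)
    {
        if (buf->usage_count > 0)
            buf->usage_count--;
        buf->credits = 2;
    }

    /* ordinary access to the buffer */
    static void
    access_touch(BufferModel *buf)
    {
        if (buf->credits > 0)
        {
            buf->credits--;
            if (buf->usage_count < MAX_USAGE)
                buf->usage_count++;
        }
        /* else: already bumped twice since the last sweep; do nothing */
    }

    int
    main(void)
    {
        BufferModel buf = {3, 0};

        sweep_touch(&buf);          /* count 2, credits 2 */
        for (int i = 0; i < 100; i++)
            access_touch(&buf);     /* only the first two accesses count */
        printf("usage_count = %d\n", buf.usage_count);  /* prints 4 */
        return 0;
    }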

I can't convince myself that this fully solves the (currently
existing) problem of usage counts increasing too fast.  In theory,
every time the clock sweep hand advances, it decrements the usage
count of one buffer by one, but also makes it possible for the usage
count of that buffer to increase by up to two (or by only one if the
buffer previously had a usage count of five).  So with the right
access pattern, it seems like you could still get to a place where a
typical buffer eviction has to do a lot of scanning before it finds a
victim buffer.  As with the present algorithm, we'd be most likely to
have that problem when the buffer hit ratio is very high, so that
there are many opportunities for usage counts to go up between each
opportunity to push usage counts back down.  But it seems like it
would at least limit the damage.

I haven't tried this or anything, so this is just random brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2015-04-14 Thread Greg Stark
I've been meaning to write this since PGConf, and now isn't a great time
since I'm on my phone, but I think it's time.

The way the clock sweep algorithm is meant to be thought about is that it's
an approximate LRU. Each usage count corresponds to an ntile of the LRU. So
we don't know which buffer is least recently used, but it must be in the set
of buffers with usage count 0, and that should be 1/nth of all the buffers.

In order for that property to be maintained, though, the usage count for some
buffer should be getting decremented every time we touch a buffer. That is,
every time we promote one buffer to the most recently used ntile we should
be demoting some other buffer.
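
As a toy illustration of that invariant, pairing every promotion with a
demotion somewhere else (names invented, illustrative only):

    /* Every promotion demotes whichever buffer the hand is on, so the
     * usage-count histogram stays balanced instead of drifting upward. */
    #include <stdio.h>

    #define NBUF      8
    #define MAX_USAGE 5

    static int usage[NBUF];
    static int hand;

    static void
    touch(int buf)
    {
        if (usage[buf] < MAX_USAGE)
            usage[buf]++;           /* promote the touched buffer */

        if (hand == buf)            /* don't demote what we just promoted */
            hand = (hand + 1) % NBUF;
        if (usage[hand] > 0)
            usage[hand]--;          /* ...and demote some other buffer */
        hand = (hand + 1) % NBUF;
    }

    int
    main(void)
    {
        for (int i = 0; i < 10000; i++)
            touch(i % 3);           /* hammer three hot buffers */

        for (int i = 0; i < NBUF; i++)
            printf("buffer %d: usage %d\n", i, usage[i]);
        return 0;
    }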

The way our cache works we promote when a buffer is accessed but we only
demote when a buffer is flushed. We flush a lot less often than we touch
buffers so it's not surprising that the cache ends up full of buffers that
are all in the most recently used section.

Now it's complicated by the fact that we aren't promoting buffers directly
to the most recently used ntile. We're incrementing the usage count by one.
That makes it more of a least frequently used list rather than an LRU. I
think that's a mistake but I recall some debate about that when it first
went in.


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-14 Thread David Steele
On 4/14/15 6:07 PM, Simon Riggs wrote:
 On 11 March 2015 at 20:55, Peter Eisentraut pete...@gmx.net wrote:
 
 I don't know how to move forward.  We could give users a knob: This
 might make your queries faster or not -- good luck.  But of course
 nobody will like that either.
 
 What is clear is that large SELECT queries are doing the work VACUUM
 should do. We should not be doing large background tasks (block
 cleanup) during long running foreground tasks. But there is no need
 for changing behaviour during small SELECTs. So the setting of 4 gives
 current behaviour for small SELECTs and new behaviour for larger
 SELECTs.
 
 The OP said this...
 <op>
 We also make SELECT clean up blocks as it goes. That is useful in OLTP
 workloads, but it means that large SQL queries and pg_dump effectively
 do much the same work as VACUUM, generating huge amounts of I/O and
 WAL on the master, the cost and annoyance of which is experienced
 directly by the user. That is avoided on standbys.
 
 Effects of that are that long running statements often run much longer
 than we want, increasing bloat as a result. It also produces wildly
 varying response times, depending upon extent of cleanup required.
 </op>
 
 This is not a performance patch. This is about one user doing the
 cleanup work for another. People running large SELECTs should not be
 penalised. The patch has been shown to avoid that and no further
 discussion should be required.
 
 I don't really care whether we have a parameter for this or not. As
 long as we have the main feature.
 
 It's trivial to add/remove a parameter to control this. Currently
 there isn't one.
 
 I'd like to commit this.

+1 from me.  One of the last databases I worked on had big raw
partitions that were written to and then sequentially scanned exactly
once before being dropped.  It was painful to see all those writes
happening for nothing.

In other cases there were sequential scans that happened directly after
the main writes, but then the next read might be days in the future (if
ever), and the system was basically idle for a while, which would have
allowed vacuum to come in and do the job without affecting performance
of the main job.

I think that in batch-oriented databases this patch will definitely be a
boon to performance.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread Tatsuo Ishii
This patch does not apply cleanly due to the moving of pgbench (patch
to filelist.sgml failed).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

 Attached is the v7 pg_audit patch.
 
 I've tried to address Peter's documentation concerns by cleaning up the
 terminology and adding a real-world case plus usage recommendations.
 The word "auditing" has been expunged from the docs in favor of the term
 "audit logging."
 
 Per Simon's request, there is now a pg_audit.log_relation setting that
 makes session audit logging exhaustively log all relations as it did
 before.  The ROLE logging class is back as well.
 
 Simon also suggested a way that pg_audit could be tested with standard
 regression so I have converted all tests over and removed test.pl.
 
 Sawada, I'd certainly appreciate it if you'd try again and see if you
 are still getting a segfault with your test code (which you can find in
 the regression tests).
 
 Currently the patch will compile on master (I tested with b22a36a) or
 optionally with Alvaro's deparse patches applied (only 0001 & 0002
 needed).  I've supplied a different regression test out file
 (expected/pg_audit-deparse.out) which can be copied over the standard
 out file (expected/pg_audit.out) if you'd like to do regression on
 pg_audit with deparse.  The small section of code that calls
 pg_event_trigger_ddl_commands() can be compiled by defining DEPARSE or
 by removing the #ifdefs around that block.
 
 Please let me know if I've missed anything and I look forward to
 comments and questions.
 
 Thanks,
 -- 
 - David Steele
 da...@pgmasters.net

