Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Pavel Stehule
2015-05-13 7:50 GMT+02:00 Jim Nasby :

> On 5/12/15 11:32 PM, Pavel Stehule wrote:
>
>> I would rather not store state at this level - so "at" should be
>> implemented at a higher level. There is a very high number of
>> possible strategies for what can be done with failed tasks - and I
>> would rather not open this topic. I believe that with the proposed
>> scheduler, anybody can simply implement what they need in PL/pgSQL
>> with dynamic SQL. On the other hand, "run once" can be implemented
>> with the proposed API too.
>>
>>
>> That seems reasonable in a v1, so long as there's room to easily
>> extend it without pain to add "at"-like one-shot commands,
>> at-startup commands, etc.
>>
>
> Yeah, being able to run things after certain system events would be nice.
>
>  I'd prefer to see a scheduling interface that's a close match for
>> cron's or that leaves room for it - so things like "*/5" for every
>> five minutes, ranges like "Mon-Fri", etc. If there's a way to
>> express similar capabilities more cleanly using PostgreSQL's types
>> and conventions that makes sense, but I'm not sure a composite type
>> of arrays fits that.
>>
>
> It seems unfortunate to go with cron's limited syntax when we have such
> fully capable timestamp and interval capabilities already in the database.
> :/
>

That is another option - the MySQL event scheduler uses it. The usage is
trivial, but it is a little weak - it is hard to describe asymmetric events,
like running on working days only. But if I use named parameters and an
auxiliary constructor function, I think that can be supported too.

make_scheduled_time(at => '2015-04-05 10:00:00', repeat => '1day',
stop_after => '...')
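To illustrate the interval-style alternative (the parameter names and
semantics here are only a sketch of such a constructor, not a proposed
implementation), the schedule simply expands into a series of fire times:

```python
from datetime import datetime, timedelta

def make_scheduled_time(at, repeat=None, stop_after=None):
    """Sketch of an interval-style schedule: first run at `at`,
    then every `repeat`, until `stop_after` (all names hypothetical)."""
    times = [at]
    if repeat is not None:
        t = at
        while stop_after is not None and t + repeat <= stop_after:
            t = t + repeat
            times.append(t)
    return times

runs = make_scheduled_time(at=datetime(2015, 4, 5, 10, 0),
                           repeat=timedelta(days=1),
                           stop_after=datetime(2015, 4, 8, 10, 0))
# runs: one fire time per day from Apr 5 through Apr 8
```

The asymmetric cases (working days only) are exactly what this style cannot
express, which is the weakness mentioned above.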


>
> Is there anything worth stealing from pgAgent?
>

Surely not - although I have slightly different goals. pgAgent is a high-end
scheduler, a little complex due to its support for jobs/steps. My target is
an implementation of a low-end scheduler; pgAgent and others can be
implemented as a next layer. It should be strong enough for some simple
admin tasks, and strong enough to be a base for implementing complex
schedulers and workflow systems - but it should be as simple as possible. At
this moment PL/pgSQL is strong enough for implementing a very complex
workflow system, but it is missing the low-end scheduling functionality.


>  I thought about it too - but the parser for this cron syntax will probably
>> be longer than all the other code. I see a possibility of writing
>> constructors that simplify creating a value of this type. Something like
>>
>> make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
>> make_scheduled_time(at => '2015-04-05 10:00:00'::timestamp);
>>
>
> Wouldn't that be just as bad as writing the parser in the first place?
>

Yes - I am thinking about a special type whose input function would be
empty, so values have to be created with a constructor function - that can
simplify the parser a lot.
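As a sketch of how small such constructors could be (the field names,
ranges, and day numbering are my assumptions, not a proposed API), each
cron-style field expands into the integer array the composite type expects:

```python
def parse_cron_field(expr, lo, hi, names=None):
    """Expand one cron-style field ('*/5', 'Mon-Fri', '1,10,20')
    into a sorted list of integers in [lo, hi]."""
    if names:  # map symbolic names like 'Mon' to numbers
        for i, n in enumerate(names):
            expr = expr.replace(n, str(lo + i))
    result = set()
    for part in expr.split(','):
        step = 1
        if '/' in part:
            part, step = part.split('/')
            step = int(step)
        if part == '*':
            first, last = lo, hi
        elif '-' in part:
            first, last = map(int, part.split('-'))
        else:
            first = last = int(part)
        result.update(range(first, last + 1, step))
    return sorted(result)

DOWS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
secs = parse_cron_field('*/5', 0, 59)           # [0, 5, 10, ..., 55]
dows = parse_cron_field('Mon-Fri', 1, 7, DOWS)  # [1, 2, 3, 4, 5]
```

Each field is independent, so the constructor stays far smaller than a full
cron-expression parser with its combined-field validation.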

>
>  1. Don't hold state or command results
>>
> ...
> > 3. When command fails, it writes info to log only
> Unfortunate, but understandable in a first pass.
>
>  4. When a command runs too long (over the specified timeout), it is killed.
>>
>
> I think that needs to be optional.
>

You can specify a timeout for any command - so if you specify a timeout of
0, it will run without a timeout.
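That convention - zero disables the limit - is the same one statement_timeout
uses, and can be sketched as follows (an illustration only, not the
extension's code):

```python
import subprocess

def run_command(argv, timeout_secs):
    """Run a command; timeout_secs == 0 means no timeout,
    otherwise the process is killed when the timeout expires."""
    effective = None if timeout_secs == 0 else timeout_secs
    try:
        subprocess.run(argv, timeout=effective, check=True)
        return 'ok'
    except subprocess.TimeoutExpired:
        return 'killed'  # the scheduler would only log this

print(run_command(['sleep', '2'], timeout_secs=1))  # killed
print(run_command(['true'], timeout_secs=0))        # ok
```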


>
>  5. When a command waits for a free worker, write to the log
>> 6. When a command was not executed due to missing workers (and max_workers
>>  > 0), write to the log
>>
>
> Also unfortunate. We already don't provide enough monitoring capability
> and this just makes that worse.
>

Theoretically some pg_stat_ view could be supported - but I would rather not
implement a history table for commands. Again, that is a task for higher
layers.


>
> Perhaps it would be better to put something into PGXN first; this doesn't
> really feel like it's baked enough for contrib yet. (And I say that as
> someone who's really wanted this ability in the past...)
>

That is plan B. I think PostgreSQL is missing a low-end scheduler, so I am
asking here. Some features can be implemented later; some features can be
implemented elsewhere. I have to specify the limits and borders - what a
simple scheduler is, and what it is not. A full-featured scheduler is a
relatively heavy application, so it should not be a contrib module. But a
simple generic scheduler can be good enough for 50% of cases, and with some
simple PL/pgSQL code for another 40%.

Regards

Pavel


> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-12 Thread Shigeru Hanada
2015-05-12 10:24 GMT+09:00 Kouhei Kaigai :
> option-2)
> Tom's suggestion. Add a new list field of Path nodes on CustomPath,
> then create_customscan_plan() will call static create_plan_recurse()
> function to construct child plan nodes.
> Probably, the attached patch will be an image of this enhancement,
> but not tested yet, of course. Once we adopt this approach, I'll
> adjust my PG-Strom code towards the new interface within 2 weeks
> and report problems if any.

+1.  This design achieves the functionality Kaigai-san's use case requires
for custom joins - instantiating sub-plans of a CustomScan plan recursively -
without exposing create_plan_recurse.  A custom scan provider can use the
list index to distinguish the information, like FDWs do with fdw_private.
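As a toy model of option-2 (not the actual planner API - the real code is in
C and the names below are only illustrative), the provider stores child
Paths on its node and the planner recurses over them to build child Plans:

```python
class Path:
    def __init__(self, name, custom_paths=()):
        self.name = name
        self.custom_paths = list(custom_paths)  # child Paths, as in option-2

class Plan:
    def __init__(self, name, custom_plans):
        self.name = name
        self.custom_plans = custom_plans  # children addressable by index

def create_plan_recurse(path):
    """Toy model of option-2: the planner, not the provider,
    recursively turns each child Path into a Plan; the function
    itself stays private to the planner."""
    children = [create_plan_recurse(p) for p in path.custom_paths]
    return Plan(path.name, children)

join = Path('CustomJoin', [Path('SeqScan(a)'), Path('SeqScan(b)')])
plan = create_plan_recurse(join)
# The provider refers to plan.custom_plans[0] / [1] by index,
# much as an FDW stashes data in fdw_private.
```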

IMO it isn't necessary to apply the change to ForeignPath/ForeignScan.  An
FDW would have a way to keep and use sub-paths as private information.  In
fact, I wrote postgres_fdw to use create_plan_recurse to generate SQL
statements for the inner/outer relations to be joined, but as of now SQL
construction for join push-down is accomplished by calling the private
function deparseSelectSql recursively at the top of a join tree.

Some FDWs might want to use sub-plan generation, but we don't have any
concrete use case as of now.

-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Jim Nasby

On 5/12/15 11:32 PM, Pavel Stehule wrote:

I would rather not store state at this level - so "at" should be
implemented at a higher level. There is a very high number of
possible strategies for what can be done with failed tasks - and I
would rather not open this topic. I believe that with the proposed
scheduler, anybody can simply implement what they need in PL/pgSQL
with dynamic SQL. On the other hand, "run once" can be implemented
with the proposed API too.


That seems reasonable in a v1, so long as there's room to easily
extend it without pain to add "at"-like one-shot commands,
at-startup commands, etc.


Yeah, being able to run things after certain system events would be nice.


I'd prefer to see a scheduling interface that's a close match for
cron's or that leaves room for it - so things like "*/5" for every
five minutes, ranges like "Mon-Fri", etc. If there's a way to
express similar capabilities more cleanly using PostgreSQL's types
and conventions that makes sense, but I'm not sure a composite type
of arrays fits that.


It seems unfortunate to go with cron's limited syntax when we have such 
fully capable timestamp and interval capabilities already in the 
database. :/


Is there anything worth stealing from pgAgent?


I thought about it too - but the parser for this cron syntax will probably
be longer than all the other code. I see a possibility of writing
constructors that simplify creating a value of this type. Something like

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
make_scheduled_time(at => '2015-04-05 10:00:00'::timestamp);


Wouldn't that be just as bad as writing the parser in the first place?


1. Don't hold state or command results

...
> 3. When command fails, it writes info to log only
Unfortunate, but understandable in a first pass.


4. When a command runs too long (over the specified timeout), it is killed.


I think that needs to be optional.


5. When a command waits for a free worker, write to log
6. When a command was not executed due to missing workers (and max_workers
 > 0), write to log


Also unfortunate. We already don't provide enough monitoring capability 
and this just makes that worse.


Perhaps it would be better to put something into PGXN first; this 
doesn't really feel like it's baked enough for contrib yet. (And I say 
that as someone who's really wanted this ability in the past...)

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Pavel Stehule
2015-05-13 4:08 GMT+02:00 Craig Ringer :

>
>
> On 13 May 2015 at 00:31, Pavel Stehule  wrote:
>
>>
>>
>> 2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski :
>>
>>> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
>>> > create type scheduled_time as (second int[], minute int[], hour int[],
>>> dow
>>> > int[], month int[]);
>>> >  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>>> >  (,"{5}",,,) .. run once per hour
>>> > Comments, notices?
>>>
>>> First, please note that I'm definitely not a hacker, just a user.
>>>
>>> One comment that I'd like to make, is that since we're at planning
>>> phase, I think it would be great to add capability to limit number of
>>> executions of given command.
>>> This would allow running things like "at" in unix - run once, at given
>>> time, and that's it.
>>>
>>
>> I would rather not store state at this level - so "at" should be implemented
>> at a higher level. There is a very high number of possible strategies for
>> what can be done with failed tasks - and I would rather not open this topic.
>> I believe that with the proposed scheduler, anybody can simply implement
>> what they need in PL/pgSQL with dynamic SQL. On the other hand, "run once"
>> can be implemented with the proposed API too.
>>
>
> That seems reasonable in a v1, so long as there's room to easily extend it
> without pain to add "at"-like one-shot commands, at-startup commands, etc.
>
> I'd prefer to see a scheduling interface that's a close match for cron's
> or that leaves room for it - so things like "*/5" for every five minutes,
> ranges like "Mon-Fri", etc. If there's a way to express similar
> capabilities more cleanly using PostgreSQL's types and conventions that
> makes sense, but I'm not sure a composite type of arrays fits that.
>

I thought about it too - but the parser for this cron syntax will probably
be longer than all the other code. I see a possibility of writing
constructors that simplify creating a value of this type. Something like

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
make_scheduled_time(at => '2015-04-05 10:00:00'::timestamp);

There are two possible ways - a composite with arrays, or a custom
composite. I'll decide later.

These are the basic points:

1. Don't hold state or command results
2. A task is executed immediately, once per related time window (from one
start to the next), when a necessary worker is available
3. When a command fails, it writes info to the log only
4. When a command runs too long (over the specified timeout), it is killed.
5. When a command waits for a free worker, write to the log
6. When a command was not executed due to missing workers (and max_workers >
0), write to the log
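The matching semantics implied by the earlier composite-of-arrays examples
(an empty field acting as a wildcard) can be sketched like this; the Python
tuple only models the proposed SQL type, and treating the empty second field
as "any" is my reading of the examples:

```python
from datetime import datetime

def matches(sched, ts):
    """sched models (second, minute, hour, dow, month); None stands in
    for an empty array field, i.e. 'any value'."""
    second, minute, hour, dow, month = sched
    checks = [(second, ts.second), (minute, ts.minute), (hour, ts.hour),
              (dow, ts.isoweekday()), (month, ts.month)]
    return all(allowed is None or actual in allowed
               for allowed, actual in checks)

# (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes
every_10_min = (None, [1, 10, 20, 30, 40, 50], None, None, None)
print(matches(every_10_min, datetime(2015, 5, 13, 9, 30, 0)))  # True
print(matches(every_10_min, datetime(2015, 5, 13, 9, 31, 0)))  # False
```

A supervisor only needs this predicate plus the "once per time window" rule
to decide whether a task is due.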


> How do you plan to manage the bgworkers?
>

I am thinking about one static supervisor that will hold a calendar in
shared memory and will start dynamic bgworkers for commands, per database.
The scheduler is enabled in all databases where the proposed extension is
installed.

For working with the prototype I am planning to use SPI, but maybe that is
not necessary - so commands like VACUUM, CREATE DATABASE and DROP DATABASE
could be supported too. But I have not tested it and I don't know whether it
is possible. It could define new hooks too, so other extensions could be
based on it.


>
>
> In BDR, where we have a similar need to have workers across multiple
> databases, and where each database contains a list of workers to launch, we
> have:
>
> * A single static "supervisor" bgworker. In 9.5 this will connect with
> InvalidOid as the target database so it can only access shared catalogs. In
> 9.4 this isn't possible in the bgworker API so we have to connect to a
> dummy database.
>
> * A dynamic background worker for each database in which BDR is enabled,
> which is launched from the supervisor. We check which DBs are BDR-enabled
> by (ab)using database security labels and checking pg_shseclabel from the
> supervisor worker so we only launch bgworkers on BDR-enabled DBs.
>
> * A dynamic background worker for each peer node, launched by the
> per-database worker based on the contents of that database's
> bdr.bdr_connections table.
>
>
> What I suspect you're going to want is:
>
> * A static worker launched by your extension when it starts, which
> launches per-db workers for each DB in which the scheduler is enabled. You
> could use a GUC listing scheduler-enabled DBs in postgresql.conf and have
> an on-reload hook to update it, you don't need to do the security label
> hack.
>
> * A DB scheduler worker, which looks up the scheduled tasks list, finds
> the next scheduled event, and sleeps on a long latch timeout until then,
> resetting it when interrupted. When it reaches the scheduled event it would
> launch a one-shot BGW_NO_RESTART worker to run the desired PL/PgSQL
> procedure over the SPI.
>
> * A task runner worker, which gets launched by the db scheduler to
> actually run a task using the SPI.
>
>
> Does that match your current thinking?
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Develop
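The per-database scheduler loop Craig describes (find the next scheduled
event, sleep on a latch until it fires, launch a one-shot runner) can be
condensed into a toy model; the heap-based calendar and all names here are
my own illustration, not bgworker code:

```python
import heapq
from datetime import datetime, timedelta

def scheduler_loop(tasks, now, run_task, iterations=3):
    """Toy model of the per-database scheduler worker: pop the next
    scheduled event, 'sleep' until it (modeled by advancing `now`),
    then hand the task to a one-shot runner, as a BGW_NO_RESTART
    worker would.  Fires every 10 minutes for simplicity."""
    heap = [(next_fire, name) for name, next_fire in tasks.items()]
    heapq.heapify(heap)
    for _ in range(iterations):
        fire_at, name = heapq.heappop(heap)
        now = max(now, fire_at)            # stands in for the latch timeout
        run_task(name, now)                # launch the one-shot task runner
        heapq.heappush(heap, (fire_at + timedelta(minutes=10), name))
    return now

launched = []
scheduler_loop({'vacuum_stats': datetime(2015, 5, 13, 9, 0)},
               now=datetime(2015, 5, 13, 8, 55),
               run_task=lambda name, t: launched.append((name, t)))
```

In the real design the "sleep" is a latch wait that can be interrupted when
the task list changes, at which point the next-event computation is redone.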

Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Alvaro Herrera
Craig Ringer wrote:
> Hi all
> 
> For some time I've wanted a way to "SET SESSION AUTHORISATION" or "SET
> ROLE" in a way that cannot simply be RESET, so that a connection may be
> handed to a less-trusted service or application to do some work with.

Some years back, I checked the SQL standard for insight on how they
handle this stuff (courtesy of Jim Nasby IIRC).  It took me a while to
figure out that the way they do it is not to have a RESET command in the
first place!  In their model, you enter a secure execution context (for
example, an SQL function) by calling SET SESSION AUTHORIZATION; and once
there, the only way to revert to the original session authorization is
to exit the execution context -- and once that happens, the "attacker"
no longer has control.  Since they have reduced privileges, they can't
call SET SESSION AUTHORIZATION themselves to elevate their access.  In
this model, you're automatically protected.
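The standard's model as described above can be sketched as a stack of
execution contexts (a toy model, not PostgreSQL code):

```python
from contextlib import contextmanager

class Session:
    def __init__(self, user):
        self.stack = [user]  # authorization stack; top is current

    @property
    def current_user(self):
        return self.stack[-1]

    @contextmanager
    def execution_context(self, user):
        """SET SESSION AUTHORIZATION on entry; the ONLY way back is to
        exit the execution context -- there is no RESET to attack."""
        self.stack.append(user)
        try:
            yield
        finally:
            self.stack.pop()

s = Session('admin')
with s.execution_context('untrusted_app'):
    assert s.current_user == 'untrusted_app'
    # nothing run here can pop the stack itself
assert s.current_user == 'admin'
```

The security property falls out of the structure: reverting privilege is
tied to leaving the context, which the less-privileged code cannot force.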

I mentioned this in some developer meeting; got blank stares back, IIRC.
I mentioned it to Stephen in hallway track, and as I recall he was in
agreement with what I was proposing.  Biggest problem is, I can't recall
in detail what it was.

Not sure this helps you any ...  Chilean $2 which are probably not worth
much [currently], I guess.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-12 Thread Abhijit Menon-Sen
At 2015-05-13 03:04:11 +0200, and...@anarazel.de wrote:
>
> I can live with that, although I'd personally go with
> pgstattuple_approx()...

I could live with that too. Here's an incremental patch to rename the
function. (I'm not resubmitting the whole thing since you said you've
made some other changes too.)

Thank you.

-- Abhijit
>From 57d597f176294ebfe30efa6d73569505cbb41e31 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Wed, 13 May 2015 09:19:07 +0530
Subject: Rename pgstatapprox to pgstattuple_approx

---
 contrib/pgstattuple/pgstatapprox.c|  4 ++--
 contrib/pgstattuple/pgstattuple--1.2--1.3.sql |  4 ++--
 contrib/pgstattuple/pgstattuple--1.3.sql  |  4 ++--
 doc/src/sgml/pgstattuple.sgml | 12 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 46f5bc0..f3b5671 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -21,7 +21,7 @@
 #include "utils/tqual.h"
 #include "commands/vacuum.h"
 
-PG_FUNCTION_INFO_V1(pgstatapprox);
+PG_FUNCTION_INFO_V1(pgstattuple_approx);
 
 typedef struct output_type
 {
@@ -234,7 +234,7 @@ build_tuple(output_type *stat, FunctionCallInfo fcinfo)
  */
 
 Datum
-pgstatapprox(PG_FUNCTION_ARGS)
+pgstattuple_approx(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
diff --git a/contrib/pgstattuple/pgstattuple--1.2--1.3.sql b/contrib/pgstattuple/pgstattuple--1.2--1.3.sql
index bfcf464..99301a2 100644
--- a/contrib/pgstattuple/pgstattuple--1.2--1.3.sql
+++ b/contrib/pgstattuple/pgstattuple--1.2--1.3.sql
@@ -3,7 +3,7 @@
 -- complain if script is sourced in psql, rather than via ALTER EXTENSION
 \echo Use "ALTER EXTENSION pgstattuple UPDATE TO '1.3'" to load this file. \quit
 
-CREATE FUNCTION pgstatapprox(IN reloid regclass,
+CREATE FUNCTION pgstattuple_approx(IN reloid regclass,
 OUT table_len BIGINT,   -- physical table length in bytes
 OUT scanned_percent FLOAT8, -- what percentage of the table's pages was scanned
 OUT approx_tuple_count BIGINT,  -- estimated number of live tuples
@@ -14,5 +14,5 @@ CREATE FUNCTION pgstatapprox(IN reloid regclass,
 OUT dead_tuple_percent FLOAT8,  -- dead tuples in % (based on estimate)
 OUT approx_free_space BIGINT,   -- estimated free space in bytes
 OUT approx_free_percent FLOAT8) -- free space in % (based on estimate)
-AS 'MODULE_PATHNAME', 'pgstatapprox'
+AS 'MODULE_PATHNAME', 'pgstattuple_approx'
 LANGUAGE C STRICT;
diff --git a/contrib/pgstattuple/pgstattuple--1.3.sql b/contrib/pgstattuple/pgstattuple--1.3.sql
index f69b619..f3996e7 100644
--- a/contrib/pgstattuple/pgstattuple--1.3.sql
+++ b/contrib/pgstattuple/pgstattuple--1.3.sql
@@ -80,7 +80,7 @@ LANGUAGE C STRICT;
 
 /* New stuff in 1.3 begins here */
 
-CREATE FUNCTION pgstatapprox(IN reloid regclass,
+CREATE FUNCTION pgstattuple_approx(IN reloid regclass,
 OUT table_len BIGINT,   -- physical table length in bytes
 OUT scanned_percent FLOAT8, -- what percentage of the table's pages was scanned
 OUT approx_tuple_count BIGINT,  -- estimated number of live tuples
@@ -91,5 +91,5 @@ CREATE FUNCTION pgstatapprox(IN reloid regclass,
 OUT dead_tuple_percent FLOAT8,  -- dead tuples in % (based on estimate)
 OUT approx_free_space BIGINT,   -- estimated free space in bytes
 OUT approx_free_percent FLOAT8) -- free space in % (based on estimate)
-AS 'MODULE_PATHNAME', 'pgstatapprox'
+AS 'MODULE_PATHNAME', 'pgstattuple_approx'
 LANGUAGE C STRICT;
diff --git a/doc/src/sgml/pgstattuple.sgml b/doc/src/sgml/pgstattuple.sgml
index 4cba006..cca6dc9 100644
--- a/doc/src/sgml/pgstattuple.sgml
+++ b/doc/src/sgml/pgstattuple.sgml
@@ -361,19 +361,19 @@ pending_tuples | 0

 
  
-  pgstatapprox
+  pgstattuple_approx
  
- pgstatapprox(regclass) returns record
+ pgstattuple_approx(regclass) returns record
 
 
 
  
-  pgstatapprox is a faster alternative to
+  pgstattuple_approx is a faster alternative to
   pgstattuple that returns approximate results.
   The argument is the target relation's OID.
   For example:
 
-test=> SELECT * FROM pgstatapprox('pg_catalog.pg_proc'::regclass);
+test=> SELECT * FROM pgstattuple_approx('pg_catalog.pg_proc'::regclass);
 -[ RECORD 1 ]+---
 table_len| 573440
 scanned_percent  | 2
@@ -392,7 +392,7 @@ approx_free_percent  | 2.09
  
  Whereas pgstattuple always performs a
  full-table scan and returns an exact count of live and dead tuples
- (and their sizes) and free space, pgstatapprox
+ (and their sizes) and free space, pgstattuple_approx
  tries to avoid the full-table scan and returns exact dead tuple
  statistics along with an approximation of the number and
  size of live tuples and free space.
@@ -416,7 +416,7 @@ approx_free_percent  | 2.09

Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_basebackup -F t now succeeds with a long symlink target

2015-05-12 Thread Amit Kapila
On Tue, May 12, 2015 at 10:51 PM, Alvaro Herrera 
wrote:
>
>
> You have a memory error in this patch -- clobbered by pfree:
>
> > Details
> > ---
> >
http://git.postgresql.org/pg/commitdiff/97e0aa697983cf7f7f79e69f2dc248fdefb7dbf6
>   here
>

If we want to support long symlink targets, then shouldn't we
first tackle the changes done in commit 23a78352?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-05-12 Thread Etsuro Fujita

On 2015/05/13 0:55, Stephen Frost wrote:

Etsuro,

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:

Here is an updated version.  In this version, the bug has been
fixed, but regression tests for it haven't been added, because
I'm not sure the above way is a good idea and I don't have any
other ideas.

The EXPLAIN output has also been improved as discussed in [1].


While the EXPLAIN output changed, the structure hasn't really changed
from what was discussed previously and there's not been any real
involvement from the core code in what's happening here.

Clearly, the documentation around how to use the FDW API hasn't changed
at all and there's been no additions to it for handling bulk work.
Everything here continues to be done inside of postgres_fdw, which
essentially ignores the prescribed "Update/Delete one tuple" interface
for ExecForeignUpdate/ExecForeignDelete.

I've spent the better part of the past two days trying to reason my way
around that while reviewing this patch and I haven't come out the other
side any happier with this approach than I was back in
20140911153049.gc16...@tamriel.snowman.net.

There are other things that don't look right to me, such as what's going
on at the bottom of push_update_down(), but I don't think there's much
point going into it until we figure out what the core FDW API here
should look like.  It might not be all that far from what we have now,
but I don't think we can just ignore the existing, documented, API.


OK, I'll try to introduce the core FDW API for this (and make changes to 
the core code) to address your previous comments.


Thanks for taking the time to review the patch!

Best regards,
Etsuro Fujita




Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-05-12 Thread Stephen Frost
All,

This patch gets smaller and smaller.

Upon reflection I realized that, with default roles, it's entirely
unnecessary to change how the permission checks happen today- we can
simply add checks to them to also look at role membership.  That's
removed the last of my concerns regarding any API breakage for existing
use-cases and has greatly simplified things overall.
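The shape of that change can be illustrated with a toy model (the role
names follow the patch, but the logic is only a sketch of the C checks):

```python
# Toy membership data; in reality this lives in pg_authid / pg_auth_members.
ROLE_MEMBERS = {'pg_monitor': {'alice'}, 'pg_replication': {'repl_svc'}}
SUPERUSERS = {'postgres'}

def has_privilege(user, default_role):
    """Old check: superuser only.  New check: superuser OR member of
    the relevant default role -- a single extra disjunct per check."""
    return user in SUPERUSERS or user in ROLE_MEMBERS.get(default_role, set())

assert has_privilege('postgres', 'pg_monitor')   # unchanged behavior
assert has_privilege('alice', 'pg_monitor')      # newly allowed via GRANT
assert not has_privilege('alice', 'pg_replication')
```

Because the existing superuser branch is untouched, no current caller can
lose access - which is why the patch shrinks rather than grows.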

This does change the XLOG functions to require pg_monitor, as discussed
on the other thread where it was pointed out by Heikki that the XLOG
location information could be used to extract sensitive information
based on what happens during compression.  Adding docs explaining that
is on my to-do list for tomorrow.

* Stephen Frost (sfr...@snowman.net) wrote:
> Andres suggested that we drop the REPLICATION role attribute and just
> use membership in pg_replication instead.  That's turned out quite
> fantastically as we can now handle upgrades without breaking anything-
> CREATE ROLE and ALTER ROLE still accept the attribute but simply grant
> pg_replication to the role instead, and postinit.c has been changed to
> check role membership similar to other pg_hba role membership checks
> when a replication connection comes in.  Hat's off to Andres for his
> suggestion.

It's also unnecessary to change how the REPLICATION role attribute
functions today.  This patch does add the pg_replication role, but it's
only allowed to execute the various pg_logical and friends functions and
not to actually connect as a REPLICATION user.  Connecting as a
REPLICATION user allows you to stream the entire contents of the
backend, after all, so it makes sense to have that be independent.

I added another default role which allows the user to view
pg_show_file_settings, as that seemed useful to me.  The diffstat for
that being something like 4 additions without docs and maybe 10 with.
More documentation would probably be good though and I'll look at adding
to it.

Most of the rest of what I've done has simply been reverting back to
what we had.  The patch is certainly far easier to verify by reading
through it now, as the changes are right next to each other, and the
regression output changes are much smaller.

Thoughts?  Comments?  Suggestions?

Thanks!

Stephen
From 698a74ea2b627bf1a75babe177817da8dfcae464 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 7 May 2015 23:35:03 -0400
Subject: [PATCH] Create default roles for administrative functions

To reduce the number of users on a system who are superusers,
create a set of roles by default during initdb which are allowed to run
certain functions that allow non-superusers to perform specific
administrative tasks and have access to privileged information.

The prefix "pg_" is reserved for default system roles, similar to
schemas and tablespaces.  pg_upgrade is modified to check for any roles
which start with "pg_" and complain if they exist.  pg_dumpall is
modified to not dump out roles starting with "pg_" on 9.5-and-above
systems.  CreateRole is modified to refuse creation of roles which start
with "pg_", similar to CreateSchema.

Roles created are: pg_backup, pg_monitor, pg_replay, pg_replication,
pg_rotate_logfile, pg_signal_backend, pg_file_settings and pg_admin.

The XLOG location information functions now requires the pg_monitor role
as the compression rate can be used to derive sensitive information.
---
 contrib/test_decoding/expected/permissions.out |  8 +--
 doc/src/sgml/catalogs.sgml |  9 +++
 doc/src/sgml/user-manag.sgml   | 91 ++
 src/backend/access/transam/xlogfuncs.c | 50 ++
 src/backend/catalog/catalog.c  |  5 +-
 src/backend/catalog/system_views.sql   |  6 +-
 src/backend/commands/user.c| 13 +++-
 src/backend/replication/logical/logicalfuncs.c | 17 ++---
 src/backend/replication/slotfuncs.c| 29 
 src/backend/replication/walsender.c|  8 ++-
 src/backend/utils/adt/misc.c   | 12 ++--
 src/backend/utils/adt/pgstatfuncs.c| 25 ---
 src/backend/utils/misc/guc.c   |  7 ++
 src/bin/pg_dump/pg_dumpall.c   |  2 +
 src/bin/pg_upgrade/check.c | 40 ++-
 src/include/catalog/pg_authid.h| 19 +-
 16 files changed, 277 insertions(+), 64 deletions(-)

diff --git a/contrib/test_decoding/expected/permissions.out b/contrib/test_decoding/expected/permissions.out
index 212fd1d..79a7f86 100644
--- a/contrib/test_decoding/expected/permissions.out
+++ b/contrib/test_decoding/expected/permissions.out
@@ -54,13 +54,13 @@ RESET ROLE;
 -- plain user *can't* can control replication
 SET ROLE lr_normal;
 SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-ERROR:  must be superuser or replication role to use replication slots
+ERROR:  must be superuser or member of pg_replication to use replicat

Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Craig Ringer
On 13 May 2015 at 09:55, Stephen Frost  wrote:

> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> > On 12 May 2015 at 21:10, Stephen Frost  wrote:
> > > > This can be done without protocol-level changes and with no backward
> > > > compatibility impact to existing applications. Any objections?
> > >
> > > I don't particularly object but I'm not entirely sure that it's that
> > > simple to clear out the cookie value- look at the previous discussion
> > > about ALTER ROLE / password resets and trying to keep them from ending
> > > up in the logs.
> >
> > In this case we don't have to care about the cookie values ending up in
> the
> > logs. They're just single use tokens. If an app can read the PostgreSQL
> > logs or access privileged pg_stat_activity then it's already privileged
> > enough that it's outside the scope of something like this.
>
> Perhaps that would work, but the issue still exists about the connection
> on which it's set potentially being able to see the value if we don't
> scrub it, no?
>

Yes, we must scrub it on that session, but that can be done with a dummy
stats report if nothing else.
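For what it's worth, the single-use cookie scheme under discussion can be
modeled in a few lines (everything here, names included, is an illustrative
sketch rather than proposed server behavior):

```python
import secrets

class Connection:
    """Toy model of the cookie idea: SET SESSION AUTHORIZATION hands a
    single-use token back to the pooler, and only that token can RESET."""
    def __init__(self):
        self.user = 'pooler'
        self._cookie = None

    def set_session_authorization(self, user):
        self.user = user
        self._cookie = secrets.token_hex(16)  # shown once, to the pooler only
        return self._cookie

    def reset(self, cookie):
        if self._cookie is None or not secrets.compare_digest(cookie, self._cookie):
            raise PermissionError('invalid or missing cookie')
        self.user, self._cookie = 'pooler', None  # scrubbed after use

conn = Connection()
token = conn.set_session_authorization('app_user')
# the untrusted app never sees `token`, so it cannot RESET the session
conn.reset(token)
assert conn.user == 'pooler'
```

This matches the point above: the token's value is harmless in logs, since
it is single-use and scrubbed once consumed.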


>
> > > Perhaps people will feel differently about this since
> > > it's expected to be programatically used, but, personally, I'd favor
> > > trying to do something at the protocol level instead.
> >
> > I would also prefer a protocol level solution, but prior discussion has
> > shown that Tom in particular sees it as unsafe to add new protocol
> messages
> > in the v3 protocol. So I didn't think it was productive to start with a
> > protocol-level approach when it can be done without protocol changes.
>
> I hope it wasn't quite so cut-and-dry as "can't change anything"..
> Especially if it's a client initiated request which clearly indicates
> that the client supports the response then, hopefully, we'd be able to
> make a change.  More specifics or a link to the prior discussion would
> help here.
>

It was with regards to returning the commit LSN.

Thread begins here:
http://www.postgresql.org/message-id/53e2d346.9030...@2ndquadrant.com,
specific issue here:
http://www.postgresql.org/message-id/25297.1407459...@sss.pgh.pa.us . It's
WRT to using a GUC to enable/disable a protocol message, so it may simply
not apply here, since we can have a client-initiated message without which
the server behaviour isn't any different.

> > > As is currently the case, poolers will still have to use a superuser
> > > > connection if they want to pool across users.
> > >
> > > That just needs to be fixed. :(  Having poolers connect as superuser is
> > > *not* ok..
> >
> > While I agree, that's existing common (and horrible) practice, and a
> > separate problem.
> >
> > We need a way to say "this pooler connection can become <list of users>". Right now SET SESSION AUTHORIZATION is all or nothing.
>
> That's exactly what SET ROLE provides ...
>
> > Is there a reason why we would need to support both?  Clearly this is
> > > all post-9.5 work and seems like it might be reasonable to have
> > > completed for 9.6.  Have you thought much about how Heikki's work on
> > > SCRAM might play into this?
> >
> > I haven't looked at the SCRAM work, and will take a look.
>
> Ok.
>
> > If it can offer separation of authentication and authorization then it
> > would be in a good position to handle letting SET SESSION AUTHORIZATION
> run
> > as non-superuser, which would be great. I don't see how it could help
> with
> > making it non-resettable though.
>
> It's not going to provide that right off the bat, but it is defining a
> new authentication mechanism with changes to the protocol..  Perhaps
> there's a way to work in the other things you're looking for as part of
> that change?  That is to say, if the client says "I support SCRAM" and
> we authenticate that way then perhaps that can also mean "I can do SCRAM
> to re-authenticate later too"


It shouldn't be necessary to re-authenticate. Just change the active
authorizations to any that the current authentication permits.

Arguably that's what SET ROLE does already though.


> There's two pieces here- is it really a sensible use-case for the
> connection pooler to need to do SET ROLE and the app need to do it?  I'm
> not convinced that actually makes sense or is a use-case we need to
> support.
>

Fair point. A protocol level SET ROLE that cannot be RESET may be enough,
even if it's not totally transparent.

If making it possible with S.S.A. later is necessary, we could look at
extending pg_authid, but you're right that it's probably not truly needed.


> Uhh, setting role had better change the privilege tests to operate
> against the role that you changed to.  If it doesn't somewhere, then
> that's a bug we need to fix.
>

I'll see if I can turn "vague memories" into something more useful.


> What you're asking for above wrt saying a given user can become a set of
> users is what we've already *got* with roles.  The only thing missing
> here is that the session can detect that 

Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Craig Ringer
On 13 May 2015 at 00:31, Pavel Stehule  wrote:

>
>
> 2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski :
>
>> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
>> > create type scheduled_time as (second int[], minute int[], hour int[],
>> dow
>> > int[], month int[]);
>> >  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>> >  (,"{5}",,,) .. run once per hour
>> > Comments, notices?
>>
>> First, please note that I'm definitely not a hacker, just a user.
>>
>> One comment that I'd like to make, is that since we're at planning
>> phase, I think it would be great to add capability to limit number of
>> executions of given command.
>> This would allow running things like "at" in unix - run once, at given
>> time, and that's it.
>>
>
> I would not store state at this level - so "at" should be implemented
> at a higher level. There is a very high number of possible strategies for
> what can be done with failed tasks - and I would not open this topic. I
> believe that with the proposed scheduler, anybody can simply implement what
> they need in PL/pgSQL with dynamic SQL. But on the other hand, "run once"
> can be implemented with the proposed API too.
>

That seems reasonable in a v1, so long as there's room to easily extend it
without pain to add "at"-like one-shot commands, at-startup commands, etc.

I'd prefer to see a scheduling interface that's a close match for cron's or
that leaves room for it - so things like "*/5" for every five minutes,
ranges like "Mon-Fri", etc. If there's a way to express similar
capabilities more cleanly using PostgreSQL's types and conventions that
makes sense, but I'm not sure a composite type of arrays fits that.
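To make the semantics of the quoted proposal concrete, here is a self-contained C sketch of how matching against such a composite of arrays could work (illustrative only; none of these names come from Pavel's patch, and the key assumption is that an empty array plays the role of cron's "*"):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of the proposed scheduled_time composite:
 * each field is an array of allowed values; an empty array (n == 0)
 * means "any value", like cron's "*". */
typedef struct ScheduledTime
{
    const int *second;  int nsecond;
    const int *minute;  int nminute;
    const int *hour;    int nhour;
    const int *dow;     int ndow;     /* 0 = Sunday */
    const int *month;   int nmonth;   /* 1..12 */
} ScheduledTime;

static bool
field_matches(const int *vals, int n, int v)
{
    if (n == 0)
        return true;            /* empty array acts as a wildcard */
    for (int i = 0; i < n; i++)
        if (vals[i] == v)
            return true;
    return false;
}

/* Does the given wall-clock moment satisfy the schedule? */
static bool
schedule_matches(const ScheduledTime *st,
                 int sec, int min, int hour, int dow, int month)
{
    return field_matches(st->second, st->nsecond, sec) &&
           field_matches(st->minute, st->nminute, min) &&
           field_matches(st->hour, st->nhour, hour) &&
           field_matches(st->dow, st->ndow, dow) &&
           field_matches(st->month, st->nmonth, month);
}
```

With that reading, (,"{1,10,20,30,40,50}",,,) fires at any second of any hour whose minute is in {1,10,20,30,40,50} - the "run every 10 minutes" example quoted above.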

How do you plan to manage the bgworkers?


In BDR, where we have a similar need to have workers across multiple
databases, and where each database contains a list of workers to launch, we
have:

* A single static "supervisor" bgworker. In 9.5 this will connect with
InvalidOid as the target database so it can only access shared catalogs. In
9.4 this isn't possible in the bgworker API so we have to connect to a
dummy database.

* A dynamic background worker for each database in which BDR is enabled,
which is launched from the supervisor. We check which DBs are BDR-enabled
by (ab)using database security labels and checking pg_shseclabel from the
supervisor worker so we only launch bgworkers on BDR-enabled DBs.

* A dynamic background worker for each peer node, launched by the
per-database worker based on the contents of that database's
bdr.bdr_connections table.


What I suspect you're going to want is:

* A static worker launched by your extension when it starts, which launches
per-db workers for each DB in which the scheduler is enabled. You could use
a GUC listing scheduler-enabled DBs in postgresql.conf and have an
on-reload hook to update it; that way you don't need the security label hack.

* A DB scheduler worker, which looks up the scheduled tasks list, finds the
next scheduled event, and sleeps on a long latch timeout until then,
resetting it when interrupted. When it reaches the scheduled event it would
launch a one-shot BGW_NO_RESTART worker to run the desired PL/pgSQL
procedure over the SPI.

* A task runner worker, which gets launched by the db scheduler to actually
run a task using the SPI.
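The "sleep on a long latch timeout until the next scheduled event" step above can be sketched as follows. This is a hedged, self-contained illustration in plain C: the real worker would use the latch API rather than localtime_r-based brute force, and the function and predicate names here are invented for the example:

```c
#include <stdbool.h>
#include <time.h>

/* A schedule reduced to a predicate over a broken-down local time. */
typedef bool (*schedule_pred)(const struct tm *);

/* Example predicate: fire at the top of every minute (illustrative). */
static bool
every_minute(const struct tm *t)
{
    return t->tm_sec == 0;
}

/* Scan forward from "now", one second at a time, for the next instant
 * accepted by the schedule predicate; returns the wait in milliseconds,
 * or -1 if nothing fires within the horizon.  A real implementation
 * would jump field-by-field instead of brute-forcing seconds. */
static long
ms_until_next_event(time_t now, schedule_pred matches, int horizon_secs)
{
    for (int delta = 1; delta <= horizon_secs; delta++)
    {
        time_t candidate = now + delta;
        struct tm tmbuf;

        if (localtime_r(&candidate, &tmbuf) == NULL)
            break;
        if (matches(&tmbuf))
            return (long) delta * 1000L;
    }
    return -1;
}
```

The per-database worker would then pass the computed timeout to a WaitLatch-style call, reset the latch on wakeup, and recompute - so a change to the task list (signalled via the latch) is picked up before the timeout expires.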


Does that match your current thinking?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 12 May 2015 at 21:10, Stephen Frost  wrote:
> > > This can be done without protocol-level changes and with no backward
> > > compatibility impact to existing applications. Any objections?
> >
> > I don't particularly object but I'm not entirely sure that it's that
> > simple to clear out the cookie value- look at the previous discussion
> > about ALTER ROLE / password resets and trying to keep them from ending
> > up in the logs.
> 
> In this case we don't have to care about the cookie values ending up in the
> logs. They're just single use tokens. If an app can read the PostgreSQL
> logs or access privileged pg_stat_activity then it's already privileged
> enough that it's outside the scope of something like this.

Perhaps that would work, but the issue still exists about the connection
on which it's set potentially being able to see the value if we don't
scrub it, no?

> > Perhaps people will feel differently about this since
> > it's expected to be programatically used, but, personally, I'd favor
> > trying to do something at the protocol level instead.
> 
> I would also prefer a protocol level solution, but prior discussion has
> shown that Tom in particular sees it as unsafe to add new protocol messages
> in the v3 protocol. So I didn't think it was productive to start with a
> protocol-level approach when it can be done without protocol changes.

I hope it wasn't quite so cut-and-dried as "can't change anything"..
Especially if it's a client initiated request which clearly indicates
that the client supports the response then, hopefully, we'd be able to
make a change.  More specifics or a link to the prior discussion would
help here.

> I'm very happy to do this at the protocol level, I just don't want to
> implement that and then have to do the SQL-level approach or implement a
> whole v4 protocol ;-)

I certainly agree that we should discuss it first.

> > > As is currently the case, poolers will still have to use a superuser
> > > connection if they want to pool across users.
> >
> > That just needs to be fixed. :(  Having poolers connect as superuser is
> > *not* ok..
> 
> While I agree, that's existing common (and horrible) practice, and a
> separate problem.
> 
> We need a way to say "this pooler connection can become <set of users>". Right now SET SESSION AUTHORIZATION is all or nothing.

That's exactly what SET ROLE provides ...

> > Is there a reason why we would need to support both?  Clearly this is
> > all post-9.5 work and seems like it might be reasonable to have
> > completed for 9.6.  Have you thought much about how Heikki's work on
> > SCRAM might play into this?
> 
> I haven't looked at the SCRAM work, and will take a look.

Ok.

> If it can offer separation of authentication and authorization then it
> would be in a good position to handle letting SET SESSION AUTHORIZATION run
> as non-superuser, which would be great. I don't see how it could help with
> making it non-resettable though.

It's not going to provide that right off the bat, but it is defining a
new authentication mechanism with changes to the protocol..  Perhaps
there's a way to work in the other things you're looking for as part of
that change?  That is to say, if the client says "I support SCRAM" and
we authenticate that way then perhaps that can also mean "I can do SCRAM
to re-authenticate later too" and we can feel comfortable about the
protocol level changes to make what we're talking about here work, at
least when SCRAM is involved.

If that works, we could probably figure out a way to make the other auth
methods able to work with this too.

> > I guess I'm a bit confused at why SET ROLE being visible to apps is a
> > problem..?
> 
> If we used SET ROLE rather than SET SESSION AUTHORIZATION for this, then
> apps that use SET ROLE and RESET ROLE couldn't do so anymore, so a pooled
> connection wouldn't be quite the same as an unpooled connection.

There's two pieces here- is it really a sensible use-case for the
connection pooler to need to do SET ROLE and the app need to do it?  I'm
not convinced that actually makes sense or is a use-case we need to
support.

> session_user would also report the pooler's user identity.

So?  I'm not sure that's an issue, though if it is, we could probably
deal with it in some way.

> postgres=> SET ROLE craig;
> SET
> postgres=> SELECT session_user, current_user;
>  session_user | current_user
> --------------+--------------
>  postgres     | craig
> (1 row)
> 
> ...  and IIRC it affects some of the privilege and role membership tests,
> which distinguish between currently active role and role membership.

Uhh, setting role had better change the privilege tests to operate
against the role that you changed to.  If it doesn't somewhere, then
that's a bug we need to fix.

> I think if we want to SET the session authorization, SET SESSION
> AUTHORIZATION or a protocol level equivalent is more appropriate. It has
> the desired behaviour, barring a limit on being reset.

Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Craig Ringer
On 12 May 2015 at 21:10, Stephen Frost  wrote:

>
> > This can be done without protocol-level changes and with no backward
> > compatibility impact to existing applications. Any objections?
>
> I don't particularly object but I'm not entirely sure that it's that
> simple to clear out the cookie value- look at the previous discussion
> about ALTER ROLE / password resets and trying to keep them from ending
> up in the logs.


In this case we don't have to care about the cookie values ending up in the
logs. They're just single use tokens. If an app can read the PostgreSQL
logs or access privileged pg_stat_activity then it's already privileged
enough that it's outside the scope of something like this.


> Perhaps people will feel differently about this since
> it's expected to be programatically used, but, personally, I'd favor
> trying to do something at the protocol level instead.
>

I would also prefer a protocol level solution, but prior discussion has
shown that Tom in particular sees it as unsafe to add new protocol messages
in the v3 protocol. So I didn't think it was productive to start with a
protocol-level approach when it can be done without protocol changes.

I'm very happy to do this at the protocol level, I just don't want to
implement that and then have to do the SQL-level approach or implement a
whole v4 protocol ;-)

> Is there a reason why they would need to be visible to the privileged
> user?


Not really.


> > As is currently the case, poolers will still have to use a superuser
> > connection if they want to pool across users.
>
> That just needs to be fixed. :(  Having poolers connect as superuser is
> *not* ok..
>

While I agree, that's existing common (and horrible) practice, and a
separate problem.

We need a way to say "this pooler connection can become <set of users>". Right now SET SESSION AUTHORIZATION is all or nothing.

> Is there a reason why we would need to support both?  Clearly this is
> all post-9.5 work and seems like it might be reasonable to have
> completed for 9.6.  Have you thought much about how Heikki's work on
> SCRAM might play into this?
>

I haven't looked at the SCRAM work, and will take a look.

If it can offer separation of authentication and authorization then it
would be in a good position to handle letting SET SESSION AUTHORIZATION run
as non-superuser, which would be great. I don't see how it could help with
making it non-resettable though.

> I guess I'm a bit confused at why SET ROLE being visible to apps is a
> problem..?

If we used SET ROLE rather than SET SESSION AUTHORIZATION for this, then
apps that use SET ROLE and RESET ROLE couldn't do so anymore, so a pooled
connection wouldn't be quite the same as an unpooled connection.

session_user would also report the pooler's user identity.

postgres=> SET ROLE craig;
SET
postgres=> SELECT session_user, current_user;
 session_user | current_user
--------------+--------------
 postgres     | craig
(1 row)

...  and IIRC it affects some of the privilege and role membership tests,
which distinguish between currently active role and role membership.

I think if we want to SET the session authorization, SET SESSION
AUTHORIZATION or a protocol level equivalent is more appropriate. It has
the desired behaviour, barring a limit on being reset.
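The single-use cookie idea discussed earlier in the thread amounts to: the pooler installs a secret when it sets the session authorization, and the server refuses to reset it unless the same secret is presented again. A minimal illustration of just the guard logic (entirely hypothetical; not from any patch, and a real implementation would compare in constant time):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-session state for a cookie-guarded
 * SET SESSION AUTHORIZATION: the cookie is installed once and must be
 * presented again to allow resetting the authorization. */
typedef struct SessionAuthGuard
{
    char cookie[64];
    bool armed;          /* true once a cookie has been installed */
} SessionAuthGuard;

static void
guard_arm(SessionAuthGuard *g, const char *cookie)
{
    strncpy(g->cookie, cookie, sizeof(g->cookie) - 1);
    g->cookie[sizeof(g->cookie) - 1] = '\0';
    g->armed = true;
}

/* Returns true iff the reset is allowed.  With no cookie armed the
 * session behaves as today; on a successful match the cookie is
 * consumed, making it single use. */
static bool
guard_allow_reset(SessionAuthGuard *g, const char *cookie)
{
    if (!g->armed)
        return true;
    if (cookie != NULL && strcmp(g->cookie, cookie) == 0)
    {
        g->armed = false;
        return true;
    }
    return false;
}
```

Under this scheme an application issuing a bare RESET SESSION AUTHORIZATION on a pooled connection simply fails to escape, while the pooler, which knows the cookie, can reclaim the session.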

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-12 Thread Andres Freund
On 2015-05-12 00:42:14 +0530, Abhijit Menon-Sen wrote:
> At 2015-05-11 19:15:47 +0200, and...@anarazel.de wrote:
> > I don't really care how it's named, as long as it makes clear that
> > it's not an exact measurement.
> 
> Not having heard any better suggestions, I picked "pgstatapprox" as a
> compromise between length and familiarity/consistency with pgstattuple.

I can live with that, although I'd personally go with
pgstattuple_approx()...

Other than that and some minor local changes I've made, I'm ready to
commit this.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-12 Thread Kouhei Kaigai
Hello,

I tried to make patches for the three approaches.
Please don't take option-3 as a serious proposition; however,
it is unfortunately the only solution at this moment.

In my understanding, we don't guarantee interface compatibility
across major version upgrades, including the definitions of non-static
functions. It is the role of the extension's author to follow up on the
new major version (and raise a problem report during the development
cycle if a feature update causes problems without alternatives).
In fact, people usually submit patches, and some of them change the
definitions of non-static functions; nobody can guarantee that no
extension uses those functions, yet we don't treat that as breaking
compatibility. That is collateral evidence that we don't consider
non-static functions a stable interface for extensions, and it should
not be a reason to prohibit exposing functions despite their necessity.

On the other hand, I understand this is not only an issue around
createplan.c, but also a (philosophical) issue around the criteria
and human decisions about which functions should be static or
non-static. So it usually takes time to get overall consensus.
If we keep create_plan_recurse() static, option-2 is
a solution that balances both opinions.

Anyway, I really dislike option-3 and want to have a solution.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Kaigai Kouhei(海外 浩平)
> Sent: Tuesday, May 12, 2015 10:24 AM
> To: 'Andres Freund'; 'Robert Haas'
> Cc: 'Tom Lane'; 'Kohei KaiGai'; 'Thom Brown'; 'Shigeru Hanada';
> 'pgsql-hackers@postgreSQL.org'
> Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> 
> Hi,
> 
> Let me go back to the original concern and show the possible options
> we can take here. At least, the latest master is not usable for
> implementing custom-join logic without one of these options.
> 
> option-1)
> Revert create_plan_recurse() to a non-static function for extensions.
> It is the simplest solution; however, it is still a gray zone which
> functions should be public and whether we treat non-static
> functions as a stable API or not.
> IMO, we shouldn't treat non-static functions as stable APIs, even
> though they can be called by extensions and not only by code in another
> source file. In fact, we usually change the definitions of non-static
> functions even when we know extensions use them. It is the role of the
> extension to follow up the feature across major versions.
> 
> 
> option-2)
> Tom's suggestion. Add a new list field of Path nodes to CustomPath,
> then create_customscan_plan() will call the static create_plan_recurse()
> function to construct child plan nodes.
> Probably, the attached patch gives an image of this enhancement,
> but it is not tested yet, of course. Once we adopt this approach, I'll
> adjust my PG-Strom code towards the new interface within 2 weeks
> and report problems if any.
> 
> 
> option-3)
> Force authors of custom-scan providers to copy and paste createplan.c.
> I really don't want this option, and nobody will be happy.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 
> 
> 
> > -Original Message-
> > From: Kaigai Kouhei(海外 浩平)
> > Sent: Monday, May 11, 2015 12:48 PM
> > To: 'Andres Freund'; Robert Haas
> > Cc: Tom Lane; Kohei KaiGai; Thom Brown; Shigeru Hanada;
> > pgsql-hackers@postgreSQL.org
> > Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> >
> > > On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> > > > On Sun, May 10, 2015 at 8:41 PM, Tom Lane  wrote:
> > > > > > This commit reverts create_plan_recurse() as static function.
> > > > > Yes.  I am not convinced that external callers should be calling that,
> > > > > and would prefer not to enlarge createplan.c's API footprint without a
> > > > > demonstration that this is right and useful.  (This is one of many
> > > > > ways in which this patch is suffering from having gotten committed
> > > > > without submitted use-cases.)
> > >
> > > Wasn't there a submitted use case? IIRC Kaigai had referenced some
> > > pg-strom (?) code using it?
> > >
> > > I'm failing to see how create_plan_recurse() being exposed externally is
> > > related to "having gotten committed without submitted use-cases".  Even
> > > if submitted, presumably as simple as possible code, doesn't use it,
> > > that's not a proof that less simple code does not need it.
> > >
> > Yes, the PG-Strom code uses create_plan_recurse() to construct the child
> > plan nodes of the GPU-accelerated custom-join logic, once it gets chosen.
> > There is nothing special here. It calls create_plan_recurse() just as the
> > built-in join paths do on their underlying inner/outer paths.
> > It is not difficult to submit it as a working example; however, its total
> > code size (excluding GPU code) is 25K lines at this moment.
> >
> > I'm not certain whether that counts as a simple example.
> >
> > > > Your unwillingness to make functions global or to stick PGDLLIMPORT
> > > > markings on variables that pe

Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-12 Thread Andres Freund
On 2015-05-12 20:40:49 +0200, Andres Freund wrote:
> On 2015-05-12 05:36:19 +0200, Andres Freund wrote:
> > What I dislike so far:
> > * Minor formatting things. Just going to fix and push the ones I
> >   dislike.
> > * The Hopcroft-Karp stuff not being separate
> > * The increased complexity of grouping_planner. It'd imo be good if some
> >   of that could be refactored into a separate function. Specifically the
> >   else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
> >   block.
> > * I think it'd not hurt to add rule deparse check for the function in
> >   GROUPING SETS case. I didn't see one at least.
> 
> * The code in nodeAgg.c isn't pretty in places. Stuff like if
>   (node->chain_depth > 0) estate->agg_chain_head = save_chain_head;...
>   Feels like a good bit of cleanup would be possible there.

In the executor I'd further like:
* to split agg_retrieve_direct into a version for grouping sets and one
  without. I think that'll be a pretty clear win for clarity.
* to spin out common code between agg_retrieve_direct (in both the
  functions it's split into), agg_retrieve_hashed and
  agg_retrieve_chained. It should e.g. be fairly simple to spin out the
  tail-end processing of an input group (the finalize_aggregate loop,
  ExecQual) into a separate function.

Andrew, are you going to be working on any of these?

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-12 Thread Peter Geoghegan
On Tue, May 12, 2015 at 3:18 PM, Peter Geoghegan  wrote:
> On Tue, May 12, 2015 at 3:16 PM, Andres Freund  wrote:
>> Could you rebase and adjust your patch? I'd rather have the inference
>> structure refactoring separate.
>
> Sure.

Rebased version of patch is attached.

-- 
Peter Geoghegan
From cc2fcd8862e01880b1559f315aa3da23017edfe6 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 11 May 2015 15:37:54 -0700
Subject: [PATCH] Further fixes to minor ON CONFLICT issues

Deparsing with an inference clause is now correctly supported.  In
passing, re-factor InferenceElem representation of opclass, to defer
opfamily lookup for Relation index matching until index inference proper
(i.e., within the optimizer).  This is done for the convenience of the
new ruleutils.c code, but independently makes sense.

Finally, fix a few typos, and rename a variable -- "candidates" seemed
like a misnomer for the return value of infer_arbiter_indexes().
---
 contrib/pg_stat_statements/pg_stat_statements.c |  3 +-
 src/backend/executor/nodeModifyTable.c  |  2 +-
 src/backend/nodes/copyfuncs.c   |  3 +-
 src/backend/nodes/equalfuncs.c  |  3 +-
 src/backend/nodes/outfuncs.c|  3 +-
 src/backend/nodes/readfuncs.c   |  3 +-
 src/backend/optimizer/util/plancat.c| 34 +++
 src/backend/parser/parse_clause.c   | 16 ++
 src/backend/utils/adt/ruleutils.c   | 75 -
 src/include/nodes/primnodes.h   |  3 +-
 src/test/regress/expected/insert_conflict.out   |  2 +-
 src/test/regress/expected/rules.out | 38 +++--
 src/test/regress/sql/rules.sql  |  7 ++-
 13 files changed, 133 insertions(+), 59 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6abe3f0..f97cc2c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2637,8 +2637,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 InferenceElem *ie = (InferenceElem *) node;
 
 APP_JUMB(ie->infercollid);
-APP_JUMB(ie->inferopfamily);
-APP_JUMB(ie->inferopcinputtype);
+APP_JUMB(ie->inferopclass);
 JumbleExpr(jstate, ie->expr);
 			}
 			break;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 89f1f57..72bbd62 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -414,7 +414,7 @@ ExecInsert(ModifyTableState *mtstate,
    estate, true, &specConflict,
    arbiterIndexes);
 
-			/* adjust the tuple's state accordingly */
+			/* adjust the tuple's state */
 			if (!specConflict)
 heap_finish_speculative(resultRelationDesc, tuple);
 			else
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 76b63af..957c2bc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1803,8 +1803,7 @@ _copyInferenceElem(const InferenceElem *from)
 
 	COPY_NODE_FIELD(expr);
 	COPY_SCALAR_FIELD(infercollid);
-	COPY_SCALAR_FIELD(inferopfamily);
-	COPY_SCALAR_FIELD(inferopcinputtype);
+	COPY_SCALAR_FIELD(inferopclass);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e032142..f26626e 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -687,8 +687,7 @@ _equalInferenceElem(const InferenceElem *a, const InferenceElem *b)
 {
 	COMPARE_NODE_FIELD(expr);
 	COMPARE_SCALAR_FIELD(infercollid);
-	COMPARE_SCALAR_FIELD(inferopfamily);
-	COMPARE_SCALAR_FIELD(inferopcinputtype);
+	COMPARE_SCALAR_FIELD(inferopclass);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index fe868b8..454d9ec 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1450,8 +1450,7 @@ _outInferenceElem(StringInfo str, const InferenceElem *node)
 
 	WRITE_NODE_FIELD(expr);
 	WRITE_OID_FIELD(infercollid);
-	WRITE_OID_FIELD(inferopfamily);
-	WRITE_OID_FIELD(inferopcinputtype);
+	WRITE_OID_FIELD(inferopclass);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 8136306..81b6243 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1141,8 +1141,7 @@ _readInferenceElem(void)
 
 	READ_NODE_FIELD(expr);
 	READ_OID_FIELD(infercollid);
-	READ_OID_FIELD(inferopfamily);
-	READ_OID_FIELD(inferopcinputtype);
+	READ_OID_FIELD(inferopclass);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b425680..a857ba3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -438,8 +438,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 	Bitmapset  *inferAttrs = NULL;
 	List	   *inferElems = NIL;
 
-	/* Result */
-	List	   *

[HACKERS] Why does contain_leaked_vars believe MinMaxExpr is safe?

2015-05-12 Thread Tom Lane
MinMaxExpr is an implicit invocation of a btree comparison function.
Are we supposing that all of those are necessarily leakproof?

regards, tom lane




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-12 Thread Peter Geoghegan
On Tue, May 12, 2015 at 3:16 PM, Andres Freund  wrote:
> Could you rebase and adjust your patch? I'd rather have the inference
> structure refactoring separate.

Sure.


-- 
Peter Geoghegan




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-12 Thread Andres Freund
On 2015-05-12 23:40:47 +0200, Andres Freund wrote:
> That's just a straight up bug. expression_tree_walker (called via
> ChangeVarNodes) did not know about exclRelTlist, leading to
> fix_join_expr() not matching the excluded vars to the excluded
> relation/tlist.

I've fixed these, and added a minimal set of tests. Please check
whether more are needed.

Could you rebase and adjust your patch? I'd rather have the inference
structure refactoring separate.

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-12 Thread Peter Geoghegan
On Tue, May 12, 2015 at 2:40 PM, Andres Freund  wrote:
>> It's not as if I have no idea. ReplaceVarsFromTargetList() is probably
>> quite confused by all this, because the passed nomatch_varno argument
>> is often rt_index -- but what about EXCLUDED.*? adjustJoinTreeList()
>> does not know anything about EXCLUDED.* either. I see little practical
>> reason to make the rewriter do any better.
>
> I don't think any of these are actually influenced by upsert?

Evidently you're right. I'm not opposed to supporting ON CONFLICT
UPDATE with CREATE RULE, even if such rules are completely wonky. It
might be a useful way of testing the implementation.

>> When I debugged the problem of the optimizer raising that "target
>> lists" error with a rule with an action with EXCLUDED.* (within
>> setrefs.c's fix_join_expr_mutator()), it looked like an off-by-one
>> issue here:
>>
>> /* If it's for acceptable_rel, adjust and return it */
>> if (var->varno == context->acceptable_rel)
>> {
>> var = copyVar(var);
>> var->varno += context->rtoffset;
>> if (var->varnoold > 0)
>> var->varnoold += context->rtoffset;
>> return (Node *) var;
>> }
>
> That's just a straight up bug. expression_tree_walker (called via
> ChangeVarNodes) did not know about exclRelTlist, leading to
> fix_join_expr() not matching the excluded vars to the excluded
> relation/tlist.

The omissions you mention should be corrected on general principle, I think.

> I'm not immediately seeing how that could bite us otherwise today, but
> it'd definitely otherwise be a trap. That's why I think it's unwise to
> ignore problems before having fully debugged them.

Fair point.

> Additionally OffsetVarNodes() did not adjust exclRelIndex, which
> "breaks" explain insofar it's not going to display the 'excluded.'
> anymore. I don't think it could have other consequences.

Okay.
-- 
Peter Geoghegan




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-12 Thread Andres Freund
On 2015-05-11 20:16:00 -0700, Peter Geoghegan wrote:
> On Mon, May 11, 2015 at 7:34 PM, Andres Freund  wrote:
> > You should try to understand why it's failing. Just prohibiting the
> > rules, without understanding what's actually going on, could very well
> > hide a real bug.
> 
> It's not as if I have no idea. ReplaceVarsFromTargetList() is probably
> quite confused by all this, because the passed nomatch_varno argument
> is often rt_index -- but what about EXCLUDED.*? adjustJoinTreeList()
> does not know anything about EXCLUDED.* either. I see little practical
> reason to make the rewriter do any better.

I don't think any of these are actually influenced by upsert?

> When I debugged the problem of the optimizer raising that "target
> lists" error with a rule with an action with EXCLUDED.* (within
> setrefs.c's fix_join_expr_mutator()), it looked like an off-by-one
> issue here:
> 
> /* If it's for acceptable_rel, adjust and return it */
> if (var->varno == context->acceptable_rel)
> {
> var = copyVar(var);
> var->varno += context->rtoffset;
> if (var->varnoold > 0)
> var->varnoold += context->rtoffset;
> return (Node *) var;
> }

That's just a straight up bug. expression_tree_walker (called via
ChangeVarNodes) did not know about exclRelTlist, leading to
fix_join_expr() not matching the excluded vars to the excluded
relation/tlist.

I'm not immediately seeing how that could bite us otherwise today, but
it'd definitely otherwise be a trap. That's why I think it's unwise to
ignore problems before having fully debugged them.

Additionally OffsetVarNodes() did not adjust exclRelIndex, which
"breaks" explain insofar it's not going to display the 'excluded.'
anymore. I don't think it could have other consequences.

Greetings,

Andres Freund




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-12 Thread Tom Lane
I wrote:
> I did a very basic update of your postgres_fdw patch to test this with,
> and attach that so that you don't have to repeat the effort.  I'm not sure
> whether we want to try to convert that into something committable.  I'm
> afraid that the extra round trips involved in doing row locking this way
> will be so expensive that no one really wants it for postgres_fdw.  It's
> more credible that FDWs operating against local storage would have use
> for it.

Of course, if we don't do that then we still have your original gripe
about ctid not being correct in EvalPlanQual results.  I poked at this
a bit and it seems like we could arrange to pass ctid through even in
the ROW_MARK_COPY case, if we define the t_ctid field of a composite
Datum as being the thing to use.  The attached WIP, mostly-comment-free
patch seems to fix the original test case.  It would likely result in
ctid coming out as (0,0) not (4294967295,0) for FDWs that don't take
any special steps to return a valid ctid, as a result of the fact that
heap_form_tuple just leaves that field zeroed.  We could fix that by
adding an ItemPointerSetInvalid(&(td->t_ctid)) call to heap_form_tuple,
but I dunno if we want to add even a small number of cycles for this
purpose to such a core function.
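For readers following along, the (0,0)-versus-(4294967295,0) distinction comes down to what a zeroed ItemPointer looks like compared to one set with ItemPointerSetInvalid(). A minimal, self-contained sketch — the struct below is a simplified stand-in, not the real layout from storage/itemptr.h (which splits the block number into two uint16 halves):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's ItemPointerData; a flat uint32
 * block number is enough to illustrate the visible ctid values. */
typedef struct DemoItemPointer
{
    uint32_t block;             /* block number */
    uint16_t offset;            /* tuple slot within the block */
} DemoItemPointer;

#define DEMO_INVALID_BLOCK 0xFFFFFFFFU      /* InvalidBlockNumber */

/* analogue of ItemPointerSetInvalid() */
static void demo_set_invalid(DemoItemPointer *tid)
{
    tid->block = DEMO_INVALID_BLOCK;
    tid->offset = 0;
}

/* what heap_form_tuple effectively does today: leave the field zeroed */
static void demo_zero(DemoItemPointer *tid)
{
    memset(tid, 0, sizeof(*tid));
}
```

A zeroed pointer reads back as (0,0), while an explicitly invalidated one reads back as (4294967295,0) — which is why skipping the extra ItemPointerSetInvalid() call changes what FDWs without a real ctid report.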

regards, tom lane

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0c44260..2350de3 100644
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
*** make_tuple_from_result_row(PGresult *res
*** 2965,2971 
--- 2965,2974 
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
+ 	{
  		tuple->t_self = *ctid;
+ 		tuple->t_data->t_ctid = *ctid;
+ 	}
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 43d3c44..4c39e06 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*** EvalPlanQualFetchRowMarks(EPQState *epqs
*** 2613,2621 
  
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
- 			ItemPointerSetInvalid(&(tuple.t_self));
  			/* relation might be a foreign table, if so provide tableoid */
  			tuple.t_tableOid = erm->relid;
  			tuple.t_data = td;
  
  			/* copy and store tuple */
--- 2613,2622 
  
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
  			/* relation might be a foreign table, if so provide tableoid */
  			tuple.t_tableOid = erm->relid;
+ 			/* also copy t_ctid in case somebody supplied it */
+ 			tuple.t_self = td->t_ctid;
  			tuple.t_data = td;
  
  			/* copy and store tuple */



Re: [HACKERS] mogrify and indent features for jsonb

2015-05-12 Thread Eva7
Yeah, it didn't work either on http://jsonprettyprint.net for me.







Re: [HACKERS] BRIN range operator class

2015-05-12 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> On 05/12/2015 10:49 PM, Alvaro Herrera wrote:
> >If in the future, for instance, we come up with a way to store the ipv4
> >plus ipv6 info, we will want to change the page format.  If we add a
> >page version to the metapage, we can detect the change at pg_upgrade
> >time and force a reindex of the index.
> 
> A version number in the metapage is certainly a good idea. But we already
> have that, don't we? :
> 
> >/* Metapage definitions */
> >typedef struct BrinMetaPageData
> >{
> > uint32  brinMagic;
> > uint32  brinVersion;
> > BlockNumber pagesPerRange;
> > BlockNumber lastRevmapPage;
> >} BrinMetaPageData;
> >
> >#define BRIN_CURRENT_VERSION 1
> >#define BRIN_META_MAGIC  0xA8109CFA
> 
> Did you have something else in mind?

Yeah, I was thinking we could have a separate version number for the
opclass code as well.  An external extension could change that, for
instance.  Also, we could change the 'inclusion' version and leave
minmax alone.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] BRIN range operator class

2015-05-12 Thread Alvaro Herrera
So, in reading these patches, it came to me that we might want to have
pg_upgrade mark indexes invalid if we in the future change the
implementation of some opclass.  For instance, the inclusion opclass
submitted here uses three columns: the indexed value itself, plus two
booleans; each of these booleans is a workaround for some nasty design
decision in the underlying datatypes.

One boolean is "unmergeable": if a block range contains both IPv4 and
IPv6 addresses, we mark it as 'unmergeable' and then every query needs
to visit that block range always.  The other boolean is "contains empty"
and is used for range types: it is set if the empty value is present
somewhere in the block range.

If in the future, for instance, we come up with a way to store the ipv4
plus ipv6 info, we will want to change the page format.  If we add a
page version to the metapage, we can detect the change at pg_upgrade
time and force a reindex of the index.
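A rough sketch of what such a pg_upgrade-time check could look like, reusing the BrinMetaPageData layout quoted elsewhere in this thread; the helper function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Metapage layout as quoted elsewhere in this thread */
typedef struct BrinMetaPageData
{
    uint32_t    brinMagic;
    uint32_t    brinVersion;
    BlockNumber pagesPerRange;
    BlockNumber lastRevmapPage;
} BrinMetaPageData;

#define BRIN_META_MAGIC      0xA8109CFA
#define BRIN_CURRENT_VERSION 1

/* Hypothetical helper a pg_upgrade-time pass could use: nonzero means
 * the index should be marked invalid and reindexed. */
static int brin_needs_reindex(const BrinMetaPageData *meta)
{
    if (meta->brinMagic != BRIN_META_MAGIC)
        return 1;                   /* not a valid BRIN metapage */
    return meta->brinVersion != BRIN_CURRENT_VERSION;
}
```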

Thoughts?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] BRIN range operator class

2015-05-12 Thread Heikki Linnakangas

On 05/12/2015 10:49 PM, Alvaro Herrera wrote:

If in the future, for instance, we come up with a way to store the ipv4
plus ipv6 info, we will want to change the page format.  If we add a
page version to the metapage, we can detect the change at pg_upgrade
time and force a reindex of the index.


A version number in the metapage is certainly a good idea. But we
already have that, don't we? :



/* Metapage definitions */
typedef struct BrinMetaPageData
{
uint32  brinMagic;
uint32  brinVersion;
BlockNumber pagesPerRange;
BlockNumber lastRevmapPage;
} BrinMetaPageData;

#define BRIN_CURRENT_VERSION 1
#define BRIN_META_MAGIC 0xA8109CFA


Did you have something else in mind?

- Heikki





Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-12 Thread Andres Freund
On 2015-05-12 05:36:19 +0200, Andres Freund wrote:
> What I dislike so far:
> * Minor formatting things. Just going to fix and push the ones I
>   dislike.
> * The Hopcroft-Karp stuff not being separate
> * The increased complexity of grouping_planner. It'd imo be good if some
>   of that could be refactored into a separate function. Specifically the
>   else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
>   block.
> * I think it'd not hurt to add rule deparse check for the function in
>   GROUPING SETS case. I didn't see one at least.

* The code in nodeAgg.c isn't pretty in places. Stuff like if
  (node->chain_depth > 0) estate->agg_chain_head = save_chain_head;...
  Feels like a good bit of cleanup would be possible there.

> I think the problem is "just" that for each variable, in each grouping
> set - a very large number in a large cube - we're recursing through the
> whole ChainAggregate tree, as each Var just points to a var one level
> lower.
> 
> It might be worthwhile to add a little hack that deparses the variables
> against the "lowest" relevant node (i.e. the one below the last chain
> agg). Since they'll all have the same targetlist that ought to be safe.

I've prototype-hacked this, and indeed, adding a shortcut from the
intermediate chain nodes to the 'leaf chain node' cuts the explain time
from 11 to 2 seconds on some arbitrary statement. The remaining time is
the equivalent problem in the sort nodes...
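As an illustration of the cost argument (a schematic stand-in, not the planner's actual data structures): resolving each Var by hopping one level at a time costs O(chain depth) per variable, while a direct "leaf" pointer makes it a single dereference:

```c
#include <assert.h>
#include <stddef.h>

/* Schematic stand-in for a stack of ChainAggregate nodes.  Each level
 * points one node further down; the prototype shortcut adds a direct
 * pointer straight to the leaf. */
typedef struct ChainNode
{
    struct ChainNode *below;    /* next node down, NULL at the leaf */
    struct ChainNode *leaf;     /* shortcut straight to the leaf node */
} ChainNode;

/* Resolve by hopping level-by-level, as the naive deparse does;
 * returns the leaf and counts the hops taken. */
static const ChainNode *resolve_walking(const ChainNode *n, int *hops)
{
    while (n->below != NULL)
    {
        n = n->below;
        (*hops)++;
    }
    return n;
}
```

With many grouping sets, that per-variable walk multiplies out to the seconds of EXPLAIN time described above; the shortcut collapses it to constant work.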

I'm not terribly bothered by this. We can relatively easily fix this up
if required.

Greetings,

Andres Freund




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-12 Thread Tom Lane
Etsuro Fujita  writes:
> On 2015/05/12 7:42, Tom Lane wrote:
>> I don't much like the division of labor between LockForeignRow and
>> FetchForeignRow.  In principle that would lead to not one but two
>> extra remote accesses per locked row in SELECT FOR UPDATE, at least
>> in the case that an EvalPlanQual recheck is required.  (I see that
>> in your prototype patch for postgres_fdw you attempt to avoid that
>> by saving a retrieved tuple in LockForeignRow and then returning it
>> in FetchForeignRow, but that seems extremely ugly and bug-prone,
>> since there is nothing in the API spec insisting that those calls match
>> up one-to-one.)  The fact that locking and fetching a tuple are separate
>> operations in the heapam API is a historical artifact that probably
>> doesn't make sense to duplicate in the FDW API.

> I understand your concern about the postgres_fdw patch.  However, I
> think we should divide this into the two routines as the core patch
> does, because I think that would allow an FDW author to implement these
> routines so as to improve the efficiency for scenarios that seldom need
> fetching, by not retrieving and saving locked tuples in LockForeignRow.

I find it hard to envision a situation where an FDW could lock a row
without being able to fetch its contents more or less for free.  We have
IIRC discussed changing the way that works even for heapam, since the
current design requires multiple buffer lock/unlock cycles which aren't
free either.  In any case, I think that the temptation to do probably-
buggy stuff like what you did in your prototype would be too strong for
most people, and that we're much better off with a simpler one-step API.

An additional advantage of the way I changed this is that it makes the
logic in nodeLockRows simpler too; we no longer need the very messy
hack added by commit 2db576ba8c449fca.

>> Another problem is that we fail to detect whether an EvalPlanQual recheck
>> is required during a SELECT FOR UPDATE on a remote table, which we
>> certainly ought to do if the objective is to make postgres_fdw semantics
>> match the local ones.

> I think that is interesting in theory, but I'm not sure that is worth
> much in practice.

Hm, well, AFAICS the entire point of this patch is to make it possible for
FDWs to duplicate the semantics used for local tables, so I'm not sure
why you'd want to ignore that aspect of it.

Anyway, I redid the patch along those lines, improved the documentation,
and have committed it.

I did a very basic update of your postgres_fdw patch to test this with,
and attach that so that you don't have to repeat the effort.  I'm not sure
whether we want to try to convert that into something committable.  I'm
afraid that the extra round trips involved in doing row locking this way
will be so expensive that no one really wants it for postgres_fdw.  It's
more credible that FDWs operating against local storage would have use
for it.

regards, tom lane

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0c44260..a122c9e 100644
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
*** typedef struct PgFdwRelationInfo
*** 88,93 
--- 88,95 
   *
   * 1) SELECT statement text to be sent to the remote server
   * 2) Integer list of attribute numbers retrieved by the SELECT
+  * 3) SELECT statement text to be sent to the remote server
+  * 4) Integer list of attribute numbers retrieved by the SELECT
   *
   * These items are indexed with the enum FdwScanPrivateIndex, so an item
   * can be fetched with list_nth().  For example, to get the SELECT statement:
*** enum FdwScanPrivateIndex
*** 98,104 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 100,110 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* SQL statement to execute remotely (as a String node) */
! 	FdwScanPrivateSelectSql2,
! 	/* Integer list of attribute numbers retrieved by SELECT */
! 	FdwScanPrivateRetrievedAttrs2
  };
  
  /*
*** typedef struct PgFdwModifyState
*** 186,191 
--- 192,221 
  } PgFdwModifyState;
  
  /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retriev

Re: [HACKERS] point_ops for GiST

2015-05-12 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Robert Haas wrote:
> > 2009/12/30 Teodor Sigaev :
> > > Sync with current CVS
> > 
> > I have reviewed this patch and it looks good to me.  The only
> > substantive question I have is why gist_point_consistent() uses a
> > different coding pattern for the box case than it does for the polygon
> > and circle cases?  It's not obvious to me on the face of it why these
> > aren't consistent.
> 
> Emre Hasegeli just pointed out to me that this patch introduced
> box_contain_pt() and in doing so used straight C comparison (<= etc)
> instead of FPlt() and friends.

(This is commit 4cbe473938779ec414d90c2063c4398e68a70838)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] point_ops for GiST

2015-05-12 Thread Alvaro Herrera
Robert Haas wrote:
> 2009/12/30 Teodor Sigaev :
> > Sync with current CVS
> 
> I have reviewed this patch and it looks good to me.  The only
> substantive question I have is why gist_point_consistent() uses a
> different coding pattern for the box case than it does for the polygon
> and circle cases?  It's not obvious to me on the face of it why these
> aren't consistent.

Emre Hasegeli just pointed out to me that this patch introduced
box_contain_pt() and in doing so used straight C comparison (<= etc)
instead of FPlt() and friends.  I would think that that's a bug and
needs to be changed -- but certainly not backpatched, because gist
indexes would/might become corrupt.
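To make the difference concrete, here is a small sketch contrasting a straight C comparison with the fuzzy FPle()-style test. The macro definitions mirror the EPSILON-based ones in geo_decls.h, re-declared here so the example is self-contained:

```c
#include <assert.h>

/* Fuzzy comparison macros in the style of PostgreSQL's geo_decls.h */
#define EPSILON   1.0E-06
#define FPlt(A,B) ((A) + EPSILON < (B))
#define FPle(A,B) ((A) <= ((B) + EPSILON))

/* A box_contain_pt-style containment test done two ways, reduced to a
 * single axis for brevity. */
static int contains_exact(double lo, double hi, double x)
{
    return lo <= x && x <= hi;          /* straight C comparison */
}

static int contains_fuzzy(double lo, double hi, double x)
{
    return FPle(lo, x) && FPle(x, hi);  /* what the rest of the file uses */
}
```

A point within EPSILON of the boundary is "outside" to the exact test but "inside" to the fuzzy one — so an index built with one convention and probed with the other can disagree, which is the corruption risk mentioned above.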

This is in the context of his inclusion opclass for BRIN
http://www.postgresql.org/message-id/CAE2gYzwBZQ=z02zioefnhrxof+1vegqc_cwpvj8lwnqgx1-...@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multi-xacts and our process problem

2015-05-12 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Tue, May 12, 2015 at 6:00 AM, Robert Haas  wrote:
> > Another crucial difference between the multixact patch and many other
> > patches is that it wasn't a feature you could turn off.  For example,
> > if BRIN has bugs, you can almost certainly avoid hitting them by not
> > using BRIN.  And many people won't, so even if the feature turns out
> > to be horrifically buggy, 90%+ of our users will not even notice.
> > ALTER TABLE .. SET LOGGED/UNLOGGED may easily have bugs that eat your
> > data, but if you don't use it, then you won't be affected.  Of the
> > major user-visible features committed to 9.5 that could hose our users
> > more broadly, I'd put RLS and UPSERT pretty high on the list.  We
> > might be lucky enough that any breakage there is confined to users of
> > those features, but the code is not as contained as it is for
> > something like BRIN, so there is a risk of breaking other stuff.
> 
> I think that the chances of UPSERT seriously affecting those that
> don't use it are extremely low. For those that use the feature, we
> haven't repeated the mistakes of Multixacts: the on-disk
> representation of tuples that are committed is always identical to the
> historic representation of ordinary tuples, because speculative
> insertions are explicitly "confirmed". VACUUM does not need to care.

I feel more-or-less the same about RLS, but then again, it can be
difficult to see issues when you're so close to a piece of work.

Reviewing UPSERT is on my list of things to do post-feature freeze and
I'd certainly welcome additional review of RLS.  Thankfully, it's gotten
review, comments, and rework since it went in and is in quite a bit
better shape than it was originally.  It would have been better to get
some of that before it went in.  On the flip side, we could have ended
up spending a lot more time trying to hash through the UPSERT+RLS bits
if RLS hadn't already gone in and been worked through by that point,
and I'm also not sure we would have gotten the other improvements and
changes in (particularly things like the improvement of qual push-down
through RLS and security-barrier views; changing when RLS on
INSERT/UPDATE happens would have been more difficult post
feature-freeze).

Thanks!

Stephen




[HACKERS] Silent Data Manipulation - Time(stamp) w/o Time Zone Literals w/ TZ

2015-05-12 Thread David G. Johnston
"In a literal that has been determined to be timestamp without time zone,
PostgreSQL will silently ignore any time zone indication. That is, the
resulting value is derived from the date/time fields in the input value,
and is not adjusted for time zone."

http://www.postgresql.org/docs/9.3/static/datatype-datetime.html

8.5.1.3 Time Stamps


Not exactly a widely impactful decision, but is there harm in emitting a

"NOTICE: timezone specification ignored"

Just got a complaint on -admin about this:

http://www.postgresql.org/message-id/CACT-NGKQRtBGKOGUMT_goxyKu=yym8et1q5my-yjfuww++u...@mail.gmail.com

David J.


Re: [HACKERS] BRIN range operator class

2015-05-12 Thread Alvaro Herrera
Alvaro Herrera wrote:

> In patch 05, you use straight > etc comparisons of point/box values.
> All the other code in that file AFAICS uses FPlt() macros and others; I
> assume we should do likewise.

Oooh, looking at the history of this I just realized that the comments
signed "tgl" are actually Thomas G. Lockhart, not Tom G. Lane!  See
commit 9e2a87b62db87fc4175b00dabfd26293a2d072fa

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] BRIN range operator class

2015-05-12 Thread Alvaro Herrera
Emre Hasegeli wrote:
> > I pushed patches 04 and 07, as well as adopting some of the changes to
> > the regression test in 06.  I'm afraid I caused a bit of merge pain for
> > you -- sorry about that.
> 
> No problem.  I rebased the remaining ones.

In patch 05, you use straight > etc comparisons of point/box values.
All the other code in that file AFAICS uses FPlt() macros and others; I
assume we should do likewise.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multi-xacts and our process problem

2015-05-12 Thread Peter Geoghegan
On Tue, May 12, 2015 at 6:00 AM, Robert Haas  wrote:
> I think that's rather facile, and I really don't see how you would
> know that from looking at those release notes.  I thought multixacts
> had risk, but obviously nobody came close to predicting how bad things
> were going to be.  If they had, I'm pretty sure we would have pulled
> the patch.  The fact that the 9.4 btree changes weren't equally
> destabilizing doesn't mean that they weren't risky.  There was a risk
> that the Cuban missile crisis would start a nuclear war; in the end,
> it didn't, but that doesn't mean there was no risk.

I think you go on to make my argument for me, here. The fklocks patch
was particularly big and complicated, and slipped 9.2, and everyone
was more or less obligated to use it with their existing application.
It was not difficult to imagine that it was *the* highest risk item.
That wasn't a particularly useful observation at that point - I don't
think it made anyone very introspective about MultiXacts. My point, of
course, is that it was a concern about relative risk, as opposed to
absolute risk, and there's not that much you can do with that -
something has to be #1.

> Part of what went wrong with multixacts is neither Alvaro nor anyone
> who reviewed the patch gave adequate thought to the vacuum
> requirements.  There was a whole series of things that needed to be
> done there which just weren't done.  I think if it had been realized
> how much work remained to do there, and how necessary it was for every
> single bit of machinery that we have for freezing xmin to also exist
> for freezing xmax, we would not have gone forward.  Conceptual
> failures, where there is a whole class of work that you just don't
> even realize needs to be done, are much more damaging than mechanical
> errors, where you realize that something needs to be done but you
> don't do it correctly.

I agree, but no one really knew this at the time. Despite this,
everyone still would have identified fklocks as the highest risk item,
and indeed, some actually did. It's relatively easy to say that
something is the highest risk item in an anonymous poll. That's what
makes it easy to not take it seriously.

> Another crucial difference between the multixact patch and many other
> patches is that it wasn't a feature you could turn off.  For example,
> if BRIN has bugs, you can almost certainly avoid hitting them by not
> using BRIN.  And many people won't, so even if the feature turns out
> to be horrifically buggy, 90%+ of our users will not even notice.
> ALTER TABLE .. SET LOGGED/UNLOGGED may easily have bugs that eat your
> data, but if you don't use it, then you won't be affected.  Of the
> major user-visible features committed to 9.5 that could hose our users
> more broadly, I'd put RLS and UPSERT pretty high on the list.  We
> might be lucky enough that any breakage there is confined to users of
> those features, but the code is not as contained as it is for
> something like BRIN, so there is a risk of breaking other stuff.

I think that the chances of UPSERT seriously affecting those that
don't use it are extremely low. For those that use the feature, we
haven't repeated the mistakes of Multixacts: the on-disk
representation of tuples that are committed is always identical to the
historic representation of ordinary tuples, because speculative
insertions are explicitly "confirmed". VACUUM does not need to care.

> Departing from what's user-visible, Heikki's WAL format changes could
> break recovery badly for everyone and we could just be screwed.  That
> risk is particularly acute because we really can't change the WAL
> format once the release is shipped.  If it's broken, we're probably in
> big trouble.  Multixacts, too, fell into this category of things that
> cannot be turned off: they touched the heap storage format, and anyone
> who used foreign keys (which is nearly everyone) really had no choice
> but to use them.

It seems like you're just saying that because it's a complicated patch
that touches the WAL format. It's not a specific concern, and it's not
a concern about a systematic defect or "conceptual failure", as you
put it. That makes it of limited value - you can't hold up progress
because of a very vague concern like that.

> All of these things combined in an explosive fashion.  If the patch
> had been simple enough to be broadly understandable, or if it had been
> something that could plausibly have come with an "off" switch, or if
> anyone had realized that there were whole areas that had not been
> thought through carefully, the consequences would have been much less
> serious.

Agreed.

-- 
Peter Geoghegan




[HACKERS] Re: [COMMITTERS] pgsql: pg_basebackup -F t now succeeds with a long symlink target

2015-05-12 Thread Alvaro Herrera

You have a memory error in this patch -- clobbered by pfree:

> Details
> ---
> http://git.postgresql.org/pg/commitdiff/97e0aa697983cf7f7f79e69f2dc248fdefb7dbf6
  here

Sorry, couldn't resist.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench -f and vacuum

2015-05-12 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, May 12, 2015 at 12:08 PM, Stephen Frost  wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> >> It says:
> >>
> >> starting vacuum ERROR: blah
> >> ERROR: blah
> >> ERROR: blah
> >> done
> >>
> >> And then continues on.  Sure, that's not the greatest error reporting
> >> output ever, but what do you expect from pgbench?  I think it's clear
> >> enough what's going on there.  The messages appear in quick
> >> succession, because it doesn't take very long to notice that the table
> >> isn't there, so it's not like you are sitting there going "wait,
> >> what?".
> >>
> >> If we're going to add something, I like your second suggestion
> >> "(ignoring this error and continuing anyway)" more than the first one.
> >> Putting "ignoring:" before the thing you plan to ignore will be
> >> confusing, I think.
> >
> > +1 to adding "(ignoring this error and continuing anyway)" and
> > committing this.
> 
> You want to take care of that?

Done.  Results looked good to me:

--> pgbench -f test.sql
starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: Custom query
scaling factor: 1
query mode: simple
[...]

CF app updated.

Thanks!

Stephen




Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Pavel Stehule
2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski :

> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
> > create type scheduled_time as (second int[], minute int[], hour int[],
> dow
> > int[], month int[]);
> >  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
> >  (,"{5}",,,) .. run once per hour
> > Comments, notices?
>
> First, please note that I'm definitely not a hacker, just a user.
>
> One comment that I'd like to make, is that since we're at planning
> phase, I think it would be great to add capability to limit number of
> executions of given command.
> This would allow running things like "at" in unix - run once, at given
> time, and that's it.
>

I would prefer not to store state at this level - so "at" should be
implemented at a higher level. There is a very high number of possible
strategies for what can be done with failed tasks - and I would prefer not
to open that topic. I believe that with the proposed scheduler, anybody can
simply implement what they need in PL/pgSQL with dynamic SQL. But on the
other hand, "run once" can be implemented with the proposed API too.


pg_create_scheduled_command('delete obsolete data', '(,,"{1}","{1}",)',
$$DO $_$
  BEGIN
DELETE FROM data WHERE inserted < current_timestamp - interval '1month';
PERFORM pg_update_scheduled_command(scheduled_command_oid(),
max_workers => 0);
  END $_$
$$);

Regards

Pavel




> Best regards,
>
> depesz
>
> --
> The best thing about modern society is how easy it is to avoid contact
> with it.
>
> http://depesz.com/
>


Re: [HACKERS] pgbench -f and vacuum

2015-05-12 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, May 12, 2015 at 12:08 PM, Stephen Frost  wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> >> It says:
> >>
> >> starting vacuum ERROR: blah
> >> ERROR: blah
> >> ERROR: blah
> >> done
> >>
> >> And then continues on.  Sure, that's not the greatest error reporting
> >> output ever, but what do you expect from pgbench?  I think it's clear
> >> enough what's going on there.  The messages appear in quick
> >> succession, because it doesn't take very long to notice that the table
> >> isn't there, so it's not like you are sitting there going "wait,
> >> what?".
> >>
> >> If we're going to add something, I like your second suggestion
> >> "(ignoring this error and continuing anyway)" more than the first one.
> >> Putting "ignoring:" before the thing you plan to ignore will be
> >> confusing, I think.
> >
> > +1 to adding "(ignoring this error and continuing anyway)" and
> > committing this.
> 
> You want to take care of that?

Sure.

Thanks!

Stephen




Re: [HACKERS] pgbench -f and vacuum

2015-05-12 Thread Robert Haas
On Tue, May 12, 2015 at 12:08 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> It says:
>>
>> starting vacuum ERROR: blah
>> ERROR: blah
>> ERROR: blah
>> done
>>
>> And then continues on.  Sure, that's not the greatest error reporting
>> output ever, but what do you expect from pgbench?  I think it's clear
>> enough what's going on there.  The messages appear in quick
>> succession, because it doesn't take very long to notice that the table
>> isn't there, so it's not like you are sitting there going "wait,
>> what?".
>>
>> If we're going to add something, I like your second suggestion
>> "(ignoring this error and continuing anyway)" more than the first one.
>> Putting "ignoring:" before the thing you plan to ignore will be
>> confusing, I think.
>
> +1 to adding "(ignoring this error and continuing anyway)" and
> committing this.

You want to take care of that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench -f and vacuum

2015-05-12 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> It says:
> 
> starting vacuum ERROR: blah
> ERROR: blah
> ERROR: blah
> done
> 
> And then continues on.  Sure, that's not the greatest error reporting
> output ever, but what do you expect from pgbench?  I think it's clear
> enough what's going on there.  The messages appear in quick
> succession, because it doesn't take very long to notice that the table
> isn't there, so it's not like you are sitting there going "wait,
> what?".
> 
> If we're going to add something, I like your second suggestion
> "(ignoring this error and continuing anyway)" more than the first one.
> Putting "ignoring:" before the thing you plan to ignore will be
> confusing, I think.

+1 to adding "(ignoring this error and continuing anyway)" and
committing this.

Thanks!

Stephen




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-05-12 Thread Stephen Frost
Etsuro,

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:
> Here is an updated version.  In this version, the bug has been
> fixed, but regression tests for that haven't been added, because
> I'm not sure that the above way is a good idea and don't have any
> other ideas.
> 
> The EXPLAIN output has also been improved as discussed in [1].

While the EXPLAIN output changed, the structure hasn't really changed
from what was discussed previously, and there's not been any real
involvement from the core code in what's happening here.

Clearly, the documentation around how to use the FDW API hasn't changed
at all, and there have been no additions to it for handling bulk work.
Everything here continues to be done inside of postgres_fdw, which
essentially ignores the prescribed "Update/Delete one tuple" interface
for ExecForeignUpdate/ExecForeignDelete.

I've spent the better part of the past two days trying to reason my way
around that while reviewing this patch and I haven't come out the other
side any happier with this approach than I was back in
20140911153049.gc16...@tamriel.snowman.net.

There are other things that don't look right to me, such as what's going
on at the bottom of push_update_down(), but I don't think there's much
point going into it until we figure out what the core FDW API here
should look like.  It might not be all that far from what we have now,
but I don't think we can just ignore the existing, documented, API.

Thanks!

Stephen




Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Robert Haas
On Tue, May 12, 2015 at 9:10 AM, Stephen Frost  wrote:
> ... but, personally, I'd favor
> trying to do something at the protocol level instead.

Me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multixid hindsight design

2015-05-12 Thread Robert Haas
On Mon, May 11, 2015 at 5:20 PM, Heikki Linnakangas  wrote:
> The main problem with the infamous multixid changes was that it made
> pg_multixact a permanent, critical, piece of data. Without it, you cannot
> decipher whether some rows have been deleted or not. The 9.3 changes
> uncovered pre-existing issues with vacuuming and wraparound, but the fact
> that multixids are now critical turned those otherwise relatively
> harmless bugs into data loss.

Agreed.

> We have pg_clog, which is a similar critical data structure. That's a pain
> too - you need VACUUM and you can't easily move tables from one cluster to
> another for example - but we've learned to live with it. But we certainly
> don't need any more such data structures.

Yes.

> So the lesson here is that having a permanent pg_multixact is not nice, and
> we should get rid of it. Here's how to do that:
>
> Looking at the tuple header, the CID and CTID fields are only needed, when
> either xmin or xmax is running. Almost: in a HOT-updated tuple, CTID is
> required even after xmax has committed, but since it's a HOT update, the new
> tuple is always on the same page so you only need the offsetnumber part.
> That leaves us with 8 bytes that are always available for storing
> "ephemeral" information. By ephemeral, I mean that it is only needed when
> xmin or xmax is in-progress. After that, e.g. after a shutdown, it's never
> looked at.
>
> Let's add a new SLRU, called Tuple Ephemeral Data (TED). It is addressed by
> a 64-bit pointer, which means that it never wraps around. That 64-bit
> pointer is stored in the tuple header, in those 8 ephemeral bytes currently
> used for CID and CTID. Whenever a tuple is deleted/updated and locked at the
> same time, a TED entry is created for it, in the new SLRU, and the pointer
> to the entry is put on the tuple. In the TED entry, we can use as many bytes
> as we need to store the ephemeral data. It would include the CID (or
> possibly both CMIN and CMAX separately, now that we have the space), CTID,
> and the locking XIDs. The list of locking XIDs could be stored there
> directly, replacing multixids completely, or we could store a multixid
> there, and use the current pg_multixact system to decode them. Or we could
> store the multixact offset in the TED, replacing the multixact offset SLRU,
> but keep the multixact member SLRU as is.
>
> The XMAX stored on the tuple header would always be a real transaction ID,
> not a multixid. Hence locked-only tuples don't need to be frozen afterwards.
>
> The beauty of this would be that the TED entries can be zapped at restart,
> just like pg_subtrans, and pg_multixact before 9.3. It doesn't need to be
> WAL-logged, and we are free to change its on-disk layout even in a minor
> release.
>
> Further optimizations are possible. If the TED entry fits in 8 bytes, it can
> be stored directly in the tuple header. Like today, if a tuple is locked but
> not deleted/updated, you only need to store the locker XID, and you can
> store the locking XID directly on the tuple. Or if it's deleted and locked,
> CTID is not needed, only CID and locker XID, so you can store those directly
> on the tuple. Plus some spare bits to indicate what is stored. And if the
> XMIN is older than global-xmin, you could also steal the XMIN field for
> storing TED data, making it possible to store 12 bytes directly in the tuple
> header. Plus some spare bits again to indicate that you've done that.
>
> Now, given where we are, how do we get there? Upgrade is a pain, because
> even if we no longer generate any new multixids, we'll have to be able to
> decode them after pg_upgrade. Perhaps condense pg_multixact into a simpler
> pg_clog-style bitmap at pg_upgrade, to make it small and simple to read, but
> it would nevertheless be a fair amount of code just to deal with pg_upgraded
> databases.
>
> I think this is worth doing, even after we've fixed all the acute multixid
> bugs, because this would be more robust in the long run. It would also
> remove the need to do anti-wraparound multixid vacuums, and the newly-added
> tuning knobs related to that.

One danger is that in rearranging all of this stuff we may introduce
lots of new bugs.  I do agree that making multixacts need to survive a
server crash was not a good idea.  We liked freezing xmin so much, we
decided to freeze xmax, too?  Uggh.  As painful as that's been,
though, we're 18 months into it at this point.  If we do another
reorganization, are we going to end up back at month 0, where pretty much
everybody had corruption all the time rather than only some people on
some workloads?  Maybe not, but it's certainly something to worry
about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, May 12, 2015 at 3:21 PM, Stephen Frost  wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > We could also use the category that we have now, or even create the
> > concept
> > > of a tag (where you can assign multiple ones). And then have a view that
> > > brings together a view of everything with a specific tag/category,
> > > *regardless* of which CF it's on.
> >
> > I like the tag idea.
> 
> Are there other things you would consider tags for as well? As in should we
> if we do this look at a generic tag system, or just a "tag this is a
> bugfix"?

I was thinking a generic system though I'm not sure that the tags should
be free-form.  My first thought is a "release" tag, i.e. 9.4.2 or
something.  Having a 'committed' + '9.4.2' view would be kind of neat.

> > The biggest issue that I see with this is, again, trying to make sure
> > people know that they *should* put bug patches into the CF and make it
> > clear *which* CF to put them into.
> >
> > I don't know that we've even got an answer for the second question
> > currently, do we?  My thought is "whatever one people are looking at
> > now" but I'm guessing others feel differently.
> 
> I would say it should go into whatever CF is currently "Open". Don't treat
them separately from a submission POV, other than adding the tag. Only from
> a "consumption" POV, through the special view.

Yeah, but "Open" currently means June.  Perhaps that's not an issue but
it still feels awkward to me.

> > > > Another thought which occured to me, just to throw it out there, was
> > the
> > > > idea of a "Next point release" kind of CF, which is perhaps renamed
> > when
> > > > to whatever the point release actually is and anything which didn't
> > make
> > > > it is bumped to a new CF entry, or something along those lines.
> > >
> > > That doesn't sound like a CF to me. Again, it might be a tag or a
> > category
> > > or something, but the main thing with the CF term is that it's the
> > > fixed-period cycle of development. We shouldn't start abusing that too
> > > much...
> >
> > A tag for this would be great..
> 
> How is that different from a  bugfix, though?

This would be the release tag that I'm suggesting above.

Thanks!

Stephen




Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Magnus Hagander
On Tue, May 12, 2015 at 4:08 PM, Alvaro Herrera 
wrote:

> Stephen Frost wrote:
> > Magnus,
> >
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > We could also use the category that we have now, or even create the
> concept
> > > of a tag (where you can assign multiple ones). And then have a view
> that
> > > brings together a view of everything with a specific tag/category,
> > > *regardless* of which CF it's on.
> >
> > I like the tag idea.
>
> +1
>
> > > So the patch itself would live in whatever CF it was put in (or punted
> to),
> > > but you can get a global view. Which with a filter would make it easy
> to
> > > see "everything flagged as a bugfix that has not been committed".
> Which I
> > > think is the main thing we're looking for here, is it not?
> >
> > The biggest issue that I see with this is, again, trying to make sure
> > people know that they *should* put bug patches into the CF and make it
> > clear *which* CF to put them into.
>
> Maybe the answer to this is not to put it in any CF at all, and just
> expect it to be seen through a tag view.


I think the way things are now, a whole lot of things could break if a
patch is not on *any* commitfest...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Magnus Hagander
On Tue, May 12, 2015 at 3:21 PM, Stephen Frost  wrote:

> Magnus,
>
> * Magnus Hagander (mag...@hagander.net) wrote:
> > We could also use the category that we have now, or even create the
> concept
> > of a tag (where you can assign multiple ones). And then have a view that
> > brings together a view of everything with a specific tag/category,
> > *regardless* of which CF it's on.
>
> I like the tag idea.
>

Are there other things you would consider tags for as well? As in should we
if we do this look at a generic tag system, or just a "tag this is a
bugfix"?


> So the patch itself would live in whatever CF it was put in (or punted
> to),
> > but you can get a global view. Which with a filter would make it easy to
> > see "everything flagged as a bugfix that has not been committed". Which I
> > think is the main thing we're looking for here, is it not?
>
> The biggest issue that I see with this is, again, trying to make sure
> people know that they *should* put bug patches into the CF and make it
> clear *which* CF to put them into.
>
> I don't know that we've even got an answer for the second question
> currently, do we?  My thought is "whatever one people are looking at
> now" but I'm guessing others feel differently.
>

I would say it should go into whatever CF is currently "Open". Don't treat
them separately from a submission POV, other than adding the tag. Only from
a "consumption" POV, through the special view.


>
> > > Another thought which occured to me, just to throw it out there, was
> the
> > > idea of a "Next point release" kind of CF, which is perhaps renamed
> when
> > > to whatever the point release actually is and anything which didn't
> make
> > > it is bumped to a new CF entry, or something along those lines.
> >
> > That doesn't sound like a CF to me. Again, it might be a tag or a
> category
> > or something, but the main thing with the CF term is that it's the
> > fixed-period cycle of development. We shouldn't start abusing that too
> > much...
>
> A tag for this would be great..
>

How is that different from a bugfix, though?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Alvaro Herrera
Stephen Frost wrote:
> Magnus,
> 
> * Magnus Hagander (mag...@hagander.net) wrote:
> > We could also use the category that we have now, or even create the concept
> > of a tag (where you can assign multiple ones). And then have a view that
> > brings together a view of everything with a specific tag/category,
> > *regardless* of which CF it's on.
> 
> I like the tag idea.

+1

> > So the patch itself would live in whatever CF it was put in (or punted to),
> > but you can get a global view. Which with a filter would make it easy to
> > see "everything flagged as a bugfix that has not been committed". Which I
> > think is the main thing we're looking for here, is it not?
> 
> The biggest issue that I see with this is, again, trying to make sure
> people know that they *should* put bug patches into the CF and make it
> clear *which* CF to put them into.

Maybe the answer to this is not to put it in any CF at all, and just
expect it to be seen through a tag view.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Back-branch update releases planned for next week

2015-05-12 Thread Tom Lane
In view of the multixactid wraparound issues that have gotten fixed over
the last week or two, it's time to put out some minor releases.  After
some discussion among core and the packagers list, we concluded that we
should do it next week (before the Memorial Day holiday).  As per usual
timing, we'll wrap tarballs Monday the 18th for public announcement
Thursday the 21st.

regards, tom lane




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-12 Thread Amit Kapila
On Tue, May 12, 2015 at 6:37 PM, Andrew Dunstan  wrote:
>
>
> On 05/12/2015 08:35 AM, Andrew Dunstan wrote:
>>
>>
>> Yes, sorry, I had a moment of brain fade yesterday. However, I think
we're a bit under-documented on the pg_basebackup page, regarding both tar
mode and tablespace_map (which isn't even mentioned).
>>
>> And there is this which I noticed this morning: the --tablespace-mapping
feature of pg_basebackup seems to be quite broken in --format=tar mode - it
simply has no effect AFAICT. I assume it was broken before, but we should
either fix it (possibly hard) or disallow the combination (which would be a
pity).
>>
>> I'm going to go ahead and commit this in the state I have it now,
because for the most part these are preexisting deficiencies.
>>
>>
>
>
> One more thing: I think pg_basebackup will now not work in tar mode with
pre-9.5 servers, since it will be using an unrecognized option of the
BASE_BACKUP protocol command. If so that certainly needs to be documented
and release noted.
>

Yes, that's true and I have added the same in the docs; updated
patch attached.

As a side note, I think we should have done the same for
the --max-rate= option introduced in 9.4.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


backup_tablespace_fix-ad-v2.patch
Description: Binary data



Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Stephen Frost
Magnus,

* Magnus Hagander (mag...@hagander.net) wrote:
> We could also use the category that we have now, or even create the concept
> of a tag (where you can assign multiple ones). And then have a view that
> brings together a view of everything with a specific tag/category,
> *regardless* of which CF it's on.

I like the tag idea.

> So the patch itself would live in whatever CF it was put in (or punted to),
> but you can get a global view. Which with a filter would make it easy to
> see "everything flagged as a bugfix that has not been committed". Which I
> think is the main thing we're looking for here, is it not?

The biggest issue that I see with this is, again, trying to make sure
people know that they *should* put bug patches into the CF and make it
clear *which* CF to put them into.

I don't know that we've even got an answer for the second question
currently, do we?  My thought is "whatever one people are looking at
now" but I'm guessing others feel differently.

> > Another thought which occured to me, just to throw it out there, was the
> > idea of a "Next point release" kind of CF, which is perhaps renamed when
> > to whatever the point release actually is and anything which didn't make
> > it is bumped to a new CF entry, or something along those lines.
> 
> That doesn't sound like a CF to me. Again, it might be a tag or a category
> or something, but the main thing with the CF term is that it's the
> fixed-period cycle of development. We shouldn't start abusing that too
> much...

A tag for this would be great..

Thanks!

Stephen




Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-12 Thread Stephen Frost
Craig,

All very interesting, but, to be honest, I don't really have time this
week to chat about it. :(  Apologies for that.  A couple comments below.

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> For some time I've wanted a way to "SET SESSION AUTHORISATION" or "SET
> ROLE" in a way that cannot simply be RESET, so that a connection may be
> handed to a less-trusted service or application to do some work with.
> 
> This is most useful for connection pools, where it's currently necessary to
> have a per-user pool, to trust users not to do anything naughty, or to
> filter the functions and commands they can run through a whitelist to stop
> them trying to escalate rights to the pooler user.
> 
> In the short term I'd like to:
> 
> * Add a WITH COOKIE option to "SET SESSION AUTHORIZATION", which takes an
> app-generated code (much like PREPARE TRANSACTION does).
> 
> * Make DISCARD ALL, RESET SESSION AUTHORIZATION, etc, also take WITH
> COOKIE. If session authorization was originally set with a cookie value,
> the same cookie value must be passed or an ERROR will be raised when RESET
> is attempted.
> 
> * A second SET SESSION AUTHORIZATION without a prior RESET would be
> rejected with an ERROR if the first SET used a cookie value.
> 
> This can be done without protocol-level changes and with no backward
> compatibility impact to existing applications. Any objections?
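The proposed rules can be modeled as a toy state machine (invented names; a
sketch of the semantics described above, not server code):

```python
class SessionAuth:
    """Toy model of the proposed cookie-guarded SET/RESET SESSION
    AUTHORIZATION semantics. Illustration only; not PostgreSQL code."""

    def __init__(self, login_role):
        self._login_role = login_role
        self._current = login_role
        self._cookie = None

    def set_authorization(self, role, cookie=None):
        # A second SET without a prior RESET is rejected if the
        # first SET used a cookie value.
        if self._cookie is not None:
            raise PermissionError(
                "SET SESSION AUTHORIZATION already guarded by a cookie")
        self._current = role
        self._cookie = cookie

    def reset_authorization(self, cookie=None):
        # RESET must present the original cookie, if one was set.
        if self._cookie is not None and cookie != self._cookie:
            raise PermissionError("wrong or missing cookie")
        self._current = self._login_role
        self._cookie = None

    @property
    def current_user(self):
        return self._current
```

A pooler would generate a random cookie per handout, call
`set_authorization`, hand the connection to the client, and present the same
cookie on `reset_authorization` when the connection is returned.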

I don't particularly object but I'm not entirely sure that it's that
simple to clear out the cookie value- look at the previous discussion
about ALTER ROLE / password resets and trying to keep them from ending
up in the logs.  Perhaps people will feel differently about this since
it's expected to be programmatically used, but, personally, I'd favor
trying to do something at the protocol level instead.

> These commands will appear in the logs if log_statement = 'all', but the
> codes are transient cookie values, not passwords. They'll be visible in
> pg_stat_activity but only to the privileged user. It'll probably be
> necessary to clear the last command string when executing SET SESSION
> AUTHORIZATION so the new user can't snoop the cookie value from a
> concurrent connection, but that should be simple enough.

Is there a reason why they would need to be visible to the privileged
user?  Just thinking about it from the perspective of it being a
protocol change, if it's done that way, it wouldn't be in the SQL
string; trying to understand if that's an actual problem.

> As is currently the case, poolers will still have to use a superuser
> connection if they want to pool across users.

That just needs to be fixed. :(  Having poolers connect as superuser is
*not* ok..

> In the longer term I want to add a protocol-level equivalent that lets a
> session switch session authorization or role, for efficiency and log-spam
> reasons.

I'm all about this and have discussed it a few times before with people.
This is a much better approach, imv, than what you're suggesting above.
Is there a reason why we would need to support both?  Clearly this is
all post-9.5 work and seems like it might be reasonable to have
completed for 9.6.  Have you thought much about how Heikki's work on
SCRAM might play into this?

> I'm also interested in a way to allow SET SESSION AUTHORIZATION to a list
> of permitted roles when run as a non-superuser, for connection pool use.
> SET ROLE might do, but it's more visible to apps, wheras SET SESSION
> AUTHORIZATION really makes the connection appear to "become" the target
> user.

I guess I'm a bit confused at why SET ROLE being visible to apps is a
problem..?

> That's later though - first,

Hit send too quickly?

Thanks!

Stephen




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-12 Thread Andrew Dunstan


On 05/12/2015 08:35 AM, Andrew Dunstan wrote:


Yes, sorry, I had a moment of brain fade yesterday. However, I think 
we're a bit under-documented on the pg_basebackup page, regarding both 
tar mode and tablespace_map (which isn't even mentioned).


And there is this which I noticed this morning: the 
--tablespace-mapping feature of pg_basebackup seems to be quite broken 
in --format=tar mode - it simply has no effect AFAICT. I assume it was 
broken before, but we should either fix it (possibly hard) or disallow 
the combination (which would be a pity).


I'm going to go ahead and commit this in the state I have it now, 
because for the most part these are preexisting deficiencies.






One more thing: I think pg_basebackup will now not work in tar mode with 
pre-9.5 servers, since it will be using an unrecognized option of the 
BASE_BACKUP protocol command. If so that certainly needs to be 
documented and release noted.



cheers

andrew





Re: [HACKERS] Multi-xacts and our process problem

2015-05-12 Thread Robert Haas
On Tue, May 12, 2015 at 3:12 AM, Peter Geoghegan  wrote:
> On Mon, May 11, 2015 at 11:42 PM, Noah Misch  wrote:
>> it came out that most people had identified fklocks as the highest-risk 9.3
>> patch.  Here's an idea.  Shortly after the 9.5 release notes draft, let's 
>> take
>> a secret ballot to identify the changes threatening the most damage through
>> undiscovered bugs.  (Let's say the electorate consists of every committer and
>> every person who reviewed at least one patch during the release cycle.)
>> Publish the three top vote totals.  This serves a few purposes.  It sends a
>> message to each original committer that the community doubts his handling of
>> the change.  The secret ballot helps voters be honest, and seven votes 
>> against
>> your commit is hard to ignore.  It's a hint to that committer to drum up more
>> reviews and testing, to pick a simpler project next time, or even to revert.
>> The poll results would also help target beta testing and post-commit reviews.
>> For example, I would plan to complete a full post-commit review of one patch
>> in the list.
>
> The highest risk item identified for 9.4 was the B-Tree bug fix
> patches, IIRC. It was certainly mentioned this time last year as the
> most likely candidate (during the 2014 developer meeting). I'm
> suspicious of this kind of ballot. While 9.4 has not been out for that
> long, evidence that that B-Tree stuff is in any way destabilizing is
> still thin on the ground, a year later.
>
> Anyone that identified fklocks as the highest risk 9.3 item shouldn't
> be too proud of their correct prediction. If you just look at the
> release notes, it's completely obvious, even to someone who doesn't
> know what a MultiXact is.

I think that's rather facile, and I really don't see how you would
know that from looking at those release notes.  I thought multixacts
had risk, but obviously nobody came close to predicting how bad things
were going to be.  If they had, I'm pretty sure we would have pulled
the patch.  The fact that the 9.4 btree changes weren't equally
destabilizing doesn't mean that they weren't risky.  There was a risk
that the Cuban missile crisis would start a nuclear war; in the end,
it didn't, but that doesn't mean there was no risk.

Part of what went wrong with multixacts is neither Alvaro nor anyone
who reviewed the patch gave adequate thought to the vacuum
requirements.  There was a whole series of things that needed to be
done there which just weren't done.  I think if it had been realized
how much work remained to do there, and how necessary it was for every
single bit of machinery that we have for freezing xmin to also exist
for freezing xmax, we would not have gone forward.  Conceptual
failures, where there is a whole class of work that you just don't
even realize needs to be done, are much more damaging than mechanical
errors, where you realize that something needs to be done but you
don't do it correctly.

As an example, take Tom's patch to speed up the parameter setup hooks
for PL/pgsql.  Here's his initial analysis:

http://www.postgresql.org/message-id/4146.1425872...@sss.pgh.pa.us

Then a lot of arguing about non-technical points ensued, followed
eventually by this:

http://www.postgresql.org/message-id/25506.1426029...@sss.pgh.pa.us

We all have ideas like that - things that initially seem like good
ideas, but there's some crucial conceptual point that we're missing
that means that the patch doesn't just need bug fixes; but rather the
whole idea needs to be reconsidered.  If we find that out before
release, we can pull the whole thing back.  If we find it out after
release, things get a lot harder.  In the case of multixacts, we
didn't realize that we'd overlooked significant pieces of work until
after the thing was shipped.

Another crucial difference between the multixact patch and many other
patches is that it wasn't a feature you could turn off.  For example,
if BRIN has bugs, you can almost certainly avoid hitting them by not
using BRIN.  And many people won't, so even if the feature turns out
to be horrifically buggy, 90%+ of our users will not even notice.
ALTER TABLE .. SET LOGGED/UNLOGGED may easily have bugs that eat your
data, but if you don't use it, then you won't be affected.  Of the
major user-visible features committed to 9.5 that could hose our users
more broadly, I'd put RLS and UPSERT pretty high on the list.  We
might be lucky enough that any breakage there is confined to users of
those features, but the code is not as contained as it is for
something like BRIN, so there is a risk of breaking other stuff.
Departing from what's user-visible, Heikki's WAL format changes could
break recovery badly for everyone and we could just be screwed.  That
risk is particularly acute because we really can't change the WAL
format once the release is shipped.  If it's broken, we're probably in
big trouble.  Multixacts, too, fell into this category of things that
cannot be turned off: they touched the heap

Re: [HACKERS] Multi-xacts and our process problem

2015-05-12 Thread Bruce Momjian
On Tue, May 12, 2015 at 02:42:16AM -0400, Noah Misch wrote:
> > > The "revert or rework" ship had already sailed at that point. I
> > 
> > True.
> > 
> > > don't think we had much choice than just soldier through the bugs
> > > after the release.
> > 
> > The problem is we "soldiered on" without adding any resources to the
> > problem or doing a systematic review once it became clear one was
> > necessary.
> 
> In both this latest emergency and the Nov-Dec 2013 one, several people showed
> up and helped.  We did well in that respect, but your idea that we should have
> started a systematic post-commit review is a good one.  Hoping other parts of
> the change were fine, the first team dispersed.

Yes, Alvaro faithfully addressed each bug that was reported and applied
a fix, and then we waited for the next reported bug, which happened
repeatedly.  What didn't happen is someone realizing that we had a
situation that required a _different_ approach.  We just stuck to the
approach that had always worked in the past and never revisited it.

Some people think we need a better plan, but a larger issue is that we
need to recognize when our existing plan _isn't_ working and do
something different.  No matter what plan or procedure we create, there
are going to be cases where it doesn't work, and if everyone is too busy
to recognize that our plan isn't working, these mistakes will be
repeated.  Our existing plan had worked so well for so many years that
it took very long for us to recognize we needed a new approach in this
case.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-12 Thread Andrew Dunstan


On 05/11/2015 11:01 PM, Amit Kapila wrote:



On Tue, May 12, 2015 at 5:50 AM, Andrew Dunstan wrote:

>
>
> On 05/11/2015 02:02 AM, Amit Kapila wrote:
>>
>> On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan wrote:

>> >
>> >
>> >
>> > This generally looks good, but I have a couple of questions 
before I commit it.

>> >
>> > First, why is the new option for the  BASE_BACKUP command of the 
Streaming Replication protocol "TAR"? It seems rather misleading. 
Shouldn't it be something like "TABLESPACEMAP"?

>> >
>>
>> The reason to keep new option's name as TAR was that tablespace_map
>> was generated for that format type, but I agree with you that something
>> like "TABLESPACEMAP" suits better, so I have changed it to
>> "TABLESPACE_MAP".  Putting '_' in name makes it somewhat consistent
>> with other names and filename it generates with this new option.
>>
>>
>> > Second, these lines in xlog.c seem wrong:
>> >
>> > else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
>> > str[i-1] = '\n';
>> >
>> > It looks to me like we should be putting ch in the string, not 
arbitrarily transforming \r into \n.

>> >
>>
>> You are right, I have changed it as per your suggestion.
>>
>>
>
>
> OK, I have cleaned this up a bit - I had already started so I didn't 
take your latest patch but instead applied relevant changes to my 
changeset. Here is my latest version.

>
> In testing I notice that now "pg_basebackup -F t" leaves it completely 
up to the user on all platforms to create the relevant links in 
pg_tblspc/. It includes the tablespace_map file in base.tar, but 
that's really just informational.

>

Sorry, but I am not able to follow your point.  Users don't need to create
the relevant links; they will get created during the first startup (during
recovery) from the backup.  I have tested it and it works on both Windows
and Linux.

Refer to the below code of the patch in xlog.c:

+
+ /* read the tablespace_map file if present and create symlinks. */
+ if (read_tablespace_map(&tablespaces))
+ {
..

> I think we need to add something to the pg_basebackup docs about that,
> at the very least (and it will also need to be a release note item.)
>

Currently, the lines below from the patch suggest that this file is
required for recovery.  Do you expect more information to be added?

+These files are not merely for your information; their presence and
+contents are critical to the proper operation of the system's
+recovery process.




Yes, sorry, I had a moment of brain fade yesterday. However, I think 
we're a bit under-documented on the pg_basebackup page, regarding both 
tar mode and tablespace_map (which isn't even mentioned).


And there is this which I noticed this morning: the --tablespace-mapping 
feature of pg_basebackup seems to be quite broken in --format=tar mode - 
it simply has no effect AFAICT. I assume it was broken before, but we 
should either fix it (possibly hard) or disallow the combination (which 
would be a pity).


I'm going to go ahead and commit this in the state I have it now, 
because for the most part these are preexisting deficiencies.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-05-12 Thread Heikki Linnakangas

On 05/12/2015 03:27 AM, digoal zhou wrote:

PostgreSQL (<=9.4) tries to smooth buffer writes across the
checkpoint_completion_target fraction (of checkpoint_timeout or
checkpoint_segments), but when we use synchronous_commit=off there is a
little problem with the checkpoint_segments target: xlog is written fast
(because of the full-page write that accompanies the first write to each
page after a checkpoint), so the checkpointer can't sleep and the buffer
writes are not smooth.
...
I think we can add a condition to IsCheckpointOnSchedule:
 if (synchronous_commit != SYNCHRONOUS_COMMIT_OFF)
 {
 recptr = GetInsertRecPtr();
 elapsed_xlogs = (((double) (recptr -
ckpt_start_recptr)) / XLogSegSize) / CheckPointSegments;

 if (progress < elapsed_xlogs)
 {
 ckpt_cached_elapsed = elapsed_xlogs;
 return false;
 }
  }


This has nothing to do with synchronous_commit, except that setting 
synchronous_commit=off makes your test case run faster, and hit the 
problem harder.


I think the real problem here is that IsCheckpointOnSchedule assumes 
that the rate of WAL generation is constant throughout the checkpoint 
cycle, but in reality you generate a lot more WAL immediately after the 
checkpoint begins, thanks to full_page_writes. For example, in the 
beginning of the cycle, you quickly use up, say, 20% of the WAL space in 
the first 10 seconds, and the scheduler thinks it's in a big hurry 
to finish the checkpoint, because it extrapolates that the rest of the 
WAL will be used up in the next 40 seconds. But in reality, the WAL 
consumption levels off, and you have many minutes left until 
CheckPointSegments.


Can you try the attached patch? It modifies the above calculation to 
take the full-page-write effect into account. I used X^1.5 as the 
corrective function, which roughly reflects the typical WAL consumption 
pattern. You can adjust the exponent, 1.5, to make the correction more 
or less aggressive.


- Heikki

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 0dce6a8..fb02f56 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -763,6 +763,19 @@ IsCheckpointOnSchedule(double progress)
 		recptr = GetInsertRecPtr();
 		elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) / XLogSegSize) / CheckPointSegments;
 
+		/*
+		 * Immediately after a checkpoint, a lot more WAL is generated when
+		 * full_page_write is enabled, because every WAL record has to include
+		 * a full image of the modified page. It levels off as time passes and
+		 * more updates fall on pages that have already been modified since
+		 * the last checkpoint.
+		 *
+		 * To correct for that effect, apply a corrective factor on the
+		 * amount of WAL consumed so far.
+		 */
+		if (fullPageWrites)
+			elapsed_xlogs = pow(elapsed_xlogs, 1.5);
+
 		if (progress < elapsed_xlogs)
 		{
 			ckpt_cached_elapsed = elapsed_xlogs;



Re: [HACKERS] LOCK TABLE Permissions

2015-05-12 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Tue, May 12, 2015 at 4:53 AM, Stephen Frost wrote:
> > Yeah, it might not be bad to have tests for all the different lock types
> > and make sure that the permissions match up.  I'd probably put those
> > tests into 'permissions.sql' instead though.
> 
> You mean privileges.sql, right? I just wrote a patch with that. I'll
> create a new thread with the patch.

Er, yes.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] "Bugs" CF?

2015-05-12 Thread Magnus Hagander
On Fri, May 8, 2015 at 5:24 AM, Stephen Frost  wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Now, if we can easily shove stuff back and forth between the main queue
> > and the bug-fix queue, maybe that's not such a big deal.  But it doesn't
> > seem like the current CF app is really amenable to that; doesn't it for
> > example want to leave an entry behind when you move something to another
> > CF?
>
> I agree that we'd need to have that capability if we are to go down this
> road, but I won't pretend to have any idea about the level of
> difficulty.  If we're agreed that this is at least reasonable to explore
> then we really need to ask Magnus to weigh in when it comes to the
> implementation side of things.
>

The technical side can certainly be handled. It would require some
special-casing that might make it a bit uglier, but it's certainly doable.

We could also use the category that we have now, or even create the concept
of a tag (where you can assign multiple ones). And then have a view that
brings together a view of everything with a specific tag/category,
*regardless* of which CF it's on.

So the patch itself would live in whatever CF it was put in (or punted to),
but you can get a global view. Which with a filter would make it easy to
see "everything flagged as a bugfix that has not been committed". Which I
think is the main thing we're looking for here, is it not?

Apart from being cleaner from the code perspective, I think that might
also be cleaner from a user perspective. It would also mean that a
patch can easily be re-categorised in and out of being a bugfix quite
simply, and just get a history entry when that happens (and not a
bumped-between-cf entry).


Another thought which occured to me, just to throw it out there, was the
> idea of a "Next point release" kind of CF, which is perhaps renamed when
> to whatever the point release actually is and anything which didn't make
> it is bumped to a new CF entry, or something along those lines.
>

That doesn't sound like a CF to me. Again, it might be a tag or a category
or something, but the main thing with the CF term is that it's the
fixed-period cycle of development. We shouldn't start abusing that too
much...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread hubert depesz lubaczewski
On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
> create type scheduled_time as (second int[], minute int[], hour int[], dow
> int[], month int[]);
>  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>  (,"{5}",,,) .. run once per hour
> Comments, notices?

First, please note that I'm definitely not a hacker, just a user.

One comment that I'd like to make is that, since we're at the planning
phase, I think it would be great to add the capability to limit the
number of executions of a given command.
This would allow running things like "at" in unix - run once, at a
given time, and that's it.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/




Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Pavel Stehule
2015-05-12 10:45 GMT+02:00 Dave Page :

> On Tue, May 12, 2015 at 10:25 AM, Pavel Stehule 
> wrote:
> > Generic simple scheduler to contrib
> > ===
> > Job schedulers are important and sometimes very complex part of any
> > software. PostgreSQL miss it. I propose new contrib module, that can be
> used
> > simply for some tasks, and that can be used as base for other more richer
> > schedulers. I prefer minimalist design - but strong enough for enhancing
> > when it is necessary. Some complex logic can be implemented in PL better
> > than in C. Motto: Simply to learn, simply to use, simply to customize.
> >
> > Motivation
> > --
> > Possibility to simplify administration of repeated tasks. Possibility to
> > write complex schedulers in PL/pgSQL or other PL.
> >
> > Design
> > --
> > Any scheduled command will be executed in independent worker. The number
> > workers for one command can be limited. Total number of workers will be
> > limited. Any command will be executed under specified user with known
> > timeout in current database. Next step can be implementation global
> > scheduler - but we have not a environment for running server side global
> > scripts, so I don't think about it in this moment.
> >
> > This scheduler does not guarantee number of executions. Without available
> > workers the execution will be suspended, after crash the execution can be
> > repeated. But it can be solved in upper layer if it is necessary. It is
> not
> > designed as realtime system. Scheduled task will be executed immediately
> > when related worker will be free, but the execution window is limited to
> > next start.
> >
> > This design don't try to solve mechanism for repeating tasks when tasks
> hash
> > a crash. This can be solved better in PL on custom layer when it is
> > necessary.
> >
> > Scheduled time is stored to type scheduled_time:
> >
> > create type scheduled_time as (second int[], minute int[], hour int[],
> dow
> > int[], month int[]);
> >
> >  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
> >  (,"{5}",,,) .. run once per hour
> >
> > The core is table pg_scheduled_commands
> >
> > Oid:   1
> > name:
> > user:  pavel
> > stime:(,"{5}",,,)
> > max_workers: 1
> > timeout:  10s
> > command: SELECT plpgsql_entry(scheduled_time(), scheduled_command_oid())
> >
> >
> > set timeout to 0 ~ unlimited, -1 default statement_timeout
> > set max_workers to 0 ~ disable tasks
> >
> > API
> > ---
> > pg_create_scheduled_command(name,
> >  stime,
> >  command,
> >  user default current_user,
> >  max_workers default 1,
> >  timeout default -1);
> >
> > pg_drop_scheduled_command(oid)
> > pg_drop_scheduled_command(name);
> >
> > pg_update_scheduled_command(oid | name, ...
> >
> > Usage:
> > --
> > pg_create_scheduled_command('delete obsolete data', '(,,"{1}",,)',
> $$DELETE
> > FROM data WHERE inserted < current_timestamp - interval '1month'$$);
> > pg_update_scheduled_command('delete obsolete data', max_workers => 2,
> > timeout :=> '1h');
> > pg_drop_scheduled_command('delete obsolete data');
> >
> > select * from pg_scheduled_commands;
> >
> >
> > Comments, notices?
>
> It's not integrated with the server (though it is integrated with
> pgAdmin), but pgAgent provides scheduling services for PostgreSQL
> already, offering multi-schedule, multi-step job execution.
>
> http://www.pgadmin.org/docs/1.20/pgagent.html
>

I know pgAgent - this proposal is about deeper integration with core,
based on background workers, without any other dependency.

Regards

Pavel


>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Dave Page
On Tue, May 12, 2015 at 10:25 AM, Pavel Stehule  wrote:
> Generic simple scheduler to contrib
> ===
> Job schedulers are important and sometimes very complex part of any
> software. PostgreSQL miss it. I propose new contrib module, that can be used
> simply for some tasks, and that can be used as base for other more richer
> schedulers. I prefer minimalist design - but strong enough for enhancing
> when it is necessary. Some complex logic can be implemented in PL better
> than in C. Motto: Simply to learn, simply to use, simply to customize.
>
> Motivation
> --
> Possibility to simplify administration of repeated tasks. Possibility to
> write complex schedulers in PL/pgSQL or other PL.
>
> Design
> --
> Any scheduled command will be executed in independent worker. The number
> workers for one command can be limited. Total number of workers will be
> limited. Any command will be executed under specified user with known
> timeout in current database. Next step can be implementation global
> scheduler - but we have not a environment for running server side global
> scripts, so I don't think about it in this moment.
>
> This scheduler does not guarantee number of executions. Without available
> workers the execution will be suspended, after crash the execution can be
> repeated. But it can be solved in upper layer if it is necessary. It is not
> designed as realtime system. Scheduled task will be executed immediately
> when related worker will be free, but the execution window is limited to
> next start.
>
> This design don't try to solve mechanism for repeating tasks when tasks hash
> a crash. This can be solved better in PL on custom layer when it is
> necessary.
>
> Scheduled time is stored to type scheduled_time:
>
> create type scheduled_time as (second int[], minute int[], hour int[], dow
> int[], month int[]);
>
>  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>  (,"{5}",,,) .. run once per hour
>
> The core is table pg_scheduled_commands
>
> Oid:   1
> name:
> user:  pavel
> stime:(,"{5}",,,)
> max_workers: 1
> timeout:  10s
> command: SELECT plpgsql_entry(scheduled_time(), scheduled_command_oid())
>
>
> set timeout to 0 ~ unlimited, -1 default statement_timeout
> set max_workers to 0 ~ disable tasks
>
> API
> ---
> pg_create_scheduled_command(name,
>  stime,
>  command,
>  user default current_user,
>  max_workers default 1,
>  timeout default -1);
>
> pg_drop_scheduled_command(oid)
> pg_drop_scheduled_command(name);
>
> pg_update_scheduled_command(oid | name, ...
>
> Usage:
> --
> pg_create_scheduled_command('delete obsolete data', '(,,"{1}",,)', $$DELETE
> FROM data WHERE inserted < current_timestamp - interval '1month'$$);
> pg_update_scheduled_command('delete obsolete data', max_workers => 2,
> timeout :=> '1h');
> pg_drop_scheduled_command('delete obsolete data');
>
> select * from pg_scheduled_commands;
>
>
> Comments, notices?

It's not integrated with the server (though it is integrated with
pgAdmin), but pgAgent provides scheduling services for PostgreSQL
already, offering multi-schedule, multi-step job execution.

http://www.pgadmin.org/docs/1.20/pgagent.html

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch to improve a few appendStringInfo* calls

2015-05-12 Thread David Rowley
On 12 May 2015 at 12:57, Peter Eisentraut  wrote:

> On 4/11/15 6:25 AM, David Rowley wrote:
> > I've attached a small patch which just fixes a few appendStringInfo*
> > calls that are not quite doing things the way that it was intended.
>
> done
>
>
Thank you for pushing.

Shortly after I sent the previous patch I did a few more searches and
found some more things that are not quite right.
Most of these change calls to use the binary append method when the
length of the string is already known.
There are also some fixes in there for appendPQExpBufferStr too.

I've re-based the patch and attached here.

Regards

David Rowley


appendStringInfo_fixes_2015-05-12.patch
Description: Binary data



Re: [HACKERS] Multixid hindsight design

2015-05-12 Thread Heikki Linnakangas

On 05/12/2015 01:51 AM, Tom Lane wrote:

Heikki Linnakangas  writes:

So the lesson here is that having a permanent pg_multixact is not nice,
and we should get rid of it. Here's how to do that:


That would be cool, but ...


Looking at the tuple header, the CID and CTID fields are only needed,
when either xmin or xmax is running. Almost: in a HOT-updated tuple,
CTID is required even after xmax has committed, but since it's a HOT
update, the new tuple is always on the same page so you only need the
offsetnumber part.


I think this is totally wrong.  HOT update or not, you need the forward
link represented by ctid not just until xmin/xmax commit, but until
they're in the past according to all active snapshots.  That's so that
other transactions can chain forward to the "current" version of a tuple
which they found according to their snapshots.

It might be you can salvage the idea anyway, since it's still true that
the forward links wouldn't be needed after a crash and restart.  But the
argument as stated is wrong.


Ah yes, I stated that wrong. What I meant was that they are not needed 
after xmin and xmax are older than global xmin.



(There's also the question of forensic requirements, although I'm aware
that it's barely worth bringing that up since nobody else here seems to
put much weight on that.)


I do care about that. In this scheme, you would always have the 
updater/deleter XMAX on the tuple itself, which IMO is more useful for 
forensic purposes than a multixid. You lose the CID and CTID in the 
tuple (for tuples that are updated and locked at the same time), but if 
you keep the TED around longer, you have all the information still 
there. On the whole, I don't think this is much worse than the current 
situation.


- Heikki




[HACKERS] proposal: contrib module - generic command scheduler

2015-05-12 Thread Pavel Stehule
Generic simple scheduler to contrib
===
Job schedulers are an important and sometimes very complex part of any
software, and PostgreSQL lacks one. I propose a new contrib module that can
be used simply for some tasks, and that can serve as a base for other,
richer schedulers. I prefer a minimalist design - but strong enough to be
enhanced when necessary. Some complex logic can be implemented better in PL
than in C. Motto: simple to learn, simple to use, simple to customize.

Motivation
--
The possibility to simplify administration of repeated tasks, and to
write complex schedulers in PL/pgSQL or another PL.

Design
--
Any scheduled command will be executed in an independent worker. The number
of workers for one command can be limited, and the total number of workers
will be limited. Any command will be executed under a specified user, with
a known timeout, in the current database. A next step could be a global
scheduler - but we do not have an environment for running server-side
global scripts, so I am not thinking about that at this moment.

This scheduler does not guarantee a number of executions. Without available
workers the execution will be suspended, and after a crash the execution
can be repeated. But this can be solved in an upper layer if necessary. It
is not designed as a realtime system. A scheduled task will be executed
immediately when a related worker is free, but the execution window is
limited to the next start.

This design doesn't try to provide a mechanism for repeating tasks when a
task crashes. This can be solved better in PL, in a custom layer, when
necessary.

The scheduled time is stored in the type scheduled_time:

create type scheduled_time as (second int[], minute int[], hour int[], dow
int[], month int[]);

 (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
 (,"{5}",,,) .. run once per hour

The core is table pg_scheduled_commands

Oid:   1
name:
user:  pavel
stime:(,"{5}",,,)
max_workers: 1
timeout:  10s
command: SELECT plpgsql_entry(scheduled_time(), scheduled_command_oid())


Setting timeout to 0 means unlimited; -1 means the default statement_timeout.
Setting max_workers to 0 disables the task.

API
---
pg_create_scheduled_command(name,
 stime,
 command,
 user default current_user,
 max_workers default 1,
 timeout default -1);

pg_drop_scheduled_command(oid)
pg_drop_scheduled_command(name);

pg_update_scheduled_command(oid | name, ...

Usage:
--
pg_create_scheduled_command('delete obsolete data', '(,,"{1}",,)', $$DELETE
FROM data WHERE inserted < current_timestamp - interval '1month'$$);
pg_update_scheduled_command('delete obsolete data', max_workers => 2,
timeout => '1h');
pg_drop_scheduled_command('delete obsolete data');

select * from pg_scheduled_commands;


Comments, notices?

Regards

Pavel


Re: [HACKERS] Multi-xacts and our process problem

2015-05-12 Thread Peter Geoghegan
On Mon, May 11, 2015 at 11:42 PM, Noah Misch  wrote:
> it came out that most people had identified fklocks as the highest-risk 9.3
> patch.  Here's an idea.  Shortly after the 9.5 release notes draft, let's take
> a secret ballot to identify the changes threatening the most damage through
> undiscovered bugs.  (Let's say the electorate consists of every committer and
> every person who reviewed at least one patch during the release cycle.)
> Publish the three top vote totals.  This serves a few purposes.  It sends a
> message to each original committer that the community doubts his handling of
> the change.  The secret ballot helps voters be honest, and seven votes against
> your commit is hard to ignore.  It's a hint to that committer to drum up more
> reviews and testing, to pick a simpler project next time, or even to revert.
> The poll results would also help target beta testing and post-commit reviews.
> For example, I would plan to complete a full post-commit review of one patch
> in the list.

The highest risk item identified for 9.4 was the B-Tree bug fix
patches, IIRC. It was certainly mentioned this time last year as the
most likely candidate (during the 2014 developer meeting). I'm
suspicious of this kind of ballot. While 9.4 has not been out for that
long, evidence that that B-Tree stuff is in any way destabilizing is
still thin on the ground, a year later.

Anyone that identified fklocks as the highest risk 9.3 item shouldn't
be too proud of their correct prediction. If you just look at the
release notes, it's completely obvious, even to someone who doesn't
know what a MultiXact is.
-- 
Peter Geoghegan

