Re: [HACKERS] Parallel Seq Scan
On Sat, Dec 6, 2014 at 10:43 AM, David Rowley wrote: > On 4 December 2014 at 19:35, Amit Kapila wrote: >> >> Attached patch is just to facilitate the discussion about the >> parallel seq scan and maybe some other dependent tasks like >> sharing of various states like combocid, snapshot with parallel >> workers. It is by no means ready to do any complex test; of course >> I will work towards making it more robust both in terms of adding >> more stuff and doing performance optimizations. >> >> Thoughts/Suggestions? >> > > This is good news! Thanks. > I've not gotten to look at the patch yet, but I thought you may be able to make use of the attached at some point. > I also think so, that it can be used in the near future to enhance and provide more value to the parallel scan feature. Thanks for taking the initiative to do the leg-work for supporting aggregates. > It's bare-bones core support for allowing aggregate states to be merged together with another aggregate state. I would imagine that if a query such as: > > SELECT MAX(value) FROM bigtable; > > was run, then a series of parallel workers could go off and each find the max value from their portion of the table and then perhaps some other node type would then take all the intermediate results from the workers, once they're finished, and join all of the aggregate states into one and return that. Naturally, you'd need to check that all aggregates used in the targetlist had a merge function first. > Direction sounds to be right. > This is just a few hours of work. I've not really tested the pg_dump support or anything yet. I've also not added any new functions to allow AVG() or COUNT() to work, I've really just re-used existing functions where I could, as things like MAX() and BOOL_OR() can just make use of the existing transition function. I thought that this might be enough for early tests. > > I'd imagine such a workload, ignoring IO overhead, should scale pretty much linearly with the number of worker processes. 
Of course, if there was a GROUP BY clause then the merger code would have to perform more work. > Agreed. > If you think you might be able to make use of this, then I'm willing to go off and write all the other merge functions required for the other aggregates. > Don't you think that first we should stabilize the basic (target list and quals that can be independently evaluated by workers) parallel scan and then jump to do such enhancements? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On Sat, Dec 6, 2014 at 12:27 AM, Jim Nasby wrote: > On 12/5/14, 9:08 AM, José Luis Tallón wrote: >> >> >> More over, when load goes up, the relative cost of parallel working should go up as well. >> Something like: >> p = number of cores >> l = 1min-load >> >> additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) >> >> (for c>1, of course) > > > ... > >> The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. >> I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. > > > These comments got me wondering... would it be better to decide on parallelism during execution instead of at plan time? That would allow us to dynamically scale parallelism based on system load. If we don't even consider parallelism until we've pulled some number of tuples/pages from a relation, > > this would also eliminate all parallel overhead on small relations. > -- I think we have access to this information in the planner (RelOptInfo -> pages); if we want, we can use that to eliminate small relations from parallelism. The question is how big a relation we should consider for parallelism. One way is to check via tests, which I am planning to do; do you think we have any heuristic we can use to decide how big a relation should be before it is considered for parallelism? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
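José's surcharge formula can be sketched as a small helper. This is a hedged illustration only: the thread has not settled whether 'c' means cores or workers, and the function name and signature are hypothetical, not anything in the patch.

```c
#include <assert.h>
#include <math.h>

/* Sketch of the proposed load-aware surcharge:
 *   additional_cost = tuple_estimate * cpu_tuple_cost * (l + 1) / (c - 1)
 * where l is the 1-minute load average and c (presumably cores) must be > 1;
 * for c <= 1 the formula is undefined, so return no surcharge. */
static double
parallel_load_cost(double tuple_estimate, double cpu_tuple_cost,
                   double load_1min, int c)
{
    if (c <= 1)
        return 0.0;
    return tuple_estimate * cpu_tuple_cost * (load_1min + 1.0) / (c - 1);
}
```

Note how the (l+1)/(c-1) factor behaves: on an idle 2-core machine (l=0, c=2) the surcharge equals the plain per-tuple cost, and it grows linearly with load, which is exactly the "parallelism gets more expensive as the box gets busier" behavior being proposed.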
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:43 PM, Stephen Frost wrote: > > José, > > * José Luis Tallón (jltal...@adv-solutions.net) wrote: > > On 12/04/2014 07:35 AM, Amit Kapila wrote: > > >The number of worker backends that can be used for > > >parallel seq scan can be configured by using a new GUC > > >parallel_seqscan_degree, the default value of which is zero > > >and it means parallel seq scan will not be considered unless > > >user configures this value. > > > > The number of parallel workers should be capped (of course!) at the > > maximum amount of "processors" (cores/vCores, threads/hyperthreads) > > available. > > > > More over, when load goes up, the relative cost of parallel working > > should go up as well. > > Something like: > > p = number of cores > > l = 1min-load > > > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > > > (for c>1, of course) > > While I agree in general that we'll need to come up with appropriate > acceptance criteria, etc, I don't think we want to complicate this patch > with that initially. > >A SUSET GUC which caps the parallel GUC would be > enough for an initial implementation, imv. > This is exactly what I have done in patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:46 PM, Stephen Frost wrote: > > Amit, > > * Amit Kapila (amit.kapil...@gmail.com) wrote: > > postgres=# explain select c1 from t1; > > QUERY PLAN > > -- > > Seq Scan on t1 (cost=0.00..101.00 rows=100 width=4) > > (1 row) > > > > > > postgres=# set parallel_seqscan_degree=4; > > SET > > postgres=# explain select c1 from t1; > > QUERY PLAN > > -- > > Parallel Seq Scan on t1 (cost=0.00..25.25 rows=100 width=4) > >Number of Workers: 4 > >Number of Blocks Per Workers: 25 > > (3 rows) > > This is all great and interesting, but I feel like folks might be > waiting to see just what kind of performance results come from this (and > what kind of hardware is needed to see gains..). Initially I was thinking that first we should discuss if the design and idea used in patch is sane, but now as you have asked and even Robert has asked the same off list to me, I will take the performance data next week (Another reason why I have not taken any data is that still the work to push qualification down to workers is left which I feel is quite important). However I still think if I get some feedback on some of the basic things like below, it would be good. 1. As the patch currently stands, it just shares the relevant data (like relid, target list, block range each worker should perform on etc.) to the worker and then worker receives that data and form the planned statement which it will execute and send the results back to master backend. So the question here is do you think it is reasonable or should we try to form the complete plan for each worker and then share the same and may be other information as well like range table entries which are required. My personal gut feeling in this matter is that for long term it might be better to form the complete plan of each worker in master and share the same, however I think the current way as done in patch (okay that needs some improvement) is also not bad and quite easier to implement. 2. 
Next question related to the above is what should be the output of ExplainPlan: as currently each worker is responsible for forming its own plan, Explain Plan is not able to show the detailed plan for each worker; is that okay? 3. Some places where optimizations are possible: - Currently after getting the tuple from heap, it is deformed by the worker and sent via message queue to the master backend; the master backend then forms the tuple and sends it to the upper layer, which before sending it to the frontend again deforms it via slot_getallattrs(slot). - Master backend currently receives the data from multiple workers serially. We can optimize in a way that it can check other queues, if there is no data in the current queue. - Master backend is just responsible for coordination among workers. It shares the required information with workers and then fetches the data processed by each worker. By using some more logic, we might be able to make the master backend also fetch data from the heap rather than doing just co-ordination among workers. I think in all the above places we can do some optimization, however we can do that later as well, unless they hit the performance badly for cases which people care about most. 4. Should the parallel_seqscan_degree value be dependent on other backend processes like MaxConnections, max_worker_processes, autovacuum_max_workers, or should it be independent like max_wal_senders? I think it is better to keep it dependent on other backend processes, however for simplicity, I have kept it similar to max_wal_senders for now. > There's likely to be > situations where this change is an improvement while also being cases > where it makes things worse. Agreed, and I think that will be more clear after doing some performance tests. > One really interesting case would be parallel seq scans which are > executing against foreign tables/FDWs.. > Sure. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
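The second optimization bullet (don't read the workers' queues strictly serially; skip a queue that has nothing ready) can be sketched with a toy stand-in for the shared-memory tuple queues. None of these names are the patch's actual API — TupleQueue, queue_try_read, and gather_round_robin are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a per-worker shared-memory tuple queue */
typedef struct TupleQueue
{
    const int  *tuples;
    size_t      len;
    size_t      pos;
} TupleQueue;

/* Non-blocking read: returns 1 and sets *out if a tuple was available */
static int
queue_try_read(TupleQueue *q, int *out)
{
    if (q->pos >= q->len)
        return 0;
    *out = q->tuples[q->pos++];
    return 1;
}

/* Round-robin gather: if the current queue is empty, move on to the next
 * one instead of blocking, so a slow worker doesn't stall the others.
 * Returns the total number of tuples collected into dest. */
static size_t
gather_round_robin(TupleQueue *queues, int nworkers, int *dest)
{
    size_t  got = 0;
    int     drained;

    do
    {
        drained = 1;
        for (int w = 0; w < nworkers; w++)
        {
            int tup;

            if (queue_try_read(&queues[w], &tup))
            {
                dest[got++] = tup;
                drained = 0;
            }
        }
    } while (!drained);
    return got;
}
```

In the real executor the "queue empty" case would come from a non-blocking shm_mq read rather than an array index, but the control flow — poll each queue once per pass, only stop when every queue is exhausted — is the same idea.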
Re: [HACKERS] Parallel Seq Scan
On 4 December 2014 at 19:35, Amit Kapila wrote: > > Attached patch is just to facilitate the discussion about the > parallel seq scan and maybe some other dependent tasks like > sharing of various states like combocid, snapshot with parallel > workers. It is by no means ready to do any complex test; of course > I will work towards making it more robust both in terms of adding > more stuff and doing performance optimizations. > > Thoughts/Suggestions? > > This is good news! I've not gotten to look at the patch yet, but I thought you may be able to make use of the attached at some point. It's bare-bones core support for allowing aggregate states to be merged together with another aggregate state. I would imagine that if a query such as: SELECT MAX(value) FROM bigtable; was run, then a series of parallel workers could go off and each find the max value from their portion of the table and then perhaps some other node type would then take all the intermediate results from the workers, once they're finished, and join all of the aggregate states into one and return that. Naturally, you'd need to check that all aggregates used in the targetlist had a merge function first. This is just a few hours of work. I've not really tested the pg_dump support or anything yet. I've also not added any new functions to allow AVG() or COUNT() to work, I've really just re-used existing functions where I could, as things like MAX() and BOOL_OR() can just make use of the existing transition function. I thought that this might be enough for early tests. I'd imagine such a workload, ignoring IO overhead, should scale pretty much linearly with the number of worker processes. Of course, if there was a GROUP BY clause then the merger code would have to perform more work. If you think you might be able to make use of this, then I'm willing to go off and write all the other merge functions required for the other aggregates. 
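The merge idea above can be illustrated with a toy version in C. This is not David's actual patch (which adds merge support at the pg_aggregate level and reuses existing transition functions such as int4larger); the MaxState type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-worker transition state for MAX(int) */
typedef struct MaxState
{
    int valid;  /* has at least one value been seen? */
    int max;
} MaxState;

/* Transition function: fold one value into a worker's state */
static void
max_trans(MaxState *s, int value)
{
    if (!s->valid || value > s->max)
    {
        s->max = value;
        s->valid = 1;
    }
}

/* Merge function: fold another worker's partial state into ours */
static void
max_merge(MaxState *dst, const MaxState *src)
{
    if (src->valid)
        max_trans(dst, src->max);
}

/* Simulate SELECT MAX(value) FROM bigtable: each "worker" scans a
 * contiguous chunk, then a gather step merges the partial states. */
static int
parallel_max(const int *vals, size_t n, int nworkers)
{
    MaxState final = {0, 0};

    for (int w = 0; w < nworkers; w++)
    {
        MaxState s = {0, 0};

        for (size_t i = (size_t) w * n / nworkers;
             i < (size_t) (w + 1) * n / nworkers; i++)
            max_trans(&s, vals[i]);
        max_merge(&final, &s);
    }
    assert(final.valid);    /* empty input not handled in this sketch */
    return final.max;
}
```

The key property David relies on is that for aggregates like MAX() and BOOL_OR() the merge function can simply be the existing transition function applied to the other state's value, whereas AVG() and COUNT() need genuinely new merge functions (sum the counts, combine the sums).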
Regards David Rowley merge_aggregate_state_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:38 PM, José Luis Tallón wrote: > > On 12/04/2014 07:35 AM, Amit Kapila wrote: >> >> [snip] >> >> The number of worker backends that can be used for >> parallel seq scan can be configured by using a new GUC >> parallel_seqscan_degree, the default value of which is zero >> and it means parallel seq scan will not be considered unless >> user configures this value. > > > The number of parallel workers should be capped (of course!) at the maximum amount of "processors" (cores/vCores, threads/hyperthreads) available. > Also, it should consider MaxConnections configured by user. > More over, when load goes up, the relative cost of parallel working should go up as well. > Something like: > p = number of cores > l = 1min-load > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > (for c>1, of course) > How will you identify load in above formula and what is exactly 'c' (is it parallel workers involved?). For now, I have managed this simply by having a configuration variable and it seems to me that the same should be good enough for first version, we can definitely enhance it in future version by dynamically allocating the number of workers based on their availability and need of query, but I think lets leave that for another day. > >> In ExecutorStart phase, initiate the required number of workers >> as per parallel seq scan plan and setup dynamic shared memory and >> share the information required for worker to execute the scan. >> Currently I have just shared the relId, targetlist and number >> of blocks to be scanned by worker, however I think we might want >> to generate a plan for each of the workers in master backend and >> then share the same to individual worker. > > [snip] >> >> Attached patch is just to facilitate the discussion about the >> parallel seq scan and may be some other dependent tasks like >> sharing of various states like combocid, snapshot with parallel >> workers. 
It is by no means ready to do any complex test, ofcourse >> I will work towards making it more robust both in terms of adding >> more stuff and doing performance optimizations. >> >> Thoughts/Suggestions? > > > Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension. > This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed. > > The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. > I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. > Idea sounds interesting and I think probably in some cases different implementation of same node might help, but may be at this stage if we focus on one kind of implementation (which is a win for reasonable number of cases) and make it successful, then doing alternative implementations will be comparatively easier and have more chances of success. > If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongide the custom plan API, though. > Sure, please go ahead which ever way you like to proceed. If you want to contribute in this area/patch, then you are welcome. > Any comments appreciated. > > > Thank you for your work, Amit Many thanks to you as well for showing interest. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/5/14, 5:49 PM, Josh Berkus wrote: On 12/05/2014 02:41 PM, Jim Nasby wrote: Perhaps we should also officially recommend production servers be setup to create core files. AFAIK the only downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely enough that the tradeoff is worth it...) Not practical in a lot of cases. For example, this user was unwilling to enable core dumps on the production replicas because writing out the 16GB of shared buffers they had took over 10 minutes in a test. Which is why I wondered if there's a way to avoid writing out shared buffers... But at least getting a stack trace would be a big start. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 3:49 PM, Josh Berkus wrote: > to enable core dumps on the production replicas because writing out the > 16GB of shared buffers they had took over 10 minutes in a test. No one ever thinks it'll happen to them anyway - recommending enabling core dumps seems like a waste of time, since as Tom mentioned package managers shouldn't be expected to get on board with that plan. I think a zero overhead backtrace feature from within a SIGSEGV handler (with appropriate precautions around corrupt/exhausted call stacks) using glibc is the right thing here. Indeed, glibc does have infrastructure that can be used to get a backtrace [1], which is probably what we'd end up using, but even POSIX has infrastructure like sigaltstack(). It can be done. [1] https://www.gnu.org/software/libc/manual/html_node/Backtraces.html -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/05/2014 02:41 PM, Jim Nasby wrote: > Perhaps we should also officially recommend production servers be setup > to create core files. AFAIK the only downside is the time it would take > to write a core that's huge because of shared buffers, but perhaps > there's some way to avoid writing those? (That means the core won't help > if the bug is due to something in a buffer, but that seems unlikely > enough that the tradeoff is worth it...) Not practical in a lot of cases. For example, this user was unwilling to enable core dumps on the production replicas because writing out the 16GB of shared buffers they had took over 10 minutes in a test. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
Peter Geoghegan writes: > On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby wrote: >> Perhaps we should also officially recommend production servers be setup to >> create core files. AFAIK the only downside is the time it would take to >> write a core that's huge because of shared buffers > I don't think that's ever going to be practical. I'm fairly sure that on some distros (Red Hat, at least) there is distro policy against having daemons produce core dumps by default, for multiple reasons including possible disk space consumption and leakage of secure information. So even if we recommended this, the recommendation would be overridden by some/many packagers. There is much to be said though for trying to emit at least a minimal stack trace into the postmaster log file. I'm pretty sure glibc has a function for that; dunno if it's going to be practical on other platforms. regards, tom lane
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby wrote: > Perhaps we should also officially recommend production servers be setup to > create core files. AFAIK the only downside is the time it would take to > write a core that's huge because of shared buffers I don't think that's ever going to be practical. -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/5/14, 4:11 PM, Peter Geoghegan wrote: On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus wrote: We made some changes which decreased query cancel (optimizing queries, turning on hot_standby_feedback) and we haven't seen a segfault since then. As far as the user is concerned, this solves the problem, so I'm never going to get a trace or a core dump file. Forgot a major piece of evidence as to why I think this is related to query cancel: in each case, the segfault was preceded by a multi-backend query cancel 3ms to 30ms beforehand. It is possible that the backend running the query which segfaulted might have been the only backend *not* cancelled due to query conflict concurrently. Contradicting this, there are other multi-backend query cancels in the logs which do NOT produce a segfault. I wonder if it would be useful to add additional instrumentation so that even without a core dump, there was some cursory information about the nature of a segfault. Yes, doing something with a SIGSEGV handler is very scary, and there are major portability concerns (e.g. https://bugs.ruby-lang.org/issues/9654), but I believe it can be made robust on Linux. For what it's worth, this open source project offers that kind of functionality in the form of a library: https://github.com/vmarkovtsev/DeathHandler Perhaps we should also officially recommend production servers be setup to create core files. AFAIK the only downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely enough that the tradeoff is worth it...) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Stephen, Thanks for the feedback. > > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > > --- 56,62 > > > > backupidstr = text_to_cstring(backupid); > > > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > > ereport(ERROR, > > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > errmsg("must be superuser or replication role to run a > backup"))); > > The point of has_role_attribute() was to avoid the need to explicitly > say "!superuser()" everywhere. The idea with check_role_attribute() is > that we want to present the user (in places like pg_roles) with the > values that are *actually* set. > > In other words, the above should just be: > > if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) > Yes, I understand. My original thought here was that I was replacing 'has_rolreplication' which didn't perform any superuser check and that I was trying to adhere to minimal changes. But I agree this would be the appropriate solution. Fixed. > > > + /* > > + * check_role_attribute > > + * Check if the role with the specified id has been assigned a > specific > > + * role attribute. This function does not allow any superuser > bypass. > > I don't think we need to say that it doesn't "allow" superuser bypass. > Instead, I'd comment that has_role_attribute() should be used for > permissions checks while check_role_attribute is for checking what the > value in the rolattr bitmap is and isn't for doing permissions checks > directly. > Ok. Understood. Fixed. > diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c > > *** pg_role_aclcheck(Oid role_oid, Oid rolei > > *** 4602,4607 > > --- 4603,4773 > > return ACLCHECK_NO_PRIV; > > } > > > > + /* > > + * pg_has_role_attribute_id_attr > > I'm trying to figure out what the point of the trailing "_attr" in the > function name is..? Doesn't seem necessary to have that for these > functions and it'd look a bit cleaner without it, imv. 
> So, I was trying to follow what I perceived as the following convention for these functions: pg. So, what "_attr" represents is the second argument of the function which is the attribute to check. I could agree that might be unnecessary, but that was my thought process on it. At any rate, I've removed it. > ! #define ROLE_ATTR_ALL 255 /* or (1 << N_ROLE_ATTRIBUTES) - 1 */ > > I'd say "equals" or something rather than "or" since that kind of > implies that it's an alternative, but we can't do that as discussed in > the comment (which I like). > Agreed. Fixed. > > ! /* role attribute check routines */ > > ! extern bool has_role_attribute(Oid roleid, RoleAttr attribute); > > ! extern bool check_role_attribute(Oid roleid, RoleAttr attribute); > > With all the 'has_role_attribute(GetUserId(), ROLEATTR_BLAH)' cases, I'd > suggest doing the same as 'superuser()' and also provide a simpler > version: 'have_role_attribute(ROLEATTR_BLAH)' which handles doing the > GetUserId() itself. That'd simplify quite a few of the above calls. > Good point. Added. Attached is an updated patch. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 133143d..8475985 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 22,32 --- 22,34 #include "access/xlog_internal.h" #include "access/xlogutils.h" #include "catalog/catalog.h" + #include "catalog/pg_authid.h" #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" #include "replication/walreceiver.h" #include "storage/smgr.h" + #include "utils/acl.h" #include "utils/builtins.h" #include "utils/numeric.h" #include "utils/guc.h" *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,60 backupidstr = text_to_cstring(backupid); ! 
if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); --- 56,62 backupidstr = text_to_cstring(backupid); ! if (!have_role_attribute(ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,88 { XLogRecPtr stoppoint; ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; --- 84,90 { XLogRecPtr stoppoint; ! if (!have_role_attribute(ROLE_ATTR_REPLICATION)) ereport(ERR
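To keep the two checks under discussion straight, here is a toy model of their semantics. This is not the patch's actual code — the real functions read the rolattr bitmap from pg_authid and take a role OID plus GetUserId(); the RoleAttr typedef and constants below are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Toy mirror of the bitmask scheme: one bit per role attribute */
typedef uint64_t RoleAttr;

#define ROLE_ATTR_SUPERUSER   (1 << 0)
#define ROLE_ATTR_REPLICATION (1 << 1)
#define N_ROLE_ATTRIBUTES     8
#define ROLE_ATTR_ALL         255   /* equals (1 << N_ROLE_ATTRIBUTES) - 1 */

/* check_role_attribute(): raw bitmap test — reports what is *actually*
 * set, with no superuser bypass; this is what pg_roles should display. */
static int
check_role_attribute(RoleAttr rolattr, RoleAttr attribute)
{
    return (rolattr & attribute) != 0;
}

/* has_role_attribute(): permissions check — superuser implies every
 * attribute, so callers need not write "!superuser() && ..." anymore. */
static int
has_role_attribute(RoleAttr rolattr, RoleAttr attribute)
{
    return check_role_attribute(rolattr, ROLE_ATTR_SUPERUSER) ||
           check_role_attribute(rolattr, attribute);
}
```

This illustrates why the review insists on the split: a superuser passes the has_role_attribute() permissions check for REPLICATION without the REPLICATION bit ever being set in the bitmap, so a catalog view built on check_role_attribute() still shows the true stored value.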
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus wrote: >> We made some changes which decreased query cancel (optimizing queries, >> turning on hot_standby_feedback) and we haven't seen a segfault since >> then. As far as the user is concerned, this solves the problem, so I'm >> never going to get a trace or a core dump file. > > Forgot a major piece of evidence as to why I think this is related to > query cancel: in each case, the segfault was preceded by a > multi-backend query cancel 3ms to 30ms beforehand. It is possible that > the backend running the query which segfaulted might have been the only > backend *not* cancelled due to query conflict concurrently. > Contradicting this, there are other multi-backend query cancels in the > logs which do NOT produce a segfault. I wonder if it would be useful to add additional instrumentation so that even without a core dump, there was some cursory information about the nature of a segfault. Yes, doing something with a SIGSEGV handler is very scary, and there are major portability concerns (e.g. https://bugs.ruby-lang.org/issues/9654), but I believe it can be made robust on Linux. For what it's worth, this open source project offers that kind of functionality in the form of a library: https://github.com/vmarkovtsev/DeathHandler -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/05/2014 12:54 PM, Josh Berkus wrote: > Hackers, > > This is not a complete enough report for a diagnosis. I'm posting it > here just in case someone else sees something like it, and having an > additional report will help figure out the underlying issue. > > * 700GB database with around 5,000 writes per second > * 8 replicas handling around 10,000 read queries per second each > * replicas are slammed (40-70% utilization) > * replication produces lots of replication query cancels > > In this scenario, a specific query against some of the less busy and > fairly small tables would produce a segfault (signal 11) once every 1-4 > days randomly. This query could have 100's of successful runs for every > segfault. This was not reproduceable manually, and the segfaults never > happened on the master. Nor did we ever see a segfault based on any > other query, including against the tables which were generally the > source of the query cancels. > > In case it's relevant, the query included use of regexp_split_to_array() > and ORDER BY random(), neither of which are generally used in the user's > other queries. > > We made some changes which decreased query cancel (optimizing queries, > turning on hot_standby_feedback) and we haven't seen a segfault since > then. As far as the user is concerned, this solves the problem, so I'm > never going to get a trace or a core dump file. Forgot a major piece of evidence as to why I think this is related to query cancel: in each case, the segfault was preceded by a multi-backend query cancel 3ms to 30ms beforehand. It is possible that the backend running the query which segfaulted might have been the only backend *not* cancelled due to query conflict concurrently. Contradicting this, there are other multi-backend query cancels in the logs which do NOT produce a segfault. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
[HACKERS] Elusive segfault with 9.3.5 & query cancel
Hackers, This is not a complete enough report for a diagnosis. I'm posting it here just in case someone else sees something like it, and having an additional report will help figure out the underlying issue. * 700GB database with around 5,000 writes per second * 8 replicas handling around 10,000 read queries per second each * replicas are slammed (40-70% utilization) * replication produces lots of replication query cancels In this scenario, a specific query against some of the less busy and fairly small tables would produce a segfault (signal 11) once every 1-4 days randomly. This query could have 100's of successful runs for every segfault. This was not reproduceable manually, and the segfaults never happened on the master. Nor did we ever see a segfault based on any other query, including against the tables which were generally the source of the query cancels. In case it's relevant, the query included use of regexp_split_to_array() and ORDER BY random(), neither of which are generally used in the user's other queries. We made some changes which decreased query cancel (optimizing queries, turning on hot_standby_feedback) and we haven't seen a segfault since then. As far as the user is concerned, this solves the problem, so I'm never going to get a trace or a core dump file. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Testing DDL deparsing support
On Fri, Dec 5, 2014 at 04:10:12PM -0300, Alvaro Herrera wrote: > Well, ALTER TABLE is special: you can give several subcommands, and each > subcommand can be one of a rather long list of possible subcommands. > Testing every combination would mean a combinatorial explosion, which > would indeed be too large. But surely we want a small bunch of tests to > prove that having several subcommands works fine, and also at least one > test for every possible subcommand. > > > We have rejected simple regression test additions on the basis that > > the syntax works and is unlikely to break once tested once by the > > developer. > > This rationale doesn't sound so good to me. Something might work fine > the minute it is committed, but someone else might break it > inadvertently later; this has actually happened. Having no tests at all > for a feature isn't good. > > I know we have recently rejected patches that added tests only to > improve the coverage percent, for instance in CREATE DATABASE, because > the runtime of the tests got too large. Are there other examples of > rejected tests? Yes, there are many cases we have added options or keywords but didn't add a regression test. > > > > and it could easily bloat the regression tests over time. > > > > > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > > > qualify as bloat? > > > > No, that seems fine. I am worried about having to have a test for every > > syntax change, which we currently don't do? Was that issue not clear in > > my first email? > > Well, if with "every syntax change" you mean "every feature addition", > then I think we should have at least one test for each, yes. It's not > like we add new syntax every day anyway. Well, my point is that this is a new behavior we have to do, at least to test the logical DDL behavior --- I suppose it could be removed after testing. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Fri, Dec 5, 2014 at 12:33 PM, Robert Haas wrote: >> Well, if an alias is used, and you refer to an attribute using a >> non-alias name (i.e. the original table name), then you'll already get >> an error suggesting that the alias be used instead -- of course, >> that's nothing new. It doesn't matter to the existing hinting >> mechanism if the attribute name is otherwise wrong. Once you fix the >> code to use the alias suggested, you'll then get this new >> Levenshtein-based hint. > > In that case, I think I favor giving no hint at all when the RTE name > is specified but doesn't match exactly. I don't follow. The existing mechanism only concerns what to do when the original table name was used when an alias should have been used instead. What does that have to do with this patch? -- Peter Geoghegan
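[Editor's note: the "Levenshtein-based hint" discussed above picks the candidate column whose name has the smallest edit distance from the misspelled one. As a point of reference, here is a minimal sketch of the classic dynamic-programming edit distance in C; it is not PostgreSQL's actual varstr_levenshtein() (which additionally supports custom insert/delete/substitute costs and a max-distance cutoff).]

```c
#include <stdlib.h>
#include <string.h>

/*
 * Classic O(m*n) Levenshtein edit distance over byte strings.
 * A simplified illustration only; PostgreSQL's varstr_levenshtein()
 * is more general (configurable costs, early-exit distance cap).
 */
static int
levenshtein(const char *s, const char *t)
{
	int			m = (int) strlen(s);
	int			n = (int) strlen(t);
	int		   *prev = malloc((n + 1) * sizeof(int));
	int		   *curr = malloc((n + 1) * sizeof(int));
	int			i, j, result;

	/* Distance from the empty prefix of s to each prefix of t. */
	for (j = 0; j <= n; j++)
		prev[j] = j;

	for (i = 1; i <= m; i++)
	{
		curr[0] = i;
		for (j = 1; j <= n; j++)
		{
			int			del = prev[j] + 1;
			int			ins = curr[j - 1] + 1;
			int			sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
			int			best = (del < ins) ? del : ins;

			curr[j] = (sub < best) ? sub : best;
		}
		memcpy(prev, curr, (n + 1) * sizeof(int));
	}
	result = prev[n];
	free(prev);
	free(curr);
	return result;
}
```

A hint mechanism would compute this distance between the unknown attribute name and each visible column, suggesting the closest match below some threshold.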
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Dec 3, 2014 at 9:21 PM, Peter Geoghegan wrote: > On Tue, Dec 2, 2014 at 1:11 PM, Robert Haas wrote: >> Basically, the case in which I think it's helpful to issue a >> suggestion here is when the user has used the table name rather than >> the alias name. I wonder if it's worth checking for that case >> specifically, in lieu of what you've done here, and issuing a totally >> different hint in that case ("HINT: You must refer to this as column >> as "prime_minister.id" rather than "cameron.id"). > > Well, if an alias is used, and you refer to an attribute using a > non-alias name (i.e. the original table name), then you'll already get > an error suggesting that the alias be used instead -- of course, > that's nothing new. It doesn't matter to the existing hinting > mechanism if the attribute name is otherwise wrong. Once you fix the > code to use the alias suggested, you'll then get this new > Levenshtein-based hint. In that case, I think I favor giving no hint at all when the RTE name is specified but doesn't match exactly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 3:05 PM, Jim Nasby wrote: >> On what basis do you expect that? Every time you use a view, you're >> using a pg_node_tree. Nobody's ever complained that having to reload >> the pg_node_tree column was too slow, and I see no reason to suppose >> that things would be any different here. >> >> I mean, we can certainly invent something new if there is a reason to >> do so. But you (and a few other people) seem to be trying pretty hard >> to avoid using the massive amount of infrastructure that we already >> have to do almost this exact thing, which puzzles the heck out of me. > > My concern is how to do the routing of incoming tuples. I'm assuming it'd be > significantly faster to compare two tuples than to run each tuple through a > bunch of nodetrees. As I said before, that's a completely unrelated problem. To quickly route tuples for range or list partitioning, you're going to want to have an array of Datums in memory and bsearch it. That says nothing about how they should be stored on disk. Whatever the on-disk representation looks like, the relcache is going to need to reassemble it into an array that can be binary-searched. As long as that's not hard to do - and none of the proposals here would make it hard to do - there's no reason to care about this from that point of view. At least, not that I can see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
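[Editor's note: the in-memory routing Robert describes can be sketched independently of the on-disk format. The example below is hypothetical, not PostgreSQL code: int64 keys stand in for Datums (a real implementation would compare Datums with the partition key's btree support function), and the function name is invented. It binary-searches a sorted array of range lower bounds for the greatest bound not exceeding the key.]

```c
#include <stdint.h>

/*
 * Hypothetical sketch: route a key to a range partition by binary-
 * searching a sorted array of partition lower bounds.  Returns the
 * index of the partition whose lower bound is the greatest one <= key,
 * or -1 if the key falls below the first bound.
 */
static int
route_to_partition(int64_t key, const int64_t *lower_bounds, int nparts)
{
	int			lo = 0;
	int			hi = nparts - 1;
	int			result = -1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (lower_bounds[mid] <= key)
		{
			result = mid;		/* candidate; a later bound may also fit */
			lo = mid + 1;
		}
		else
			hi = mid - 1;
	}
	return result;
}
```

The point of the sketch is that only this array needs to be fast to search; however the bounds are serialized in the catalog (pg_node_tree or otherwise), the relcache can rebuild the sorted array once and reuse it.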
Re: [HACKERS] On partitioning
On 12/5/14, 2:02 PM, Robert Haas wrote: On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby wrote: The other option would be to use some custom rowtype to store boundary values and have a method that can form a boundary tuple from a real one. Either way, I suspect this is better than frequently evaluating pg_node_trees. On what basis do you expect that? Every time you use a view, you're using a pg_node_tree. Nobody's ever complained that having to reload the pg_node_tree column was too slow, and I see no reason to suppose that things would be any different here. I mean, we can certainly invent something new if there is a reason to do so. But you (and a few other people) seem to be trying pretty hard to avoid using the massive amount of infrastructure that we already have to do almost this exact thing, which puzzles the heck out of me. My concern is how to do the routing of incoming tuples. I'm assuming it'd be significantly faster to compare two tuples than to run each tuple through a bunch of nodetrees. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 1:01 AM, Anssi Kääriäinen wrote: > If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in > Django for the existing save() method, then it needs to know if the > result was an UPDATE or INSERT. If we are going to use this for other > operations (for example bulk merge of rows to the database), it would be > very convenient to have per-row updated/created information available so > that we can fire the post_save signals for the rows. If we don't have > that information available, it means we can't fire signals, and no > signals means we can't use the bulk merge operation internally as we > have to fire the signals where that happened before. > > Outside of Django there are likely similar reasons to want to know if > the result of an operation was a creation of a new row. The reason could > be creation of related row, doing some action in application layer, or > just UI message telling "object created successfully" vs "object updated > successfully". It probably isn't ideal, but you'd at least be able to do something with row level triggers in the absence of a standard way of directly telling if an insert or update was performed. -- Peter Geoghegan
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby wrote: > The other option would be to use some custom rowtype to store boundary > values and have a method that can form a boundary tuple from a real one. > Either way, I suspect this is better than frequently evaluating > pg_node_trees. On what basis do you expect that? Every time you use a view, you're using a pg_node_tree. Nobody's ever complained that having to reload the pg_node_tree column was too slow, and I see no reason to suppose that things would be any different here. I mean, we can certainly invent something new if there is a reason to do so. But you (and a few other people) seem to be trying pretty hard to avoid using the massive amount of infrastructure that we already have to do almost this exact thing, which puzzles the heck out of me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] On partitioning
On 12/5/14, 1:22 PM, Jim Nasby wrote: On 12/5/14, 3:42 AM, Amit Langote wrote: > I think you are right. I think in this case we need something similar >to column pg_index.indexprs which is of type pg_node_tree(which >seems to be already suggested by Robert). So may be we can proceed >with this type and see if any one else has better idea. One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. In order to store a composite type in a catalog, we would need to have one field that has the typid of the composite, and the field that stores the actual composite data would need to be a "dumb" varlena that stores the composite HeapTupleHeader. On further thought; if we disallow NULL as a partition boundary, we don't need a separate rowtype; we could just use the one associated with the relation itself. Presumably that would make comparing tuples to the relation list a lot easier. I was hung up on how that would work in the case of ALTER TABLE, but we'd have the same problem with using pg_node_tree: if you alter a table in such a way that *might* affect your partitioning, you have to do some kind of revalidation anyway. The other option would be to use some custom rowtype to store boundary values and have a method that can form a boundary tuple from a real one. Either way, I suspect this is better than frequently evaluating pg_node_trees. There may be one other option. If range partitions are defined in terms of an expression that is different for every partition (ie: (substr(product_key, 1, 4), date_trunc('month', sales_date))) then we could use a hash of that expression to identify a partition. 
In other words, range partitioning becomes a special case of hash partitioning. I do think we need a programmatic means to identify the range of an individual partition and hash won't solve that, but the performance of that case isn't critical so we could use pretty much whatever we wanted to there. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] On partitioning
On 12/5/14, 3:42 AM, Amit Langote wrote: > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree (which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. In order to store a composite type in a catalog, we would need to have one field that has the typid of the composite, and the field that stores the actual composite data would need to be a "dumb" varlena that stores the composite HeapTupleHeader. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Testing DDL deparsing support
Bruce Momjian wrote: > On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote: > > Bruce Momjian wrote: > > > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > > > Standard regression tests are helpful because patch authors include new > > > > test > > > > cases in the patches that stress their new options or commands. This > > > > new test > > > > framework needs to be something that internally runs the regression > > > > tests and > > > > exercises DDL deparsing as the regression tests execute DDL. That > > > > would mean > > > > that new commands and options would automatically be deparse-tested by > > > > our new > > > > framework as soon as patch authors add standard regress support. > > > > > > Are you saying every time a new option is added to a command that a new > > > regression test needs to be added? > > > > Not necessarily -- an existing test could be modified, as well. > > > > > We don't normally do that, > > > > I sure hope we do have all options covered by tests. > > Are you saying that every combination of ALTER options is tested? Well, ALTER TABLE is special: you can give several subcommands, and each subcommand can be one of a rather long list of possible subcommands. Testing every combination would mean a combinatorial explosion, which would indeed be too large. But surely we want a small bunch of tests to prove that having several subcommands works fine, and also at least one test for every possible subcommand. > We have rejected simple regression test additions on the basis that > the syntax works and is unlikely to break once tested once by the > developer. This rationale doesn't sound so good to me. Something might work fine the minute it is committed, but someone else might break it inadvertently later; this has actually happened. Having no tests at all for a feature isn't good. 
I know we have recently rejected patches that added tests only to improve the coverage percent, for instance in CREATE DATABASE, because the runtime of the tests got too large. Are there other examples of rejected tests? > > > and it could easily bloat the regression tests over time. > > > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > > qualify as bloat? > > No, that seems fine. I am worried about having to have a test for every > syntax change, which we currently don't do? Was that issue not clear in > my first email? Well, if with "every syntax change" you mean "every feature addition", then I think we should have at least one test for each, yes. It's not like we add new syntax every day anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Testing DDL deparsing support
On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote: > Bruce Momjian wrote: > > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > > Standard regression tests are helpful because patch authors include new > > > test > > > cases in the patches that stress their new options or commands. This new > > > test > > > framework needs to be something that internally runs the regression tests > > > and > > > exercises DDL deparsing as the regression tests execute DDL. That would > > > mean > > > that new commands and options would automatically be deparse-tested by > > > our new > > > framework as soon as patch authors add standard regress support. > > > > Are you saying every time a new option is added to a command that a new > > regression test needs to be added? > > Not necessarily -- an existing test could be modified, as well. > > > We don't normally do that, > > I sure hope we do have all options covered by tests. Are you saying that every combination of ALTER options is tested? We have rejected simple regression test additions on the basis that the syntax works and is unlikely to break once tested once by the developer. > > and it could easily bloat the regression tests over time. > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > qualify as bloat? No, that seems fine. I am worried about having to have a test for every syntax change, which we currently don't do? Was that issue not clear in my first email? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Parallel Seq Scan
On 12/5/14, 9:08 AM, José Luis Tallón wrote: Moreover, when load goes up, the relative cost of parallel working should go up as well. Something like: p = number of cores l = 1min-load additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(p-1) (for p>1, of course) ... The parallel seq scan nodes are definitely the best approach for "parallel query", since the planner can optimize them based on cost. I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. These comments got me wondering... would it be better to decide on parallelism during execution instead of at plan time? That would allow us to dynamically scale parallelism based on system load. If we don't even consider parallelism until we've pulled some number of tuples/pages from a relation, this would also eliminate all parallel overhead on small relations. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
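[Editor's note: José's proposed load-aware penalty is simple enough to state as a helper function. The sketch below is purely illustrative — the function name and the use of double throughout are assumptions, not anything from PostgreSQL's costing code — but it makes the behavior concrete: the penalty vanishes on a single core, grows with the 1-minute load average, and shrinks as cores are added.]

```c
/*
 * Illustrative sketch of a load-aware penalty for parallel plans:
 *   additional_cost = tuple_estimate * cpu_tuple_cost * (load + 1) / (ncores - 1)
 * The more loaded the machine, the more expensive parallelism looks.
 * Hypothetical helper, not actual PostgreSQL costing code.
 */
static double
parallel_load_penalty(double tuple_estimate, double cpu_tuple_cost,
					  int ncores, double load_1min)
{
	/* The formula is only defined for more than one core. */
	if (ncores <= 1)
		return 0.0;
	return tuple_estimate * cpu_tuple_cost * (load_1min + 1.0) / (ncores - 1.0);
}
```

For example, with 1000 estimated tuples, cpu_tuple_cost of 0.01, and 4 cores, an idle machine (load 0) yields a penalty of about 3.3, while a load of 3 quadruples it — which is exactly the "make parallelism look worse under load" behavior being proposed.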
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Adam, * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: > I have attached an updated patch that incorporates the feedback and > recommendations provided. Thanks. Comments below. > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > --- 56,62 > > backupidstr = text_to_cstring(backupid); > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > ereport(ERROR, > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > errmsg("must be superuser or replication role to run a > backup"))); The point of has_role_attribute() was to avoid the need to explicitly say "!superuser()" everywhere. The idea with check_role_attribute() is that we want to present the user (in places like pg_roles) with the values that are *actually* set. In other words, the above should just be: if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) > --- 84,90 > { > XLogRecPtr stoppoint; > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > ereport(ERROR, > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), >(errmsg("must be superuser or replication role to run a > backup"; Ditto here. > diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c > --- 5031,5081 > } > > /* > ! * has_role_attribute > ! * Check if the role has the specified role has a specific role attribute. > ! * This function will always return true for roles with superuser > privileges > ! * unless the attribute being checked is CATUPDATE. >* > ! * roleid - the oid of the role to check. > ! * attribute - the attribute to check. >*/ > bool > ! has_role_attribute(Oid roleid, RoleAttr attribute) > { > ! /* > ! * Superusers bypass all permission checking except in the case of > CATUPDATE. > ! */ > ! if (!(attribute & ROLE_ATTR_CATUPDATE) && superuser_arg(roleid)) > return true; > > ! 
return check_role_attribute(roleid, attribute); > } > > + /* > + * check_role_attribute > + * Check if the role with the specified id has been assigned a specific > + * role attribute. This function does not allow any superuser bypass. I don't think we need to say that it doesn't "allow" superuser bypass. Instead, I'd comment that has_role_attribute() should be used for permissions checks while check_role_attribute is for checking what the value in the rolattr bitmap is and isn't for doing permissions checks directly. > diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c > *** CreateRole(CreateRoleStmt *stmt) > *** 82,93 > boolencrypt_password = Password_encryption; /* encrypt > password? */ > charencrypted_password[MD5_PASSWD_LEN + 1]; > boolissuper = false;/* Make the user a superuser? */ > ! boolinherit = true; /* Auto inherit privileges? */ > boolcreaterole = false; /* Can this user create > roles? */ > boolcreatedb = false; /* Can the user create > databases? */ > boolcanlogin = false; /* Can this user login? > */ > boolisreplication = false; /* Is this a replication role? > */ > boolbypassrls = false; /* Is this a row > security enabled role? */ > int connlimit = -1; /* maximum connections allowed > */ > List *addroleto = NIL;/* roles to make this a member of */ > List *rolemembers = NIL; /* roles to be members of this > role */ > --- 74,86 > boolencrypt_password = Password_encryption; /* encrypt > password? */ > charencrypted_password[MD5_PASSWD_LEN + 1]; > boolissuper = false;/* Make the user a superuser? */ > ! boolinherit = true; /* Auto inherit privileges? */ > boolcreaterole = false; /* Can this user create > roles? */ > boolcreatedb = false; /* Can the user create > databases? */ > boolcanlogin = false; /* Can this user login? > */ > boolisreplication = false; /* Is this a replication role? > */ > boolbypassrls = false; /* Is this a row > security enabled role? 
*/ > + RoleAttrattributes = ROLE_ATTR_NONE;/* role attributes, > initialized to none. */ > int connlimit = -1; /* maximum connections allowed > */ > List *addroleto = NIL;/* roles to make this a member of */ > List *rolemembers = NIL; /* roles to be members of this > role */ Please clean up the wh
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 1:07 PM, Peter Geoghegan wrote: > Agreed. Importantly, we won't have painted ourselves into a corner > where we cannot add it later, now that RETURNING projects updated > tuples, too (V1.5 established that). Yeah, it seems fine to postpone that to a later version, as long as we haven't painted ourselves into a corner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 10:00 AM, Josh Berkus wrote: > I thought the point of INSERT ... ON CONFLICT update was so that you > didn't have to care if it was a new row or not? > > If you do care, it seems like it makes more sense to do your own INSERTs > and UPDATEs, as Django currently does. > > I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff > which let me know updated|inserted|ignored, but I also don't see it as a > feature requirement for 9.5. Agreed. Importantly, we won't have painted ourselves into a corner where we cannot add it later, now that RETURNING projects updated tuples, too (V1.5 established that). I'm pretty confident that it would be a mistake to try and make the inner "excluded" and "target" aliases visible, in any case, because of the annoying ambiguity it creates for the common, simple cases. I think it has to be a magic function, or something like that. I really hoped we'd be having a more technical, implementation level discussion at this point. Simon rightly emphasized the importance of getting the semantics right, and the importance of discussing that up front, but I'm concerned; that's almost all that has been discussed here so far, which is surely not balanced either. -- Peter Geoghegan
Re: [HACKERS] GSSAPI, SSPI - include_realm default
* Magnus Hagander (mag...@hagander.net) wrote: > On Wed, Nov 26, 2014 at 8:01 PM, Stephen Frost wrote: > > As such, I'd like to propose changing the default to be > > 'include_realm=1'. > > Per our previous discussions, but to make sure it's also on record for > others, +1 for this suggestion. Patch attached which does this for master. > > This would be done for 9.5 and we would need to note it in the release > notes, of course. > > I suggest we also backpatch some documentation suggesting that people > manually change the include_realm parameter (perhaps also with a note > saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. Given my recent track record for changing wording around, it seems prudent to get agreement on this first. Thanks, Stephen From fe4d7a01f5d31fac565a8de2485cd9d113a9cbb4 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Fri, 5 Dec 2014 12:54:05 -0500 Subject: [PATCH] Change default for include_realm to 1 The default behavior for GSS and SSPI authentication methods has long been to strip the realm off of the principal, however, this is not a secure approach in multi-realm environments and the use-case for the parameter at all has been superseded by the regex-based mapping support available in pg_ident.conf. Change the default for include_realm to be '1', meaning that we do NOT remove the realm from the principal by default. Any installations which depend on the existing behavior will need to update their configurations (ideally by leaving include_realm on and adding a mapping in pg_ident.conf). Note that the mapping capability exists in all currently supported versions of PostgreSQL and so this change can be done today. Barring that, existing users can update their configurations today to explicitly set include_realm=0 to ensure that the prior behavior is maintained when they upgrade. This needs to be noted in the release notes. 
Per discussion with Magnus and Peter. --- doc/src/sgml/client-auth.sgml | 66 --- src/backend/libpq/hba.c | 13 + 2 files changed, 57 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 7704f73..69517dd 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -943,15 +943,24 @@ omicron bryanh guest1 -Client principals must have their PostgreSQL database user -name as their first component, for example -pgusername@realm. Alternatively, you can use a user name -mapping to map from the first component of the principal name to the -database user name. By default, the realm of the client is -not checked by PostgreSQL. If you have cross-realm -authentication enabled and need to verify the realm, use the -krb_realm parameter, or enable include_realm -and use user name mapping to check the realm. +Client principals can be mapped to different PostgreSQL +database user names with pg_ident.conf. For example, +pgusername@realm could be mapped to just pgusername. +Alternatively, you can use the full username@realm principal as +the role name in PostgreSQL without any mapping. + + + +PostgreSQL also supports a parameter to strip the realm from +the principal. This method is supported for backwards compatibility and is +strongly discouraged as it is then impossible to distinguish different users +with the same username but coming from different realms. To enable this, +set include_realm to zero. For simple single-realm +installations, include_realm combined with the +krb_realm parameter (which checks that the realm provided +matches exactly what is in the krb_realm parameter) would be a secure but +less capable option compared to specifying an explicit mapping in +pg_ident.conf. @@ -993,10 +1002,13 @@ omicron bryanh guest1 include_realm -If set to 1, the realm name from the authenticated user -principal is included in the system user name that's passed through -user name mapping (). 
This is -useful for handling users from multiple realms. +If set to 0, the realm name from the authenticated user principal is +stripped off before being passed through the user name mapping +(). This is discouraged and is +primarily available for backwards compatibility as it is not secure +in multi-realm environments unless krb_realm is also used. Users +are recommended to leave include_realm set to the default (1) and to +provide an explicit mapping in pg_ident.conf. @@ -1008,10 +1020,11 @@ omicron bryanh guest1 Allows for mapping between system and database user names.
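[Editor's note: for readers unfamiliar with the pg_ident.conf mapping the patch recommends, a regex-based map that turns a full principal into a bare database role name looks roughly like the following. The map name and realm are placeholders; the `\1` substitutes the regex's first capture group, as described in the client authentication documentation.]

```
# pg_ident.conf -- map full Kerberos principals to database role names
# MAPNAME    SYSTEM-USERNAME            PG-USERNAME
krbmap       /^(.*)@EXAMPLE\.COM$/      \1
```

A matching pg_hba.conf line would then reference the map, e.g. `host all all 0.0.0.0/0 gss include_realm=1 map=krbmap`, so that `alice@EXAMPLE.COM` authenticates as role `alice` while principals from other realms are rejected by the map.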
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
All, I have attached an updated patch that incorporates the feedback and recommendations provided. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 133143d..ccdff09 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 22,32 --- 22,34 #include "access/xlog_internal.h" #include "access/xlogutils.h" #include "catalog/catalog.h" + #include "catalog/pg_authid.h" #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" #include "replication/walreceiver.h" #include "storage/smgr.h" + #include "utils/acl.h" #include "utils/builtins.h" #include "utils/numeric.h" #include "utils/guc.h" *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,60 backupidstr = text_to_cstring(backupid); ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); --- 56,62 backupidstr = text_to_cstring(backupid); ! if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,88 { XLogRecPtr stoppoint; ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; --- 84,90 { XLogRecPtr stoppoint; ! 
if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c new file mode 100644 index d30612c..0b23ba0 *** a/src/backend/catalog/aclchk.c --- b/src/backend/catalog/aclchk.c *** aclcheck_error_type(AclResult aclerr, Oi *** 3423,3448 } - /* Check if given user has rolcatupdate privilege according to pg_authid */ - static bool - has_rolcatupdate(Oid roleid) - { - bool rolcatupdate; - HeapTuple tuple; - - tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); - if (!HeapTupleIsValid(tuple)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("role with OID %u does not exist", roleid))); - - rolcatupdate = ((Form_pg_authid) GETSTRUCT(tuple))->rolcatupdate; - - ReleaseSysCache(tuple); - - return rolcatupdate; - } - /* * Relay for the various pg_*_mask routines depending on object kind */ --- 3423,3428 *** pg_class_aclmask(Oid table_oid, Oid role *** 3630,3636 if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) && IsSystemClass(table_oid, classForm) && classForm->relkind != RELKIND_VIEW && ! !has_rolcatupdate(roleid) && !allowSystemTableMods) { #ifdef ACLDEBUG --- 3610,3616 if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) && IsSystemClass(table_oid, classForm) && classForm->relkind != RELKIND_VIEW && ! !has_role_attribute(roleid, ROLE_ATTR_CATUPDATE) && !allowSystemTableMods) { #ifdef ACLDEBUG *** pg_extension_ownercheck(Oid ext_oid, Oid *** 5051,5102 } /* ! * Check whether specified role has CREATEROLE privilege (or is a superuser) * ! * Note: roles do not have owners per se; instead we use this test in ! * places where an ownership-like permissions test is needed for a role. ! * Be sure to apply it to the role trying to do the operation, not the ! * role being operated on! 
Also note that this generally should not be ! * considered enough privilege if the target role is a superuser. ! * (We don't handle that consideration here because we want to give a ! * separate error message for such cases, so the caller has to deal with it.) */ bool ! has_createrole_privilege(Oid roleid) { ! bool result = false; ! HeapTuple utup; ! ! /* Superusers bypass all permission checking. */ ! if (superuser_arg(roleid)) return true; ! utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); ! if (HeapTupleIsValid(utup)) ! { ! result = ((Form_pg_authid) GETSTRUCT(utup))->rolcreaterole; ! ReleaseSysCache(utup); ! } ! return result; } bool ! has_bypassrls_privilege(Oid roleid) { ! bool result = false; ! HeapTuple utup; ! /* Superusers bypass all permission checking. */ ! if (superuser_arg(roleid)) ! return true; ! utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); ! if (HeapTupleIsValid(utup)) ! { ! result = ((Fo
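The mechanical change in the patch above is replacing per-attribute boolean catalog columns (rolcatupdate, rolcreaterole, rolreplication, ...) with a single bitmask consulted through check_role_attribute(). A minimal sketch of the bitmask idea follows; the bit positions and the RoleAttr name are illustrative (the real definitions live in the patched pg_authid.h), and the catalog lookup by role OID is elided so only the bit test is shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; the real values come from pg_authid.h. */
typedef uint64_t RoleAttr;
#define ROLE_ATTR_SUPERUSER   (((RoleAttr) 1) << 0)
#define ROLE_ATTR_CREATEROLE  (((RoleAttr) 1) << 2)
#define ROLE_ATTR_CATUPDATE   (((RoleAttr) 1) << 3)
#define ROLE_ATTR_REPLICATION (((RoleAttr) 1) << 5)

/*
 * One bitmask column replaces several per-attribute booleans; in the
 * patch the mask is fetched from the role's pg_authid tuple first.
 */
static bool
check_role_attribute(RoleAttr attributes, RoleAttr attribute)
{
    return (attributes & attribute) != 0;
}
```

One consequence of this representation is that adding a future role attribute only needs a new flag bit rather than a new catalog column.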
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 12/05/2014 07:59 AM, Robert Haas wrote: > I think it's probably an important distinction, for the kinds of > reasons Anssi mentions, but we should look for some method other than > a system column of indicating it. Maybe there's a magic function that > returns a Boolean which you can call, or maybe some special clause, as > with WITH ORDINALITY. I thought the point of INSERT ... ON CONFLICT update was so that you didn't have to care if it was a new row or not? If you do care, it seems like it makes more sense to do your own INSERTs and UPDATEs, as Django currently does. I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff which let me know updated|inserted|ignored, but I also don't see it as a feature requirement for 9.5. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 5, 2014 at 1:49 AM, Rahila Syed wrote:
>>If that's really true, we could consider having no configuration any
>>time, and just compressing always. But I'm skeptical that it's
>>actually true.
>
> I was referring to this for CPU utilization:
> http://www.postgresql.org/message-id/1410414381339-5818552.p...@n5.nabble.com
>
> The above tests were performed on machine with configuration as follows
> Server specifications:
> Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos
> RAM: 32GB
> Disk: HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos
> 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm

I think that measurement methodology is not very good for assessing the CPU overhead, because you are only measuring the percentage CPU utilization, not the absolute amount of CPU utilization. It's not clear whether the duration of the tests was the same for all the configurations you tried - in which case the number of transactions might have been different - or whether the number of operations was exactly the same - in which case the runtime might have been different. Either way, it could obscure an actual difference in absolute CPU usage per transaction. It's unlikely that both the runtime and the number of transactions were identical for all of your tests, because that would imply that the patch makes no difference to performance; if that were true, you wouldn't have bothered writing it.

What I would suggest is to instrument the backend with getrusage() at startup and shutdown and have it print the difference in user time and system time. Then, run tests for a fixed number of transactions and see how the total CPU usage for the run differs. Last cycle, Amit Kapila did a bunch of work trying to compress the WAL footprint for updates, and we found that compression was pretty darn expensive there in terms of CPU time. So I am suspicious of the finding that it is free here.
It's not impossible that there's some effect which causes us to recoup more CPU time than we spend compressing in this case that did not apply in that case, but the projects are awfully similar, so I tend to doubt it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
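The getrusage() instrumentation suggested above could look roughly like the following. The helper names are made up for illustration; the idea is simply to snapshot process CPU usage at backend start and print the deltas at shutdown:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU usage captured at startup, to diff against at shutdown. */
static struct rusage start_usage;

/* Difference between two timevals, in seconds. */
static double
tv_diff(struct timeval end, struct timeval start)
{
    return (end.tv_sec - start.tv_sec) +
           (end.tv_usec - start.tv_usec) / 1000000.0;
}

void
cpu_usage_start(void)
{
    getrusage(RUSAGE_SELF, &start_usage);
}

void
cpu_usage_report(void)
{
    struct rusage end_usage;

    getrusage(RUSAGE_SELF, &end_usage);
    fprintf(stderr, "user: %.6f s, system: %.6f s\n",
            tv_diff(end_usage.ru_utime, start_usage.ru_utime),
            tv_diff(end_usage.ru_stime, start_usage.ru_stime));
}
```

Because ru_utime and ru_stime accumulate the CPU actually consumed by the process, the deltas expose per-run CPU cost even when wall-clock duration or transaction counts differ between configurations, which is exactly the weakness of the percentage-utilization measurements.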
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 3:11 AM, Amit Langote wrote:
>> I think you are right. I think in this case we need something similar
>> to column pg_index.indexprs which is of type pg_node_tree (which
>> seems to be already suggested by Robert). So may be we can proceed
>> with this type and see if any one else has better idea.
>
> Yeah, with that, I was thinking we may be able to do something like dump a
> Node that describes the range partition bounds or list of allowed values
> (say, RangePartitionValues, ListPartitionValues).

That's exactly the kind of thing I was thinking about.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 2:18 AM, Amit Kapila wrote: > Do we really need to support dml or pg_dump for individual partitions? I think we do. It's quite reasonable for a DBA (or developer or whatever) to want to dump all the data that's in a single partition; for example, maybe they have the table partitioned, but also spread across several servers. When the data on one machine grows too big, they want to dump that partition, move it to a new machine, and drop the partition from the old machine. That needs to be easy and efficient. More generally, with inheritance, I've seen the ability to reference individual inheritance children be a real life-saver on any number of occasions. Now, a new partitioning system that is not as clunky as constraint exclusion will hopefully be fast enough that people don't need to do it very often any more. But I would be really cautious about removing the option. That is the equivalent of installing a new fire suppression system and then boarding up the emergency exit. Yeah, you *hope* the new fire suppression system is good enough that nobody will ever need to go out that way any more. But if you're wrong, people will die, so getting rid of it isn't prudent. The stakes are not quite so high here, but the principle is the same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote wrote: >> So, we're going to support exactly two levels of partitioning? >> partitions with partissub=false and subpartitions with partissub=true? >> Why not support only one level of partitioning here but then let the >> children have their own pg_partitioned_rel entries if they are >> subpartitioned? That seems like a cleaner design and lets us support >> an arbitrary number of partitioning levels if we ever need them. > > Yeah, that's what I thought at some point in favour of dropping partissub > altogether. However, not that this design solves it, there is one question - > if we would want to support defining for a table both partition key and > sub-partition key in advance? That is, without having defined a first level > partition yet; in that case, what level do we associate sub-(sub-) > partitioning key with or more to the point where do we keep it? Do we really need to allow that? I think you let people partition a toplevel table, and then partition its partitions once they've been created. I'm not sure there's a good reason to associate the subpartitioning scheme with the toplevel table. For one thing, that forces all subpartitions to be partitioned the same way - do we want to insist on that? If we do, then I agree that we need to think a little harder here. > That would be a default partition. That is, where the tuples that don't > belong elsewhere (other defined partitions) go. VALUES clause of the > definition for such a partition would look like: > > (a range partition) ... VALUES LESS THAN MAXVALUE > (a list partition) ... VALUES DEFAULT > > There has been discussion about whether there shouldn't be such a place for > tuples to go. That is, it should generate an error if a tuple can't go > anywhere (or support auto-creating a new one like in interval partitioning?) I think Alvaro's response further down the thread is right on target. 
But to go into a bit more detail, let's consider the three possible cases: - Hash partitioning. Every key value gets hashed to some partition. The concept of an overflow or default partition doesn't even make sense. - List partitioning. Each key for which the user has defined a mapping gets sent to the corresponding partition. The keys that aren't mapped anywhere can either (a) cause an error or (b) get mapped to some default partition. It's probably useful to offer both behaviors. But I don't think it requires a partitionisoverflow column, because you can represent it some other way, such as by making partitionvalues NULL, which is otherwise meaningless. - Range partitioning. In this case, what you've basically got is a list of partition bounds and a list of target partitions. Suppose there are N partition bounds; then there will be N+1 targets. Some of those targets can be undefined, meaning an attempt to insert a key with that value will error out. For example, suppose the user defines a partition for values 1-3 and 10-13. Then your list of partition bounds looks like this: 1,3,10,13 And your list of destinations looks like this: undefined,firstpartition,undefined,secondpartition,undefined More commonly, the ranges will be contiguous, so that there are no gaps. If you have everything <10 in the first partition, everything 10-20 in the second partition, and everything else in a third partition, then you have bounds 10,20 and destinations firstpartition,secondpartition,thirdpartition. If you want values greater than 20 to error out, then you have bounds 10,20 and destinations firstpartition,secondpartition,undefined. In none of this do you really have "an overflow partition". Rather, the first and last destinations, if defined, catch everything that has a key lower than the lowest key or higher than the highest key. If not defined, you error out. > I wonder if your suggestion of pg_node_tree plays well here. This then could > be a list of CONSTs or some such... 
> And I am thinking it's a concern only for
> range partitions, no? (that is, a multicolumn partition key)

I guess you could list or hash partition on multiple columns, too. And yes, this is why I thought of pg_node_tree.

>> > * DDL syntax (no multi-column partitioning, sub-partitioning support as
>> > yet):
>> >
>> > -- create partitioned table and child partitions at once.
>> > CREATE TABLE parent (...)
>> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
>> > [ (
>> >     PARTITION child
>> >     {
>> >         VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
>> >       | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
>> >     }
>> >     [ WITH ( ... ) ] [ TABLESPACE tbs ]
>> > [, ...]
>> > ) ] ;
>>
>> How are you going to dump and restore this, bearing in mind that you
>> have to preserve a bunch of OIDs across pg_upgrade? What if somebody
>> wants to do pg_dump --table name_of_a_partition?
>>
> Assuming everything's (including partitioned relation and partitions at all
> levels
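The bounds-plus-destinations scheme for range partitioning described upthread can be sketched concretely. Assuming half-open ranges (each bound is an exclusive upper limit for the slot before it) — a detail the thread leaves open — routing a key reduces to finding its slot among N sorted bounds and N+1 destinations, any of which may be undefined:

```c
#include <assert.h>

#define UNDEFINED (-1)

/*
 * Route a key given nbounds sorted partition bounds.  With bounds
 * b[0..n-1] there are n+1 destination slots: slot 0 holds keys below
 * b[0], slot i holds keys in [b[i-1], b[i]), and slot n holds keys at
 * or above b[n-1].  A slot may be UNDEFINED, meaning an insert of such
 * a key should raise an error.
 */
static int
route_key(int key, const int *bounds, int nbounds, const int *dest)
{
    int i = 0;

    while (i < nbounds && key >= bounds[i])
        i++;
    return dest[i];             /* may be UNDEFINED */
}
```

With the bounds 1,3,10,13 from the example upthread and destinations undefined,first,undefined,second,undefined, a key of 2 lands in the first partition, 5 falls in the undefined gap, and 20 falls off the end. A real implementation would binary-search the bounds rather than scan linearly, but the data model is the same.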
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Sat, Dec 6, 2014 at 12:49 AM, Robert Haas wrote: > On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier > wrote: > >> Here is patch which renames action_at_recovery_target to > >> recovery_target_action everywhere. > > Thanks, Looks good to me. > > > > A couple of things that would be good to document as well in > > recovery-config.sgml: > > - the fact that pause_at_recovery_target is deprecated. > > Why not just remove it altogether? I don't think the > backward-compatibility break is enough to get excited about, or to > justify the annoyance of carrying an extra parameter that does (part > of) the same thing. > At least we won't forget removing in the future something that has been marked as deprecated for years. So +1 here for a simple removal, and a mention in the future release notes. -- Michael
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Dec 4, 2014 at 1:27 PM, Peter Geoghegan wrote: > On Thu, Dec 4, 2014 at 3:04 AM, Craig Ringer wrote: >> Yes, I think that's pretty important. With a negative attno so it's >> treated as a "hidden" col that must be explicitly named to be shown and >> won't be confused with user columns. > > I think that the standard for adding a new system attribute ought to > be enormous. The only case where a new one was added post-Postgres95 > was "tableoid". I'm pretty sure that others aren't going to want to do > it that way. +1. System attributes are a mess; a negative attribute number implies not only that the column is by default hidden from view but also that it is stored in the tuple header rather than the tuple data. Using one here, where we're not even talking about a property of the tuple per se, would be a pretty goofy solution. > Besides, I'm not entirely convinced that this is actually > an important distinction to expose. I think it's probably an important distinction, for the kinds of reasons Anssi mentions, but we should look for some method other than a system column of indicating it. Maybe there's a magic function that returns a Boolean which you can call, or maybe some special clause, as with WITH ORDINALITY. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier wrote: >> Here is patch which renames action_at_recovery_target to >> recovery_target_action everywhere. > Thanks, Looks good to me. > > A couple of things that would be good to document as well in > recovery-config.sgml: > - the fact that pause_at_recovery_target is deprecated. Why not just remove it altogether? I don't think the backward-compatibility break is enough to get excited about, or to justify the annoyance of carrying an extra parameter that does (part of) the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
On Wed, Dec 3, 2014 at 5:17 PM, Tom Lane wrote:
> David Fetter writes:
>> On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
>>> What do you mean "reconstruct the enum"?
>
>> Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
>> Right now, if you try IMPORT SCHEMA on a foreign table with an enum in
>> it, postgresql_fdw errors out rather than trying to notice that
>> there's an enum definition which should precede creation and execute
>> it in the correct order.
>
> Oh, you think IMPORT FOREIGN SCHEMA should try to import enums?
> I doubt it. What happens if the enum already exists locally?
> And why enums, and not domains, ranges, composite types, etc?

Probably IMPORT FOREIGN SCHEMA should not attempt to include type dependencies. However, if they are present in the importer (that is, the type exists by name), it should assume that they are correct come what may. Something like 'IMPORT FOREIGN TYPE' would probably be needed to translate a type between servers. Unless the SQL standard has it or gets it, I doubt it will ever appear, but the status quo isn't too bad IMO.

Personally I have no issues with the risks involved with type synchronization; the problems faced are mostly academic, with clean errors, and easily managed unless binary format is used with postgres-to-postgres transfer (which IIRC the postgres fdw does not utilize at this time). User-created types can't be transmitted between servers with the existing binary format; you have to transmit them as text and hope the structures agree. Binary format transmission in postgres tends to be quite a bit faster depending on the nature of the types involved; things like ints, numerics, and timestamps tend to be much faster.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-06 00:10:11 +0900, Michael Paquier wrote: > On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier > wrote: > > > > > > > > > On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed > > wrote: > > > >> I attempted quick review and could not come up with much except this > >> > >> + /* > >> +* Calculate the amount of FPI data in the record. Each backup block > >> +* takes up BLCKSZ bytes, minus the "hole" length. > >> +* > >> +* XXX: We peek into xlogreader's private decoded backup blocks for > >> the > >> +* hole_length. It doesn't seem worth it to add an accessor macro for > >> +* this. > >> +*/ > >> + fpi_len = 0; > >> + for (block_id = 0; block_id <= record->max_block_id; block_id++) > >> + { > >> + if (XLogRecHasCompressedBlockImage(record, block_id)) > >> + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; > >> > >> > >> IIUC, fpi_len in case of compressed block image should be > >> > >> fpi_len = record->blocks[block_id].compress_len; > >> > > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to > > the stats of pg_xlogdump. > > > > In any case, any opinions to switch this patch as "Ready for committer"? Needing a rebase is a obvious conflict to that... But I guess some wider looks afterwards won't hurt. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: > postgres=# explain select c1 from t1; > QUERY PLAN > -- > Seq Scan on t1 (cost=0.00..101.00 rows=100 width=4) > (1 row) > > > postgres=# set parallel_seqscan_degree=4; > SET > postgres=# explain select c1 from t1; > QUERY PLAN > -- > Parallel Seq Scan on t1 (cost=0.00..25.25 rows=100 width=4) >Number of Workers: 4 >Number of Blocks Per Workers: 25 > (3 rows) This is all great and interesting, but I feel like folks might be waiting to see just what kind of performance results come from this (and what kind of hardware is needed to see gains..). There's likely to be situations where this change is an improvement while also being cases where it makes things worse. One really interesting case would be parallel seq scans which are executing against foreign tables/FDWs.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallel Seq Scan
José, * José Luis Tallón (jltal...@adv-solutions.net) wrote: > On 12/04/2014 07:35 AM, Amit Kapila wrote: > >The number of worker backends that can be used for > >parallel seq scan can be configured by using a new GUC > >parallel_seqscan_degree, the default value of which is zero > >and it means parallel seq scan will not be considered unless > >user configures this value. > > The number of parallel workers should be capped (of course!) at the > maximum amount of "processors" (cores/vCores, threads/hyperthreads) > available. > > More over, when load goes up, the relative cost of parallel working > should go up as well. > Something like: > p = number of cores > l = 1min-load > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > (for c>1, of course) While I agree in general that we'll need to come up with appropriate acceptance criteria, etc, I don't think we want to complicate this patch with that initially. A SUSET GUC which caps the parallel GUC would be enough for an initial implementation, imv. > Not directly (I haven't had the time to read the code yet), but I'm > thinking about the ability to simply *replace* executor methods from > an extension. You probably want to look at the CustomScan thread+patch directly then.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier wrote: > > > > On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed > wrote: > >> I attempted quick review and could not come up with much except this >> >> + /* >> +* Calculate the amount of FPI data in the record. Each backup block >> +* takes up BLCKSZ bytes, minus the "hole" length. >> +* >> +* XXX: We peek into xlogreader's private decoded backup blocks for >> the >> +* hole_length. It doesn't seem worth it to add an accessor macro for >> +* this. >> +*/ >> + fpi_len = 0; >> + for (block_id = 0; block_id <= record->max_block_id; block_id++) >> + { >> + if (XLogRecHasCompressedBlockImage(record, block_id)) >> + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; >> >> >> IIUC, fpi_len in case of compressed block image should be >> >> fpi_len = record->blocks[block_id].compress_len; >> > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to > the stats of pg_xlogdump. > In any case, any opinions to switch this patch as "Ready for committer"? -- Michael
Re: [HACKERS] Parallel Seq Scan
On 12/04/2014 07:35 AM, Amit Kapila wrote: [snip] The number of worker backends that can be used for parallel seq scan can be configured by using a new GUC parallel_seqscan_degree, the default value of which is zero and it means parallel seq scan will not be considered unless user configures this value. The number of parallel workers should be capped (of course!) at the maximum amount of "processors" (cores/vCores, threads/hyperthreads) available. More over, when load goes up, the relative cost of parallel working should go up as well. Something like: p = number of cores l = 1min-load additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) (for c>1, of course) In ExecutorStart phase, initiate the required number of workers as per parallel seq scan plan and setup dynamic shared memory and share the information required for worker to execute the scan. Currently I have just shared the relId, targetlist and number of blocks to be scanned by worker, however I think we might want to generate a plan for each of the workers in master backend and then share the same to individual worker. [snip] Attached patch is just to facilitate the discussion about the parallel seq scan and may be some other dependent tasks like sharing of various states like combocid, snapshot with parallel workers. It is by no means ready to do any complex test, ofcourse I will work towards making it more robust both in terms of adding more stuff and doing performance optimizations. Thoughts/Suggestions? Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension. This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed. The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. 
I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongide the custom plan API, though. Any comments appreciated. Thank you for your work, Amit Regards, / J.L. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
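The load-based surcharge sketched above (where the "c" in the formula is evidently the core count "p" defined just before it) could be expressed as follows. Everything here is illustrative; no such costing exists in the patch or the planner:

```c
#include <assert.h>
#include <math.h>

/*
 * Load-aware surcharge on a parallel plan: the relative cost of
 * parallelism rises with the 1-minute load average and falls with the
 * number of cores (for ncores > 1, per the proposal).  Function and
 * parameter names are made up for this sketch.
 */
static double
parallel_load_surcharge(double tuple_estimate, double cpu_tuple_cost,
                        double load_1min, int ncores)
{
    if (ncores <= 1)
        return 0.0;             /* parallelism not considered at all */
    return tuple_estimate * cpu_tuple_cost *
           (load_1min + 1.0) / (ncores - 1);
}
```

For example, on an idle 2-core machine the surcharge for an estimated 1000 tuples at cpu_tuple_cost = 0.01 is 10.0; at a load average of 3.0 on 4 cores it rises to about 13.3, discouraging parallel plans as the machine gets busier.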
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed wrote: > I attempted quick review and could not come up with much except this > > + /* > +* Calculate the amount of FPI data in the record. Each backup block > +* takes up BLCKSZ bytes, minus the "hole" length. > +* > +* XXX: We peek into xlogreader's private decoded backup blocks for the > +* hole_length. It doesn't seem worth it to add an accessor macro for > +* this. > +*/ > + fpi_len = 0; > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > + { > + if (XLogRecHasCompressedBlockImage(record, block_id)) > + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; > > > IIUC, fpi_len in case of compressed block image should be > > fpi_len = record->blocks[block_id].compress_len; > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to the stats of pg_xlogdump. -- Michael
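To make the review point concrete: the patch charged a compressed block image as BLCKSZ minus compress_len, when a compressed image actually occupies compress_len bytes. A stand-alone sketch of the corrected accounting follows; the struct is a simplified stand-in for xlogreader's decoded backup blocks, not the real type:

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192

/* Simplified stand-in for one decoded backup block. */
typedef struct
{
    bool    has_image;      /* block carries a full-page image? */
    bool    is_compressed;  /* image stored compressed? */
    int     hole_length;    /* unused "hole" in an uncompressed image */
    int     compress_len;   /* size of the compressed image */
} BackupBlock;

/*
 * FPI bytes this block contributes to the record: a compressed image
 * occupies compress_len bytes (the reviewed patch mistakenly used
 * BLCKSZ - compress_len); an uncompressed image occupies BLCKSZ minus
 * the hole.
 */
static int
fpi_block_len(const BackupBlock *blk)
{
    if (!blk->has_image)
        return 0;
    if (blk->is_compressed)
        return blk->compress_len;
    return BLCKSZ - blk->hole_length;
}
```

Summing this over all backup blocks of a record gives the fpi_len that pg_xlogdump's statistics mode reports.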
Re: [HACKERS] advance local xmin more aggressively
[ reviving a very old thread ] On Tue, Feb 10, 2009 at 4:11 PM, Tom Lane wrote: > Alvaro Herrera writes: >> For example, maybe we could keep track of counts of snapshots removed >> since the last xmin calculation, and only run this routine if the number >> is different from zero (or some small positive integer). > > I think most of the callers of SnapshotResetXmin already know they > removed something. > > It might be interesting for FreeSnapshot or something nearby to note > whether the snapshot being killed has xmin = proc's xmin, and only do > the update calculation if so. > > I still dislike the assumption that all resource owners are children of > a known owner. I suspect in fact that it's demonstrably wrong right > now, let alone in future (cf comments in PortalRun). If we're going to > do this then snapmgr.c needs to track the snapshots for itself. Of > course that's going to make the "is it worth it" question even more > pressing. I've run into essentially the same problem Jeff originally complained about with a large customer who has long-running transactions that make extensive use of cursors. Cursors are opened and closed over time but it is rare for the number open to reach exactly zero, so what ends up happening is that the backend xmin does not advance. As you can imagine, that doesn't work out well. The approach I came up with initially was similar to Jeff's: start at the topmost resource owner and traverse them all, visiting every snapshot along the way. But then I found this thread and saw the comment that this might be "demonstrably wrong" and referring to the comments in PortalRun. Having reviewed those comments, which don't seem to have changed much in the last five years, I can't understand how they related to this issue. It's true that the TopTransaction resource owner could get swapped out under us during an internal commit, but why should SnapshotResetXmin() have to care? 
It just traverses the one that is in effect at the time it gets called. The only real danger I see here is that there could be *more than one* toplevel resource owner. I wonder if we could solve that problem by adding a registry of active toplevel resource owners, so that if we have a forest rather than a tree we can still find everything. The problem I see with having snapmgr.c track the snapshots for itself is that it is mostly duplicating of bookkeeping which is already being done. Since this problem doesn't affect the majority of users, it's not desirable to add a lot of extra bookkeeping to cater to it - but even if it did affect a lot of users, we still want it to be as cheap as possible, and reusing the tracking that resource owners are already doing seems like the way to get there. I think there are a couple of things we can do to make this cheaper. Suppose we keep track of the oldest xmin of any registered snapshot and the number of registered snapshots that have that particular xmin. Every time we deregister a snapshot, we check whether it is one of the ones with the minimum xmin; if it is, we decrement the count. When the count reaches 0, we know that a traversal of all registered snapshots is guaranteed to find a newer value to advertise in MyProc->xmin; that way, we can arrange to do the work only when it's likely to pay off. In most cases this won't happen until the last snapshot is unregistered, because our snapshots normally form a stack, with newer snapshots having been taken later. But if somebody unregisters the oldest snapshot we'll immediately notice and recalculate. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
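The bookkeeping described above — the oldest xmin among registered snapshots plus a count of snapshots sharing it — can be sketched as below. This is illustrative only, not the actual snapmgr.c structures; when the deregister hook returns true, the caller would rescan all registered snapshots, advertise the new minimum in MyProc->xmin, and reset the tracker:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int TransactionId;

/* Oldest xmin among registered snapshots, and how many share it. */
static TransactionId oldest_xmin = 0;
static int           oldest_count = 0;

static void
track_register(TransactionId xmin)
{
    if (oldest_count == 0 || xmin < oldest_xmin)
    {
        /* New minimum (or first snapshot): restart the count. */
        oldest_xmin = xmin;
        oldest_count = 1;
    }
    else if (xmin == oldest_xmin)
        oldest_count++;
    /* Snapshots with a newer xmin need no tracking here. */
}

/* Returns true when a rescan of registered snapshots would pay off. */
static bool
track_deregister(TransactionId xmin)
{
    if (xmin == oldest_xmin && --oldest_count == 0)
        return true;            /* last holder of the oldest xmin is gone */
    return false;
}
```

Because snapshots normally form a stack with newer snapshots having newer xmins, the count usually reaches zero only when the last snapshot goes away; but if the oldest snapshot is dropped early — the cursor workload above — the recalculation triggers immediately.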
Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine
On Fri, Dec 5, 2014 at 7:12 PM, Heikki Linnakangas wrote: > On 12/05/2014 04:54 AM, Michael Paquier wrote: > >> Hi all, >> >> While reading the code in this area this morning, I noticed that >> wal_log_hints and track_commit_timestamp are not mentioned in the desc >> routine of XLOG_CHANGE_PARAMETER. Also, it is not mentioned in >> postgresql.conf.sample that a value update of wal_log_hints requires a >> system restart. >> >> In order to fix those things, attached are two patches: >> - patch 1 should be applied back to 9.4 where wal_log_hints has been added >> > > You got the wal_level and wal_log_hints strings backwards: > Oops. Thanks for double-checking. -- Michael
Re: [HACKERS] Review of GetUserId() Usage
* Andres Freund (and...@2ndquadrant.com) wrote: > Is the 'Only allow superusers to signal superuser-owned backends' check > actually safe that way? I personally try to never use a superuser role > as the login user, but grant my account a superuser role that doesn't > inherit. But IIRC PGPROC->roleId won't change, even if a user does SET > ROLE. You're correct - but it's exactly the same as it is today. If you grant another user your role and then they 'SET ROLE' to you, they can cancel any of your queries or terminate your backends, regardless of whether those roles have done some other 'SET ROLE'. This change only removes the need for those users to 'SET ROLE' to your user first. The backend isn't considered 'superuser-owned' unless it's the login role that's a superuser. It might be interesting to change that to mean 'when a SET ROLE to superuser has been done', but what about security definer functions or other transient escalation to superuser? Would those calls have to muck with PGPROC->roleId? If we want to go there, it should definitely be a different patch. Thanks, Stephen
Re: [HACKERS] Review of GetUserId() Usage
On 2014-12-05 09:28:13 -0500, Stephen Frost wrote: > static int > pg_signal_backend(int pid, int sig) > { > @@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig) > return SIGNAL_BACKEND_ERROR; > } > > - if (!(superuser() || proc->roleId == GetUserId())) > + /* Only allow superusers to signal superuser-owned backends. */ > + if (superuser_arg(proc->roleId) && !superuser()) > + return SIGNAL_BACKEND_NOSUPERUSER; > + > + /* Users can signal backends they have role membership in. */ > + if (!has_privs_of_role(GetUserId(), proc->roleId)) > return SIGNAL_BACKEND_NOPERMISSION; > > /* > @@ -141,35 +150,49 @@ pg_signal_backend(int pid, int sig) > } Is the 'Only allow superusers to signal superuser-owned backends' check actually safe that way? I personally try to never use a superuser role as the login user, but grant my account a superuser role that doesn't inherit. But IIRC PGPROC->roleId won't change, even if a user does SET ROLE. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
* Stephen Frost (sfr...@snowman.net) wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > > 3. It messes around with pg_signal_backend(). There are currently no > > > cases in which pg_signal_backend() throws an error, which is good, > > > because it lets you write queries against pg_stat_activity() that > > > don't fail halfway through, even if you are missing permissions on > > > some things. This patch introduces such a case, which is bad. > > > > Good point, I'll move that check up into the other functions, which will > > allow for a more descriptive error as well. > > Err, I'm missing something here, as pg_signal_backend() is a misc.c > static internal function? How would you be calling it from a query > against pg_stat_activity()? > > I'm fine making the change anyway, just curious.. Updated patch attached which moves the ereport() out of pg_signal_backend() and into its callers, as the other permissions checks are done, and includes the documentation changes. The error messages are minimally changed to match the new behavior. Thanks, Stephen From c60718e7a7ecb8d671627271533d6bd2b6878782 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 1 Dec 2014 11:13:09 -0500 Subject: [PATCH] GetUserId() changes to has_privs_of_role() The pg_stat and pg_signal-related functions have been using GetUserId() instead of has_privs_of_role() for checking if the current user should be able to see details in pg_stat_activity or signal other processes, requiring a user to do 'SET ROLE' for inherited roles for a permissions check, unlike other permissions checks. This patch changes that behavior to, instead, act like most other permission checks and use has_privs_of_role(), removing the 'SET ROLE' need. Documentation and error messages updated accordingly. Per discussion with Alvaro, Peter, Adam (though not using Adam's patch), and Robert. 
--- doc/src/sgml/func.sgml | 13 ++--- src/backend/utils/adt/misc.c| 39 + src/backend/utils/adt/pgstatfuncs.c | 19 +- 3 files changed, 47 insertions(+), 24 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 62ec275..dbb61dc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16121,9 +16121,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_cancel_backend(pid int) boolean - Cancel a backend's current query. You can execute this against -another backend that has exactly the same role as the user calling the -function. In all other cases, you must be a superuser. + Cancel a backend's current query. This is also allowed if the +calling role is a member of the role whose backend is being cancelled, +however only superusers can cancel superuser backends. @@ -16145,10 +16145,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_terminate_backend(pid int) boolean - Terminate a backend. You can execute this against -another backend that has exactly the same role as the user -calling the function. In all other cases, you must be a -superuser. + Terminate a backend. This is also allowed if the calling role +is a member of the role whose backend is being terminated, however only +superusers can terminate superuser backends. diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 67539ec..2e74eb9 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -37,6 +37,7 @@ #include "utils/lsyscache.h" #include "utils/ruleutils.h" #include "tcop/tcopprot.h" +#include "utils/acl.h" #include "utils/builtins.h" #include "utils/timestamp.h" @@ -81,7 +82,9 @@ current_query(PG_FUNCTION_ARGS) * role as the backend being signaled. For "dangerous" signals, an explicit * check for superuser needs to be done prior to calling this function. * - * Returns 0 on success, 1 on general failure, and 2 on permission error. 
+ * Returns 0 on success, 1 on general failure, 2 on normal permission error + * and 3 if the caller needs to be a superuser. + * * In the event of a general failure (return code 1), a warning message will * be emitted. For permission errors, doing that is the responsibility of * the caller. @@ -89,6 +92,7 @@ current_query(PG_FUNCTION_ARGS) #define SIGNAL_BACKEND_SUCCESS 0 #define SIGNAL_BACKEND_ERROR 1 #define SIGNAL_BACKEND_NOPERMISSION 2 +#define SIGNAL_BACKEND_NOSUPERUSER 3 static int pg_signal_backend(int pid, int sig) { @@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig) return SIGNAL_BACKEND_ERROR; } - if (!(superuser() || proc->roleId == GetUserId())) + /* Only allow superusers to signal superuser-owned backends. */ + if (superuser_arg(proc->roleId) && !superuser()) + return SIGNAL_BACKEND_NOSUPERUSER; + + /* Users c
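The ordering of the two new checks matters: the superuser test has to run before the membership test, so that a non-superuser probing a superuser-owned backend gets the more specific NOSUPERUSER result. A toy truth table of the hunk's logic (Python; the constant names echo the patch, but nothing here is PostgreSQL code):

```python
# Stand-ins for the patch's return codes.
SIGNAL_BACKEND_SUCCESS = 0
SIGNAL_BACKEND_NOPERMISSION = 2
SIGNAL_BACKEND_NOSUPERUSER = 3

def signal_check(caller_is_super, caller_has_privs_of_target, target_is_super):
    """Mirrors the order of the checks in the quoted pg_signal_backend() hunk."""
    # Only allow superusers to signal superuser-owned backends.
    if target_is_super and not caller_is_super:
        return SIGNAL_BACKEND_NOSUPERUSER
    # Otherwise role membership (has_privs_of_role) decides; a superuser
    # implicitly has the privileges of every role.
    if not (caller_is_super or caller_has_privs_of_target):
        return SIGNAL_BACKEND_NOPERMISSION
    return SIGNAL_BACKEND_SUCCESS
```

Swapping the two checks would report the generic permission error even to members of the superuser role, which is exactly the less descriptive behavior the patch wants to avoid.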
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
I attempted a quick review and could not come up with much except this + /* +* Calculate the amount of FPI data in the record. Each backup block +* takes up BLCKSZ bytes, minus the "hole" length. +* +* XXX: We peek into xlogreader's private decoded backup blocks for the +* hole_length. It doesn't seem worth it to add an accessor macro for +* this. +*/ + fpi_len = 0; + for (block_id = 0; block_id <= record->max_block_id; block_id++) + { + if (XLogRecHasCompressedBlockImage(record, block_id)) + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; IIUC, fpi_len in the case of a compressed block image should be fpi_len += record->blocks[block_id].compress_len; Thank you, Rahila Syed
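The disagreement is easiest to see with the two readings side by side — a toy sketch in Python (`BLCKSZ` and the field names are stand-ins, not the real structures):

```python
BLCKSZ = 8192  # stand-in for the real block size

def fpi_len_as_written(blocks):
    """The quoted code: BLCKSZ minus compress_len, i.e. the space *saved*."""
    return sum(BLCKSZ - b["compress_len"] for b in blocks if b["compressed"])

def fpi_len_as_suggested(blocks):
    """Rahila's reading: the compressed bytes actually present in the WAL."""
    return sum(b["compress_len"] for b in blocks if b["compressed"])
```

If fpi_len is meant to report the bytes a full-page image occupies in the WAL stream, the second reading is the consistent one.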
Re: [HACKERS] compress method for spgist - 2
On 12/01/2014 02:44 PM, Teodor Sigaev wrote: Initial message: http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru Second version fixes a forgotten changes in pg_am. + /* Get the information we need about each relevant datatypes */ + + if (OidIsValid(cache->config.leafType) && cache->config.leafType != atttype) + { + if (!OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC))) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("compress method is not defined although input and leaf types are differ"))); + + fillTypeDesc(&cache->attType, cache->config.leafType); + } + else + { + fillTypeDesc(&cache->attType, atttype); + } + For some datatypes, the compress method might be useful even if the leaf type is the same as the column type. For example, you could allow indexing text datums larger than the page size, with a compress function that just truncates the input. Could you find some use for this in one of the built-in or contrib types? Just to have something that exercises it as part of the regression suite. How about creating an opclass for the built-in polygon type that stores the bounding box, like the PostGIS guys are doing? The documentation needs to be updated. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Dec 4, 2014 at 3:59 PM, Stephen Frost wrote: > > I have a hard time wrapping my head around what a *lot* of our users ask > > when they see a given error message, but if our error message is 'you > > must have a pear-shaped object to run this command' then I imagine that > > a new-to-PG user might think "well, I can't create pear shaped objects > > in PG, so I guess this just isn't supported." That's not necessarily > > any fault of ours, but I do think "permission denied" reduces the chance > > that there's any confusion about the situation. > > This is just ridiculous. You're postulating the existence of a person > who thinks that a replication role is something that they buy at > Wendy's rather than something granted by the database administrator. > Give me a break. I was pointing out that it might not be recognizable as a permission by a person new to PG, yes. I did not characterize that individual as a database administrator, nor would they characterize themselves that way (at least, the person I'm thinking of). > > I do prefer my version but it's not an unfounded preference. > > I never said that your preference was unfounded. I said that I > disagreed with it. And this doesn't seem like it's going to change and certainly not over email and so I'm going to go with Alvaro's approach and simply drop it. I'll finish up the changes that I commented on above and move on. Thanks, Stephen
Re: [HACKERS] initdb: Improve error recovery.
On 11/27/2014 02:28 PM, Mats Erik Andersson wrote: Hello there, I would like to improve error recovery of initdb when the password file is empty. The present code declares "Error 0" as the cause of failure. It suffices to use ferror() since fgets() returns NULL also at a premature EOF. Thanks, committed. In addition, a line feed in order to print the error message on a line of its own seems desirable. Hmm. If you've piped stdout somewhere else, that produces a spurious empty line in stderr: $ bin/initdb -D data-foo > /dev/null initdb: input file "/home/heikki/pgsql.master/share/postgres.bki" does not belong to PostgreSQL 9.5devel Check your installation or specify the correct path using the option -L. initdb: removing data directory "data-foo" I think the newline needs to go to stdout instead. But in any case, that's just one error out of many similar ones that can happen during initdb. If we care enough to fix this, we should fix them all. Not sure it's worth the churn, though, unless you can come up with some kind of a simple wholesale solution, rather than adding fprintf(stdout, "\n") calls to every error condition. - Heikki
Re: [HACKERS] Testing DDL deparsing support
Bruce Momjian wrote: > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > Standard regression tests are helpful because patch authors include new test > > cases in the patches that stress their new options or commands. This new > > test > > framework needs to be something that internally runs the regression tests > > and > > exercises DDL deparsing as the regression tests execute DDL. That would > > mean > > that new commands and options would automatically be deparse-tested by our > > new > > framework as soon as patch authors add standard regress support. > > Are you saying every time a new option is added to a command that a new > regression test needs to be added? Not necessarily -- an existing test could be modified, as well. > We don't normally do that, I sure hope we do have all options covered by tests. > and it could easily bloat the regression tests over time. We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this qualify as bloat? > In summary, this testing will help, but it will not be fully reliable. No testing is ever fully reliable. If it were, there would never be bugs. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > Standard regression tests are helpful because patch authors include new test > cases in the patches that stress their new options or commands. This new test > framework needs to be something that internally runs the regression tests and > exercises DDL deparsing as the regression tests execute DDL. That would mean > that new commands and options would automatically be deparse-tested by our new > framework as soon as patch authors add standard regress support. Are you saying every time a new option is added to a command that a new regression test needs to be added? We don't normally do that, and it could easily bloat the regression tests over time. In summary, this testing will help, but it will not be fully reliable. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On 2 December 2014 at 15:36, Craig Ringer wrote: > On 12/01/2014 09:51 PM, Marco Nenciarini wrote: > > I think this is a leftover, as you don't use elog afterwards. > > Good catch, fixed. > > I've looked over this again and tested it on a windows 8.1 machine. I cannot find any problems. The only comments about the code I have would maybe be to use some constants like: #define FILETIME_PER_SEC 10000000L #define FILETIME_PER_USEC 10 I had to read the Microsoft documentation to see that "A file time is a 64-bit value that represents the number of 100-nanosecond intervals that have elapsed since 12:00 A.M. January 1, 1601 Coordinated Universal Time (UTC)." http://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx The attached patch gets rid of those magic numbers, and hopefully makes it a bit easier to see what's going on. I agree with the lack of real need to log any sort of errors if init_win32_gettimeofday() gets any unexpected errors while trying to lookup GetSystemTimePreciseAsFileTime. I'm marking this as ready for committer. It seems worth going in just for the performance improvement alone, never mind the increased clock accuracy. I'll leave it up to the committer to decide if it's better with or without the attached patch. Regards David Rowley diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c index ab4f491..f4d8393 100644 --- a/src/port/gettimeofday.c +++ b/src/port/gettimeofday.c @@ -35,6 +35,13 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000); /* + * FILETIME represents the number of 100-nanosecond intervals since + * January 1, 1601 (UTC). + */ +#define FILETIME_PER_SEC 10000000L +#define FILETIME_PER_USEC 10 + +/* * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a * signature, so we can just store a pointer to whichever we find. This * is the pointer's type. 
@@ -98,8 +105,9 @@ gettimeofday(struct timeval * tp, struct timezone * tzp) ularge.LowPart = file_time.dwLowDateTime; ularge.HighPart = file_time.dwHighDateTime; - tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L); - tp->tv_usec = (long) (((ularge.QuadPart - epoch) % 10000000L) / 10); + tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_PER_SEC); + tp->tv_usec = (long) (((ularge.QuadPart - epoch) % FILETIME_PER_SEC) + / FILETIME_PER_USEC); return 0; }
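As a sanity check on the conversion: a FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC), so one second is 10,000,000 units and one microsecond is 10. The arithmetic can be verified in isolation (Python; the epoch offset below is the well-known 1970-in-FILETIME constant, stated here independently of the quoted diff):

```python
# 1970-01-01 00:00:00 UTC expressed in FILETIME units (100 ns since 1601).
EPOCH_1970 = 116444736000000000
FILETIME_PER_SEC = 10_000_000   # 100 ns intervals per second
FILETIME_PER_USEC = 10          # 100 ns intervals per microsecond

def filetime_to_timeval(filetime):
    """Same arithmetic as the patch's gettimeofday() body."""
    delta = filetime - EPOCH_1970
    tv_sec = delta // FILETIME_PER_SEC
    tv_usec = (delta % FILETIME_PER_SEC) // FILETIME_PER_USEC
    return tv_sec, tv_usec
```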
Re: [HACKERS] libpq pipelining
On 12/05/2014 02:30 AM, Matt Newell wrote: The explanation of PQgetFirstQuery makes it sound pretty hard to match up the result with the query. You have to pay attention to PQisBusy. PQgetFirstQuery should also be valid after calling PQgetResult and then you don't have to worry about PQisBusy, so I should probably change the documentation to indicate that is the preferred usage, or maybe make that the only guaranteed usage, and say the results are undefined if you call it before calling PQgetResult. That usage also makes it consistent with PQgetLastQuery being called immediately after PQsendQuery. I changed my second example to call PQgetFirstQuery after PQgetResult instead of before, and that removes the need to call PQconsumeInput and PQisBusy when you don't mind blocking. It makes the example super simple: PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id"); query1 = PQgetLastQuery(conn); /* Duplicate primary key error */ PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1"); query2 = PQgetLastQuery(conn); PQsendQuery(conn, "SELECT * FROM test"); query3 = PQgetLastQuery(conn); while( (result = PQgetResult(conn)) != NULL ) { curQuery = PQgetFirstQuery(conn); if (curQuery == query1) checkResult(conn,result,curQuery,PGRES_TUPLES_OK); if (curQuery == query2) checkResult(conn,result,curQuery,PGRES_FATAL_ERROR); if (curQuery == query3) checkResult(conn,result,curQuery,PGRES_TUPLES_OK); } Note that the curQuery == queryX check will work no matter how many results a query produces. Oh, that's what the PQgetLastQuery/PQgetNextQuery functions work! I didn't understand that before. I'd suggest renaming them to something like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names made me think that they're used to iterate a list of queries, but in fact they're supposed to be used at very different stages. 
- Heikki
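The invariant behind Matt's example — results arrive in the order the queries were sent, so PQgetFirstQuery after PQgetResult always names the oldest pending query — can be modeled with a simple FIFO. A toy sketch (Python, invented names; not libpq):

```python
from collections import deque

class PipelinedConn:
    """Toy model: queries go out in order; results come back in order.
    (Real queries can return multiple results; one-per-query here.)"""

    def __init__(self):
        self._pending = deque()

    def send_query(self, sql):
        handle = object()            # like the handle PQgetLastQuery returns
        self._pending.append(handle)
        return handle

    def get_result(self):
        # like PQgetResult followed by PQgetFirstQuery: the result always
        # belongs to the oldest still-pending query
        return self._pending.popleft()
```

This is why comparing the saved handle (query1/query2/query3) against the current one works regardless of error results in between: the FIFO order is fixed at send time.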
Re: [HACKERS] check-world failure: dummy_seclabel
On 12/05/2014 07:29 AM, Adam Brightwell wrote: All, I've noticed that 'check-world' fails for dummy_seclabel after a 'clean'. I believe that in commit da34731, the EXTRA_CLEAN statement should have been removed from 'src/test/modules/dummy_seclabel/Makefile' as well. Ah, that's why those files kept disappearing. Attached is a proposed patch that addresses this issue. Applied, thanks! - Heikki
Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine
On 12/05/2014 04:54 AM, Michael Paquier wrote: Hi all, While reading the code in this area this morning, I noticed that wal_log_hints and track_commit_timestamp are not mentioned in the desc routine of XLOG_CHANGE_PARAMETER. Also, it is not mentioned in postgresql.conf.sample that a value update of wal_log_hints requires a system restart. In order to fix those things, attached are two patches: - patch 1 should be applied back to 9.4 where wal_log_hints has been added You got the wal_level and wal_log_hints strings backwards: rmgr: XLOG len (rec/tot): 24/50, tx: 0, lsn: 0/07172488, prev 0/07172418, desc: PARAMETER_CHANGE max_connections=100 max_worker_processes=8 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=false wal_log_hints=hot_standby Fixed that, and changed "true"/"false" to "on"/"off", as that's more common phrasing for boolean GUCs. - patch 2 is master-only, and needs to be applied on top of patch 1. Same here. Committed both patches with those fixes, thanks! - Heikki
Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: Friday, December 05, 2014 5:10 PM To: Amit Langote Cc: Jim Nasby; Robert Haas; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers Subject: Re: [HACKERS] On partitioning On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote > wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change > > > the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). > > > Records > > > are stored internally as tuples; not sure if that would be faster than a > > > List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things > > > other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a > > record/composite type is used, there are instances of pg_node_tree at a > > number of places like in pg_attrdef and others. Could you please point me > > to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not > sure how one would put together a custom type to use in a system catalog > given the way a system catalog is created. That's my concern but it may not > be valid. > > > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree(which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. 
One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. Thanks, Amit
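The deserialize-once idea can be sketched abstractly: parse the catalog's stored representation the first time the relation data is built, and keep the internal form in a cache. A toy version (Python; `json` stands in for pg_node_tree and stringToNode, and all names are invented):

```python
import json

_relcache = {}  # stand-in for the cached relation data

def partition_bounds(relid, stored_bounds):
    """Deserialize the stored bounds once per relation; reuse afterwards."""
    if relid not in _relcache:
        # stand-in for stringToNode() on a pg_node_tree column
        _relcache[relid] = json.loads(stored_bounds)
    return _relcache[relid]
```

The parsing cost is paid only when the cache entry is (re)built, so only partitioned tables ever incur it — matching the point above about limiting the overhead.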
Re: [HACKERS] New wal format distorts pg_xlogdump --stats
On 12/05/2014 02:31 AM, Andres Freund wrote: On 2014-12-05 08:58:33 +0900, Michael Paquier wrote: On Fri, Dec 5, 2014 at 8:09 AM, Andres Freund wrote: On 2014-12-04 16:26:02 +0200, Heikki Linnakangas wrote: Yeah, that's broken. I propose the attached. Or does anyone want to argue for adding an XLogRecGetFPILen() accessor macro for the hole_length in xlogreader.h. It's not something a redo routine would need, nor most XLOG-reading applications, so I thought it's better to just have pg_xlogdump peek into the DecodedBkpBlock struct directly. I think both would be justifyable, so I don't really care for now. One slight reason for wrapping it in xlogreader.h is that we might add compression of some form or another soon and it'd possibly be better to have it in xlogreader.h so pg_xlogdump doesn't have to be changed. But that's really rather minor. Good point about the compression patch. I committed this as posted. It's not yet clear how the compression patch is going to do the compression. If we just compress all records above a certain size, pg_xlogdump should display the compressed and uncompressed sizes separately, and an accessor macro for the backup block's size would help much. The compression patch will almost certainly need to modify pg_xlogdump --stats anyway, so there's not much point trying to guess now what it's going to need. If we go this road and want to be complete, you may as well add access macros for the image offset, the block image and for the block data stuff. That would be handy and consistent with the rest, now both approaches are fine as long as DecodedBkpBlock is in xlogreader.h. I don't see the point. Let's introduce that if (which I doubt a bit) there's a user. Agreed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, 2014-12-05 at 00:21 -0800, Peter Geoghegan wrote: > On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen > wrote: > > For Django's use case this is a requirement. We must inform the user if > > the save() action created a new row or if it modified an existing one. > > Can you explain that in more detail, please? Django has a post_save signal. The signal provide information of the save operation. One piece of information is a created boolean flag. When set to True the operation was an insert, on False it was an update. See https://docs.djangoproject.com/en/1.7/ref/signals/#django.db.models.signals.post_save for details. The created flag is typically used to perform some related action. An example is User and UserProfile models. Each user must have an UserProfile, so post_save can be used to create an empty userprofile on creation of user. If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in Django for the existing save() method, then it needs to know if the result was an UPDATE or INSERT. If we are going to use this for other operations (for example bulk merge of rows to the database), it would be very convenient to have per-row updated/created information available so that we can fire the post_save signals for the rows. If we don't have that information available, it means we can't fire signals, and no signals means we can't use the bulk merge operation internally as we have to fire the signals where that happened before. Outside of Django there are likely similar reasons to want to know if the result of an operation was a creation of a new row. The reason could be creation of related row, doing some action in application layer, or just UI message telling "object created successfully" vs "object updated successfully". - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
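What Django needs from an upsert is concrete: one flag per row saying which path was taken, so post_save receivers can fire with the right `created` value. A toy model of that contract (Python; not Django internals):

```python
store = {}      # stand-in for the table
listeners = []  # stand-ins for post_save receivers

def save(pk, value):
    """Insert-or-update; fires listeners with the created flag."""
    created = pk not in store
    store[pk] = value
    for receiver in listeners:
        receiver(pk=pk, created=created)
    return created
```

An INSERT ... ON CONFLICT UPDATE ... RETURNING that cannot report this distinction would force Django to keep its current SELECT-then-write save() rather than use the new syntax internally.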
Re: [HACKERS] SSL regression test suite
On Thu, Dec 04, 2014 at 02:42:41PM +0200, Heikki Linnakangas wrote: > On 10/06/2014 04:21 PM, Heikki Linnakangas wrote: > >This probably needs some further cleanup before it's ready for > >committing. One issues is that it creates a temporary cluster that > >listens for TCP connections on localhost, which isn't safe on a > >multi-user system. > > This issue remains. There isn't much we can do about it; SSL doesn't work > over Unix domain sockets. We could make it work, but that's a whole > different feature. A large subset of the test suite could be made secure. Omit or lock down "trustdb", and skip affected tests. (Perhaps have an --unsafe-tests option to reactivate them.) Instead of distributing frozen keys, generate all keys on-demand. Ensure that key files have secure file modes from the start. Having said that, ... > How do people feel about including this test suite in the source tree? It's > probably not suitable for running as part of "make check-world", but it's > extremely handy if you're working on a patch related to SSL. I'd like to > commit this, even if it has some rough edges. That way we can improve it > later, rather than have it fall into oblivion. Any objections? ... +1 for having this suite in the tree, even if check-world ignores it. Echoing Tom's comment, the README should mention its security weakness. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen wrote: > For Django's use case this is a requirement. We must inform the user if > the save() action created a new row or if it modified an existing one. Can you explain that in more detail, please? > Another way to do this would be to expose the "excluded" alias in the > returning clause. All columns of the excluded alias would be null in > the case of insert (especially the primary key column), and thus if a > query > insert into foobar values(2, '2') on conflict (id) update set > other_col=excluded.other_col returning excluded.id > returns a non-null value, then it was an update. I don't like that idea much, TBH. Consider this: postgres=# update upsert u set key = 1 from upsert i returning key; ERROR: 42702: column reference "key" is ambiguous LINE 1: update upsert u set key = 1 from upsert i returning key; ^ So, suppose this example was actually an ON CONFLICT UPDATE query. If I similarly make the aliases in the ON CONFLICT UPDATE ("target"/"excluded") visible in the returning list, it becomes necessary to qualify every column - an ambiguity is introduced by making both aliases visible, since any non-qualified column in the RETURNING clause could be from either the "target" or "excluded" alias/RTE. This is particularly annoying for the common, simple cases. Also, when there was an update in respect of a any given slot, how, in general, can I be sure that *any* visible excluded.* attribute is not null (which you suggest as a reliable proxy for the update path having been taken)? For one thing, the unique index that arbitrates whether or not we take the "alternative path" is not restricting to covering non-nullable attributes. So does the user end up specifying system/hidden atrributes, just to make what you outline work? That seems sort of messy. 
-- Peter Geoghegan
Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote > wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change > > > the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). > > > Records > > > are stored internally as tuples; not sure if that would be faster than a > > > List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things > > > other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a > > record/composite type is used, there are instances of pg_node_tree at a > > number of places like in pg_attrdef and others. Could you please point me > > to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not > sure how one would put together a custom type to use in a system catalog > given the way a system catalog is created. That's my concern but it may not > be valid. > > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree(which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. Yeah, with that, I was thinking we may be able to do something like dump a Node that describes the range partition bounds or list of allowed values (say, RangePartitionValues, ListPartitionValues). 
Thanks, Amit
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote < langote_amit...@lab.ntt.co.jp> wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records > > > are stored internally as tuples; not sure if that would be faster than a List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a record/composite type is used, there are instances of pg_node_tree at a number of places like in pg_attrdef and others. Could you please point me to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not sure how one would put together a custom type to use in a system catalog given the way a system catalog is created. That's my concern but it may not be valid. > I think you are right. I think in this case we need something similar to column pg_index.indexprs which is of type pg_node_tree(which seems to be already suggested by Robert). So may be we can proceed with this type and see if any one else has better idea. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com