Re: [HACKERS] snapshot too old, configured by time

2016-02-29 Thread Michael Paquier
On Tue, Mar 1, 2016 at 9:35 AM, Andres Freund  wrote:
> On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
>> Basically, a connection needs to remain open and interleave
>> commands with other connections, which the isolation tester does
>> just fine; but it needs to do that using a custom postgresql.conf
>> file, which TAP does just fine.  I haven't been able to see the
>> right way to get a TAP test to set up a customized installation to
>> run isolation tests against.  If I can get that working, I have
>> additional tests I can drop into that.

Launching psql from PostgresNode does not hold the connection open; we
would need to reinvent or integrate the logic of existing drivers to
hold the connection context properly, but that's utterly complicated
for not much gain.

> Check contrib/test_decoding's makefile. It does just that with
> isolationtester.

pg_isolation_regress --temp-config is the key item here: it lets you
force a test to run on a server with the desired configuration.
-- 
Michael




Re: [HACKERS] Publish autovacuum informations

2016-02-29 Thread Michael Paquier
On Tue, Mar 1, 2016 at 4:38 AM, Julien Rouhaud wrote:
> On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
>>
>> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud wrote:
>>>
>>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
>>> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge:
>>> >
>>> > 2015-01-05 17:40 GMT+01:00 Robert Haas:
>>> >
>>> > On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane wrote:
>>> > > I'd be all right with putting the data structure declarations in a
>>> > > file named something like autovacuum_private.h, especially if it
>>> > > carried an annotation that "if you depend on this, don't be
>>> > > surprised if we break your code in future".
>>> >
>>> > Works for me.  I am not in general surprised when we do things that
>>> > break my code, or anyway, the code that I'm responsible for
>>> > maintaining.  But I think it makes sense to segregate this into a
>>> > separate header file so that we are clear that it is only exposed for
>>> > the benefit of extension authors, not so that other things in the core
>>> > system can touch it.
>>> >
>>> >
>>> > I'm fine with that too. I'll try to find some time to work on that.
>>> >
>>> >
>>> > So I took a look at this this week. I discovered, with the help of a
>>> > coworker, that I can already use the AutoVacuumShmem pointer and read
>>> > the struct. Unfortunately, it doesn't give me as much details as I would
>>> > have liked. The list of databases and tables aren't in shared memory.
>>> > They are local to the process that uses them. Putting them in shared
>>> > memory (if at all possible) would imply a much bigger patch than I was
>>> > willing to write right now.
>>> >
>>> > Thanks anyway for the help.
>>> >
>>> >
>>>
>>> Sorry to revive such an old thread.
>>>
>>> I think some hooks in the autovacuum could be enough to have good
>>> insight without exposing private structure.

Instead of introducing four new hooks, which do not actually represent
general-purpose usage, why don't you expose a portion of this
information in shared memory, as mentioned upthread? That sounds like a
good approach to me. Your extension could then scan it as needed and
present it in a view or a function. This information is currently
private to the autovacuum processes; exposing it would allow plugin
authors to do a bunch of fancy things, I think, in a more flexible way
than those hooks would. And there would be no need to add more hooks
should the structure of the autovacuum code change for one reason or
another in the future.
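
For example, if the list of active workers were published as a simple
array in shared memory, an extension could expose it via a C function
along these lines (a rough sketch only; every name below is
hypothetical, as no such structures exist in core today):

#include "postgres.h"
#include "fmgr.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

/* Hypothetical per-worker entry that autovacuum.c would publish */
typedef struct AutoVacuumWorkerEntry
{
	Oid			av_dboid;		/* database being processed */
	Oid			av_reloid;		/* table being vacuumed, or InvalidOid */
	TimestampTz av_launchtime;	/* when the worker started */
} AutoVacuumWorkerEntry;

/* Hypothetical pointers an extension would look up at shmem startup */
extern AutoVacuumWorkerEntry *AutoVacuumWorkers;
extern int *AutoVacuumNumWorkers;

PG_FUNCTION_INFO_V1(autovacuum_worker_count);

/* SQL-callable: number of autovacuum workers currently active */
Datum
autovacuum_worker_count(PG_FUNCTION_ARGS)
{
	PG_RETURN_INT32(*AutoVacuumNumWorkers);
}
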
-- 
Michael




Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-29 Thread Michael Paquier
On Tue, Mar 1, 2016 at 11:28 AM, Michael Paquier
 wrote:
> On Tue, Mar 1, 2016 at 6:25 AM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>
>>> 9) I have no logical explanation to explain why I am seeing all those
>>> things now.
>>
>> Happens all the time ...
>>
>> Pushed, thanks.
>
> Thanks. I am going to patch my buildfarm scripts to run those tests on
> hamster. Let's see what happens on this machine.

It looks like it works fine (see step recovery-check):
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster=2016-03-01%2002%3A34%3A26
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamster=2016-03-01%2002%3A34%3A26=recovery-check
Let's let that run for a couple of days...
-- 
Michael




Re: [HACKERS] remove wal_level archive

2016-02-29 Thread Michael Paquier
On Tue, Mar 1, 2016 at 10:02 AM, Peter Eisentraut  wrote:
> On 2/8/16 7:34 AM, Michael Paquier wrote:
>> -   if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
>> +   if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
>> Upthread it was mentioned that switching to an approach where enum
>> values are directly listed would be better. The target of an extra
>> patch on top of this one?
>
> I'm not sure what is meant by that.

For example the following, i.e. using only equality comparisons against
the named wal_level values:
if (ArchiveRecoveryRequested && EnableHotStandby)
{
-   if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
+   if (ControlFile->wal_level == WAL_LEVEL_ARCHIVE ||
+   ControlFile->wal_level == WAL_LEVEL_MINIMAL)
ereport(ERROR,
But that's nitpicking (this was mentioned upthread), and not something
this patch should care about.

>> -   if (wal_level < WAL_LEVEL_ARCHIVE)
>> -   ereport(ERROR,
>> -
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> -errmsg("replication slots can only be
>> used if wal_level >= archive")));
>> We should still forbid the creation of replication slots if wal_level = 
>> minimal.
>
> I think I took this out because you actually can't get to that check,
> but I put it back in because it seems better for clarity.

It is possible to hit it; take for example the following set of
parameters in postgresql.conf:
max_replication_slots = 3
wal_level = minimal
max_wal_senders = 0
=# select pg_create_physical_replication_slot('toto');
ERROR:  55000: replication slots can only be used if wal_level >= archive
LOCATION:  CheckSlotRequirements, slot.c:766

If this patch gets committed before the one improving the checkpoint
skip logic for standby snapshots when wal_level >= hot_standby
(http://www.postgresql.org/message-id/cab7npqqaab0v25tt4sj_mgc3ahfzrioneg4e5ccegvzc0b6...@mail.gmail.com),
something not to forget is that there will be a regression: on idle
systems checkpoints won't be skipped anymore, which is what happens now
with wal_level = archive on HEAD.

Except for this last concern, the patch is in nice shape, and it does
what it says it does.
-- 
Michael




[HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-02-29 Thread Michael Paquier
Hi all,

A year and a half ago, Heikki reported that libpq could enter an
infinite loop in a couple of code paths should an OOM happen in
PQexec():
http://www.postgresql.org/message-id/547480de.4040...@vmware.com
The code paths are:
- BIND messages for getRowDescriptions(), or 'T' messages, fixed per 7b96bf44
- Code paths handling some calls to PQmakeEmptyPGresult(), fixed per 414bef3
- Start message for COPY command, not fixed yet.
The first two problems have been addressed, but not the last one. As
the previous thread became long and confusing, here is a revival of the
topic on a new thread, with a fresh patch and a fresh study of the
problem, submitted for this CF. It would be nice to get things fixed,
or at least improved in some cases, but see below...

Here is a summary of what this patch does and improves in some cases:
1) COPY IN and COPY OUT: without the patch, those two remain stuck in
an infinite loop... With the patch, an "out of memory" error is
reported to the caller.
I also had a look at some community utilities, like pgloader and
pg_bulkload, though none of them use getCopyStart(). Do other people
have ideas for things to look at? pgAdmin, for example?
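
For illustration, here is a minimal sketch of what the caller side of
a COPY OUT start looks like once the error is reported (connection
setup omitted; the table name is made up):

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Begin COPY OUT.  With the patch, an OOM inside getCopyStart() shows
 * up here as PGRES_FATAL_ERROR instead of leaving the caller stuck.
 */
static int
start_copy_out(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "COPY my_table TO STDOUT");
	int			ok = (PQresultStatus(res) == PGRES_COPY_OUT);

	if (!ok)
		fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));

	PQclear(res);
	return ok;			/* on success, continue with PQgetCopyData() */
}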

Now, in the case of COPY_BOTH, let's take a couple of examples:
2) pg_receivexlog:
In the case of pg_receivexlog, the patch allows libpq to correctly
throw an error back to the caller, and pg_receivexlog then tries to
send START_REPLICATION again on its next loop:
pg_receivexlog: could not send replication command
"START_REPLICATION": out of memory
Without the patch, pg_receivexlog would just ignore the error, and at
the next iteration of ParseInput() the message obtained from the server
would be processed anyway, because the message cursor does not advance
in this case.
3) pg_recvlogical
Similarly, pg_recvlogical fails with the following error with the patch
attached:
pg_recvlogical: could not send replication command "START_REPLICATION
SLOT "toto" LOGICAL 0/0": out of memory
After the failure it loops back to the next attempt and is able to
perform its streaming. Without the patch, similarly to pg_receivexlog,
COPY_BOTH would take the ParseInput() code path once again and would
process the message, ignoring any OOM problems on the way.
4) libpqwalreceiver: upon noticing the OOM error with the patch
attached, a standby fails after sending START_REPLICATION, logs the
OOM error properly, and later starts a new WAL receiver process to try
again. Without this patch, the WAL receiver would still fail to start,
and log the following, unhelpful message:
2016-03-01 14:07:00 JST [84058]: [22-1] db=,user=,app=,client= LOG:
invalid record length at 0/3000970
That's not really informative for the user :( Note, though, that in
this case the WAL receiver does not remain in an infinite loop: it is
able to restart properly because it fails with this "invalid record
length" error, and then starts streaming at its next attempt when the
WAL receiver is started again.

So, that's where things stand. I still think it would be a gain in
terms of failure visibility to log this OOM failure properly in all
cases. It is true that some applications may expect libpq to *not*
return an error and to process the COPY start message at the next loop
of ParseInput(), but judging from what is in core, things can be
handled properly.

Thoughts? I have registered that in the CF app, and a patch is attached.
-- 
Michael
From df931f39efb9e7fd50108345ebeb53a098d0dc29 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 11 Oct 2015 20:46:40 +0900
Subject: [PATCH] Fix OOM error handling in COPY protocol of libpq

An OOM occurring while all the data needed from the server has been
received can result in an infinite loop when parsing the output message.
getCopyStart is switched to discard a message read from the server in
case of OOM, as well as any subsequent ones when receiving data from the
server for PGASYNC_COPY_OUT, and to not wait for any additional data
when input is expected via PGASYNC_COPY_IN. In the case of
PGASYNC_COPY_BOTH, both behaviors apply.
---
 .../libpqwalreceiver/libpqwalreceiver.c|  1 +
 src/interfaces/libpq/fe-exec.c | 12 
 src/interfaces/libpq/fe-protocol3.c| 71 +-
 3 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index f670957..5e79b78 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -445,6 +445,7 @@ libpqrcv_PQexec(const char *query)
 		if (PQresultStatus(lastResult) == PGRES_COPY_IN ||
 			PQresultStatus(lastResult) == PGRES_COPY_OUT ||
 			PQresultStatus(lastResult) == PGRES_COPY_BOTH ||
+			PQresultStatus(lastResult) == PGRES_FATAL_ERROR ||
 			PQstatus(streamConn) == CONNECTION_BAD)
 		

[HACKERS] Logic problem in SerializeSnapshot()

2016-02-29 Thread Rushabh Lathia
Hi All,

During testing of parallel query (with force_parallel_mode = regress), I
noticed a random server crash with the stack below:

#0  0x003fc84896d5 in memcpy () from /lib64/libc.so.6
#1  0x00a36867 in SerializeSnapshot (snapshot=0x1e49f40,
start_address=0x7f391e9ec728 ) at
snapmgr.c:1523
#2  0x00522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at
parallel.c:330
#3  0x006dd256 in ExecInitParallelPlan (planstate=0x1f012b0,
estate=0x1f00be8, nworkers=1) at execParallel.c:398
#4  0x006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160
#5  0x006de42e in ExecProcNode (node=0x1f00d00) at
execProcnode.c:516
#6  0x006da4fd in ExecutePlan (estate=0x1f00be8,
planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633

So I started looking into SerializeSnapshot(), and from reading the code
I found that we skip copying the SubXID array if it has overflowed,
unless the snapshot was taken during recovery, and to do this we set
serialized_snapshot->subxcnt to 0. But later, while copying the SubXID
array, we check the condition based on snapshot->subxcnt. We should
check serialized_snapshot->subxcnt rather than snapshot->subxcnt.

I tried hard to come up with an individual test, but somehow I was
unable to create a test case.

PFA patch to fix the issue.


regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 63e908d..b88e012 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1515,7 +1515,7 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
 	 * snapshot taken during recovery; all the top-level XIDs are in subxip as
 	 * well in that case, so we mustn't lose them.
 	 */
-	if (snapshot->subxcnt > 0)
+	if (serialized_snapshot->subxcnt > 0)
 	{
 		Size		subxipoff = sizeof(SerializedSnapshotData) +
 		snapshot->xcnt * sizeof(TransactionId);



Re: [HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> No, the point of it would be to not have pg_dump scripts overriding
>> installed-by-default ACLs.  A newer PG version might have different
>> ideas about what those should be.  I don't think this is exactly an
>> academic concern, either: wouldn't a likely outcome of your default-roles
>> work be that some built-in functions have different initial ACLs than
>> they do today?  Good luck with that, if pg_upgrade overwrites those
>> ACLs with the previous-version values.

> As it turns out, there isn't such an issue as the default for functions
> is to allow PUBLIC to EXECUTE and therefore we don't dump out ACLs for
> most functions.  The follow-on change to this patch is to modify those
> functions to *not* have the default/NULL ACL (and also drop the explicit
> if (!superuser()) ereport() checks in those functions), which will work
> just fine and won't be overwritten during pg_upgrade because those
> functions currently just have the default ACL, which we don't dump out.

Yes, so it would probably manage to not fail during 9.6 -> 9.7 migration.
But you *won't ever again* get to change the default ACLs on those
functions.  That does not seem like a great bet from here.

regards, tom lane




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-29 Thread Dilip Kumar
On Mon, Feb 29, 2016 at 5:18 PM, Alexander Korotkov wrote:

> I didn't reproduce the regression. I had access to a multicore machine but
> didn't see either a regression at low client counts or an improvement at
> high client counts.
> In the attached patch, the spinlock delay was exposed in s_lock.h and used
> in LockBufHdr().
> Dilip, could you try this version of the patch? Could you also run perf or
> another profiler in the case of regression? It would be nice to compare
> profiles with and without the patch. We could probably find the cause of
> the regression.
>

OK, I will test it sometime this week.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-02-29 Thread Abhijit Menon-Sen
At 2016-02-29 19:56:07 -0600, jim.na...@bluetreble.com wrote:
>
> I don't see why this would be limited to just functions. […] Am I
> missing something?

No, you are not missing anything. The specific problem I was trying to
solve involved a function, so I sketched out a solution for functions.
Once we have some consensus on whether that's an acceptable approach,
I'll extend the patch in whatever way we agree seems appropriate.

> Maybe the better way to handle this would be through ALTER EXTENSION?

That's what this (second) patch does.

> Given the audience for this, I think it'd probably be OK to just
> provide a function that does this, instead of DDL.

That seems like a promising idea. Can you suggest some possible usage?
Thanks.

-- Abhijit




Re: [HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> To make this work, you'd need a way to distinguish privileges installed
> >> by initdb from those changed later.
> 
> > To replicate whatever the current ACL is, we don't actually need to
> > make such a differentiation.  I'm not against doing so, but the only
> > point of it would be to eliminate a few extra lines being dumped out
> > which re-run those commands that initdb runs on restore.
> 
> No, the point of it would be to not have pg_dump scripts overriding
> installed-by-default ACLs.  A newer PG version might have different
> ideas about what those should be.  I don't think this is exactly an
> academic concern, either: wouldn't a likely outcome of your default-roles
> work be that some built-in functions have different initial ACLs than
> they do today?  Good luck with that, if pg_upgrade overwrites those
> ACLs with the previous-version values.

As it turns out, there isn't such an issue as the default for functions
is to allow PUBLIC to EXECUTE and therefore we don't dump out ACLs for
most functions.  The follow-on change to this patch is to modify those
functions to *not* have the default/NULL ACL (and also drop the explicit
if (!superuser()) ereport() checks in those functions), which will work
just fine and won't be overwritten during pg_upgrade because those
functions currently just have the default ACL, which we don't dump out.

Of course, it's a different story if the user changes the ACL on objects
in pg_catalog and then we change what we think the default ACL should
be, but in such a case I'm guessing we should probably go with what the
user explicitly asked for anyway, and if there's a serious enough change
in the permissions of the function, then perhaps we should have a
different function instead of redefining the existing one.

We do have some fun issues with pg_upgrade by going with this approach
of having pg_dump dump out ACLs: what happens when there's a function or
column which goes away? If there's a non-NULL ACL on it, the restore
will just outright fail if we don't do something more. I'm not a huge
fan, though, of coding into pg_dump the knowledge of every object which
exists in every version of PG that pg_dump supports.

Regarding the default roles patch, now that it's down to only one
default role, based on the assumption that this approach with pg_dump
will solve all the other concerns, there isn't really much overlap
between the default roles and the function ACLs.  Those functions which
will be able to work with just ACLs won't have a default role and the
functions which we need a default role for will have the default NULL
ACL.

Thanks!

Stephen




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-02-29 Thread Amit Kapila
On Mon, Feb 29, 2016 at 11:10 PM, Robert Haas wrote:
>
> On Fri, Feb 26, 2016 at 11:37 PM, Amit Kapila wrote:
> > On Sat, Feb 27, 2016 at 10:03 AM, Amit Kapila wrote:
> >>
> >> Here, we can see that there is a gain of ~15% to ~38% at higher client
> >> counts.
> >>
> >> The attached document (perf_write_clogcontrollock_data_v6.ods) contains
> >> data, mainly focusing on single-client performance.  The data is for
> >> multiple runs on different machines, so I thought it better to present
> >> it in the form of a document rather than dumping everything in e-mail.
> >> Do let me know if there is any confusion in understanding/interpreting
> >> the data.
> >
> > Forgot to mention that all these tests have been done by reverting
> > commit ac1d794.
>
> OK, that seems better.  But I have a question: if we don't really need
> to make this optimization apply only when everything is on the same
> page, then why even try?  If we didn't try, we wouldn't need the
> all_trans_same_page flag, which would reduce the amount of code
> change.

I am not sure I understood your question. Do you want to know why, in
the first place, transactions spanning more than one page call
TransactionIdSetPageStatus()? If we want to avoid attempting the
transaction status update when they are on different pages, then I
think we need some major changes in TransactionIdSetTreeStatus().


>  Would that hurt anything? Taking it even further, we could
> remove the check from TransactionGroupUpdateXidStatus too.  I'd be
> curious to know whether that set of changes would improve performance
> or regress it.  Or maybe it does nothing, in which case perhaps
> simpler is better.
>
> All things being equal, it's probably better if the cases where
> transactions from different pages get into the list together is
> something that is more or less expected rather than a
> once-in-a-blue-moon scenario - that way, if any bugs exist, we'll find
> them.  The downside of that is that we could increase latency for the
> leader that way - doing other work on the same page shouldn't hurt
> much but different pages is a bigger hit.  But that hit might be
> trivial enough not to be worth worrying about.
>

In my tests, the same-page check in TransactionGroupUpdateXidStatus()
doesn't impact performance in any way, and I think the reason is that it
rarely happens that a group contains multiple pages, and even when it
does, there is hardly much impact. So I will remove that check, and I
think that's what you also want for now.

> +   /*
> +    * Now that we've released the lock, go back and wake everybody up.  We
> +    * don't do this under the lock so as to keep lock hold times to a
> +    * minimum.  The system calls we need to perform to wake other processes
> +    * up are probably much slower than the simple memory writes we did while
> +    * holding the lock.
> +    */
>
> This comment was true in the place that you cut-and-pasted it from,
> but it's not true here, since we potentially need to read from disk.
>

Okay, will change.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> To make this work, you'd need a way to distinguish privileges installed
>> by initdb from those changed later.

> To replicate whatever the current ACL is, we don't actually need to
> make such a differentiation.  I'm not against doing so, but the only
> point of it would be to eliminate a few extra lines being dumped out
> which re-run those commands that initdb runs on restore.

No, the point of it would be to not have pg_dump scripts overriding
installed-by-default ACLs.  A newer PG version might have different
ideas about what those should be.  I don't think this is exactly an
academic concern, either: wouldn't a likely outcome of your default-roles
work be that some built-in functions have different initial ACLs than
they do today?  Good luck with that, if pg_upgrade overwrites those
ACLs with the previous-version values.

regards, tom lane




Re: [HACKERS] Improve error handling in pltcl

2016-02-29 Thread Tom Lane
Jim Nasby  writes:
> On 2/28/16 5:50 PM, Jim Nasby wrote:
>> Per discussion in [1], this patch improves error reporting in pltcl.

> I forgot to mention that this work is sponsored by Flight Aware 
> (http://flightaware.com).

Huh ... I use that site.  There's PG and pltcl code behind it?
Cool!

regards, tom lane




Re: [HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Per discussion about the best approach to reduce the amount of
> > superuser-only capabilities, this patch modifies pg_dump to dump out
> > all ACLs which exist on objects in the pg_catalog schema.
> 
> Um ... surely there are some of those that are installed by default?

There are a few, but not terribly many currently.

> To make this work, you'd need a way to distinguish privileges installed
> by initdb from those changed later.

To replicate whatever the current ACL is, we don't actually need to
make such a differentiation.  I'm not against doing so, but the only
point of it would be to eliminate a few extra lines being dumped out
which re-run those commands that initdb runs on restore.

The downside of doing so would be having to keep track of the exact ACLs
set for every object in pg_catalog which has a non-NULL ACL at initdb
time for every version of PG that the latest version of pg_dump
supports, and making sure that any changes to those get updated in
pg_dump in addition to the relevant system_views.sql change.

That's possible, but I wasn't sure it was worth it.

Thanks!

Stephen




Re: [HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Tom Lane
Stephen Frost  writes:
> Per discussion about the best approach to reduce the amount of
> superuser-only capabilities, this patch modifies pg_dump to dump out
> all ACLs which exist on objects in the pg_catalog schema.

Um ... surely there are some of those that are installed by default?
To make this work, you'd need a way to distinguish privileges installed
by initdb from those changed later.

regards, tom lane




[HACKERS] Default Roles

2016-02-29 Thread Stephen Frost
All,

Attached is a stripped-down version of the default roles patch which
includes only the 'pg_signal_backend' default role.  This provides the
framework and structure for other default roles to be added and formally
reserves the 'pg_' role namespace.  This is split into two patches, the
first to formally reserve 'pg_', the second to add the new default role.
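
For example, once the role exists, usage would look something like this
(a minimal sketch; the role name and PID are made up):

GRANT pg_signal_backend TO app_admin;
-- app_admin can now signal other backends, e.g. cancel a query,
-- without being a superuser:
SELECT pg_cancel_backend(12345);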

Will add to the March commitfest for review.

Thanks!

Stephen
From 4a14522ec7ec7d25c3ce9d07f6525b76f6bab598 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 29 Feb 2016 21:27:46 -0500
Subject: [PATCH 1/2] Reserve the "pg_" namespace for roles

This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.
---
 doc/src/sgml/ref/psql-ref.sgml  |  8 +--
 src/backend/catalog/catalog.c   |  5 ++--
 src/backend/commands/user.c | 40 
 src/backend/utils/adt/acl.c | 41 +
 src/bin/pg_dump/pg_dumpall.c|  2 ++
 src/bin/pg_upgrade/check.c  | 40 ++--
 src/bin/psql/command.c  |  4 ++--
 src/bin/psql/describe.c |  5 +++-
 src/bin/psql/describe.h |  2 +-
 src/bin/psql/help.c |  4 ++--
 src/include/utils/acl.h |  1 +
 src/test/regress/expected/rolenames.out | 18 +++
 src/test/regress/sql/rolenames.sql  |  8 +++
 13 files changed, 166 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6d0cb3d..76bb642 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1365,13 +1365,15 @@ testdb=
 
 
   
-\dg[+] [ pattern ]
+\dg[S+] [ pattern ]
 
 
 Lists database roles.
 (Since the concepts of users and groups have been
 unified into roles, this command is now equivalent to
 \du.)
+By default, only user-created roles are shown; supply the
+S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
 If the form \dg+ is used, additional information
@@ -1525,13 +1527,15 @@ testdb=
   
 
   
-\du[+] [ pattern ]
+\du[S+] [ pattern ]
 
 
 Lists database roles.
 (Since the concepts of users and groups have been
 unified into roles, this command is now equivalent to
 \dg.)
+By default, only user-created roles are shown; supply the
+S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
 If the form \du+ is used, additional information
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index bead2c1..d1cf45b 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -184,8 +184,9 @@ IsToastNamespace(Oid namespaceId)
  *		True iff name starts with the pg_ prefix.
  *
  *		For some classes of objects, the prefix pg_ is reserved for
- *		system objects only.  As of 8.0, this is only true for
- *		schema and tablespace names.
+ *		system objects only.  As of 8.0, this was only true for
+ *		schema and tablespace names.  With 9.6, this is also true
+ *		for roles.
  */
 bool
 IsReservedName(const char *name)
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 4baeaa2..ee37397 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
+#include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
@@ -312,6 +313,17 @@ CreateRole(CreateRoleStmt *stmt)
 	}
 
 	/*
+	 * Check that the user is not trying to create a role in the reserved
+	 * "pg_" namespace.
+	 */
+	if (IsReservedName(stmt->role))
+		ereport(ERROR,
+(errcode(ERRCODE_RESERVED_NAME),
+ errmsg("role name \"%s\" is reserved",
+	 stmt->role),
+ errdetail("Role names starting with \"pg_\" are reserved.")));
+
+	/*
 	 * Check the pg_authid relation to be certain the role doesn't already
 	 * exist.
 	 */
@@ -1117,6 +1129,7 @@ RenameRole(const char *oldname, const char *newname)
 	int			i;
 	Oid			roleid;
 	ObjectAddress address;
+	Form_pg_authid authform;
 
 	rel = heap_open(AuthIdRelationId, RowExclusiveLock);
 	dsc = RelationGetDescr(rel);
@@ -1136,6 +1149,7 @@ RenameRole(const char *oldname, const char *newname)
 	 */
 
 	roleid = HeapTupleGetOid(oldtuple);
+	authform = (Form_pg_authid) GETSTRUCT(oldtuple);
 
 	if (roleid == GetSessionUserId())
 		ereport(ERROR,
@@ -1146,6 +1160,24 @@ 

Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-29 Thread Thomas Munro
On Tue, Mar 1, 2016 at 2:46 PM, Amit Langote wrote:
>
> Hi,
>
> On 2016/02/29 18:05, Thomas Munro wrote:
>> On Mon, Feb 29, 2016 at 9:05 PM, Amit Langote wrote:
>>> + servers.  A transaction that is run with causal_reads set
>>> + to on is guaranteed either to see the effects of all
>>> + completed transactions run on the primary with the setting on, or to
>>> + receive an error "standby is not available for causal reads".
>>>
>>> "A transaction that is run" means "A transaction that is run on a
>>> standby", right?
>>
>> Well, it could be any server, standby or primary.  Of course standbys
>> are the interesting case since it was already true that if you run
>> two sequential transactions on the primary, the second can see the
>> effects of the first, but I like the idea of a general rule that
>> applies anywhere, allowing you not to care which server it is.
>
> I actually meant in the context of that sentence only.

Ok, here's a new version that includes that change, fixes a conflict
with recent commit 10b48522 and removes an accidental duplicate copy
of the README file.

-- 
Thomas Munro
http://www.enterprisedb.com


causal-reads-v8.patch
Description: Binary data



Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-29 Thread Michael Paquier
On Tue, Mar 1, 2016 at 6:25 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>
>> 9) I have no logical explanation to explain why I am seeing all those
>> things now.
>
> Happens all the time ...
>
> Pushed, thanks.

Thanks. I am going to patch my buildfarm scripts to run those tests on
hamster. Let's see what happens on this machine.
-- 
Michael




[HACKERS] pg_dump dump catalog ACLs

2016-02-29 Thread Stephen Frost
All,

Per discussion about the best approach to reduce the amount of
superuser-only capabilities, this patch modifies pg_dump to dump out
all ACLs which exist on objects in the pg_catalog schema.  With this
change, follow-on trivial patches will remove explicit superuser()
checks from functions and replace them with 'REVOKE EXECUTE FROM public'
commands, allowing administrators to control which users are allowed to
execute those functions.
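
As an illustration, the follow-on approach would look something like
the following, using pg_stat_file() as an example of a function that
today carries a hard-coded superuser() check (the role name is
invented):

REVOKE EXECUTE ON FUNCTION pg_stat_file(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_stat_file(text) TO monitoring_role;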

Started as a new thread to hopefully gain more interest.  Will be
registered in the March commitfest shortly.

Thanks!

Stephen
From b2b01b498f3d9fede2e876785effd48f00feee34 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 29 Feb 2016 21:11:46 -0500
Subject: [PATCH] Make pg_dump dump ACLs for pg_catalog objects

Historically, we've avoided dumping anything about the objects in
the pg_catalog schema.  Unfortunately, this has meant that any
changes or even initial ACLs set for objects in pg_catalog are not
preserved.

Instead, dump out the ACLs which are set on objects in pg_catalog
in the same way we dump ACLs for user objects.

This is implemented by adding the notion of a 'dump component'
(such as an ACL, or comments, or policies) which can be requested
to be dumped out rather than everything.
---
 src/bin/pg_dump/common.c  |2 +-
 src/bin/pg_dump/pg_dump.c | 1576 -
 src/bin/pg_dump/pg_dump.h |   14 +-
 3 files changed, 865 insertions(+), 727 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index f798b15..507b0bf 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -437,7 +437,7 @@ AssignDumpId(DumpableObject *dobj)
 	dobj->dumpId = ++lastDumpId;
 	dobj->name = NULL;			/* must be set later */
 	dobj->namespace = NULL;		/* may be set later */
-	dobj->dump = true;			/* default assumption */
+	dobj->dump = DUMP_COMPONENT_ALL;	/* default assumption */
 	dobj->ext_member = false;	/* default assumption */
 	dobj->dependencies = NULL;
 	dobj->nDeps = 0;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 64c2673..416e6a7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1284,7 +1284,7 @@ checkExtensionMembership(DumpableObject *dobj, DumpOptions *dopt)
 	 * extension contents with something different.
 	 */
 	if (!dopt->binary_upgrade)
-		dobj->dump = false;
+		dobj->dump = DUMP_COMPONENT_NONE;
 	else
 		dobj->dump = ext->dobj.dump;
 
@@ -1306,16 +1306,20 @@ selectDumpableNamespace(NamespaceInfo *nsinfo, DumpOptions *dopt)
 	 * namespaces. If specific namespaces are being dumped, dump just those
 	 * namespaces. Otherwise, dump all non-system namespaces.
 	 */
+
 	if (table_include_oids.head != NULL)
-		nsinfo->dobj.dump = false;
+		nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
 	else if (schema_include_oids.head != NULL)
 		nsinfo->dobj.dump = simple_oid_list_member(&schema_include_oids,
-   nsinfo->dobj.catId.oid);
+   nsinfo->dobj.catId.oid) ?
+			DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
+	else if (strncmp(nsinfo->dobj.name, "pg_catalog", 10) == 0)
+		nsinfo->dobj.dump = DUMP_COMPONENT_ACL;
 	else if (strncmp(nsinfo->dobj.name, "pg_", 3) == 0 ||
 			 strcmp(nsinfo->dobj.name, "information_schema") == 0)
-		nsinfo->dobj.dump = false;
+		nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
 	else
-		nsinfo->dobj.dump = true;
+		nsinfo->dobj.dump = DUMP_COMPONENT_ALL;
 
 	/*
 	 * In any case, a namespace can be excluded by an exclusion switch
@@ -1323,7 +1327,7 @@ selectDumpableNamespace(NamespaceInfo *nsinfo, DumpOptions *dopt)
 	if (nsinfo->dobj.dump &&
		simple_oid_list_member(&schema_exclude_oids,
 			   nsinfo->dobj.catId.oid))
-		nsinfo->dobj.dump = false;
+		nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
 }
 
 /*
@@ -1342,7 +1346,8 @@ selectDumpableTable(TableInfo *tbinfo, DumpOptions *dopt)
 	 */
 	if (table_include_oids.head != NULL)
 		tbinfo->dobj.dump = simple_oid_list_member(&table_include_oids,
-   tbinfo->dobj.catId.oid);
+   tbinfo->dobj.catId.oid) ?
+DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
 	else
 		tbinfo->dobj.dump = tbinfo->dobj.namespace->dobj.dump;
 
@@ -1352,7 +1357,7 @@ selectDumpableTable(TableInfo *tbinfo, DumpOptions *dopt)
 	if (tbinfo->dobj.dump &&
		simple_oid_list_member(&table_exclude_oids,
 			   tbinfo->dobj.catId.oid))
-		tbinfo->dobj.dump = false;
+		tbinfo->dobj.dump = DUMP_COMPONENT_NONE;
 }
 
 /*
@@ -1381,7 +1386,7 @@ selectDumpableType(TypeInfo *tyinfo, DumpOptions *dopt)
 		if (tytable != NULL)
 			tyinfo->dobj.dump = tytable->dobj.dump;
 		else
-			tyinfo->dobj.dump = false;
+			tyinfo->dobj.dump = DUMP_COMPONENT_NONE;
 		return;
 	}
 
@@ -1401,11 +1406,7 @@ selectDumpableType(TypeInfo *tyinfo, DumpOptions *dopt)
 	if (checkExtensionMembership(&tyinfo->dobj, dopt))
 		return;	/* extension membership overrides all else */
 
-	/* dump only types in dumpable namespaces */
-	if (!tyinfo->dobj.namespace->dobj.dump)
-		tyinfo->dobj.dump = false;
-	else
-		

Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-02-29 Thread Jim Nasby

On 2/29/16 7:27 PM, Abhijit Menon-Sen wrote:

1. This adds the 'x'/DEPENDENCY_AUTO_EXTENSION type.
2. This adds an 'ALTER FUNCTION … ADD DEPENDENT FUNCTION …' command.

I split up the two because we may want the new dependency type without
going to the trouble of adding a new command. Maybe extension authors
should just insert an 'x' row into pg_depend directly?


I don't see why this would be limited to just functions. I could
certainly see an extension that creates ease-of-use views that depend on
the extension, or tables that have triggers that… Am I missing
something?



I was inclined to implement it using ALTER FUNCTION, but AlterFunction()
is focused on altering the pg_proc entry for a function, so the new code
didn't fit. Ultimately, ExecAlterExtensionContentsStmt() was the closest
match, so that's where I did it.


Maybe the better way to handle this would be through ALTER EXTENSION?

Given the audience for this, I think it'd probably be OK to just provide
a function that does this, instead of DDL. I'd be concerned about asking
users to do raw inserts, though. pg_depend isn't the easiest thing to
grok, so I suspect there'd be a lot of problems with that, resulting in
more raw DML to try and fix things, resulting in pg_depend getting
completely screwed up...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Improve error handling in pltcl

2016-02-29 Thread Jim Nasby

On 2/28/16 5:50 PM, Jim Nasby wrote:

Per discussion in [1], this patch improves error reporting in pltcl.


I forgot to mention that this work is sponsored by Flight Aware 
(http://flightaware.com).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Convert pltcl from strings to objects

2016-02-29 Thread Jim Nasby

On 2/29/16 9:57 AM, Tom Lane wrote:

plpgsql already has a similar mechanism (see PLpgSQL_function.use_count)
which you could probably copy.  But I'd advise that this is a separate
matter to be addressed in a separate patch; it has little to do with the
nominal subject matter of this patch.


Ahh, thanks for pointing that out.

Completely agree on it being a separate patch. Flight Aware is a big
pltcl user as well as a contributor to the TCL community, so there are
several more patches in the works. This would be one of them.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-29 Thread Amit Langote

Hi,

On 2016/02/29 18:05, Thomas Munro wrote:
> On Mon, Feb 29, 2016 at 9:05 PM, Amit Langote wrote:
>> + servers.  A transaction that is run with causal_reads set
>> + to on is guaranteed either to see the effects of all
>> + completed transactions run on the primary with the setting on, or to
>> + receive an error "standby is not available for causal reads".
>>
>> "A transaction that is run" means "A transaction that is run on a
>> standby", right?
> 
> Well, it could be any server, standby or primary.  Of course standbys
> are the interesting case since it was already true that if you run
> two sequential transactions on the primary, the second can see the
> effects of the first, but I like the idea of a general rule that
> applies anywhere, allowing you not to care which server it is.

I actually meant in the context of that sentence only.

>> By the way, is there some discussion in our existing documentation to
>> refer to about causal consistency in the single-node case?  I don't
>> know, maybe that would help ease people into the new feature.  Grepping
>> the existing source tree doesn't reveal the term "causal", so maybe even
>> a single line in the patch mentioning "single node operation trivially
>> implies (or does it?) causal consistency" would help.  Thoughts?
> 
> Hmm.  Where should such a thing go?  I probably haven't introduced the
> term well enough.  I thought for a moment about putting something
> here:
> 
> http://www.postgresql.org/docs/devel/static/sql-commit.html
> 
> "All changes made by the transaction become visible to others ..." --
> which others?  But I backed out, that succinct account of COMMIT is 20
> years old, and in any case visibility is tied to committing, not
> specifically to the COMMIT command.  But perhaps this patch really
> should include something there that refers back to the causal reads
> section.

I see.  I agree this is not exactly material for the COMMIT page.  Perhaps
somewhere under "Chapter 13. Concurrency Control" with cross-reference
to/from "25.5. Hot Standby".  Might be interesting to hear from others as
well.

Thanks,
Amit






Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-02-29 Thread Abhijit Menon-Sen
At 2016-01-18 11:08:19 +0530, a...@2ndquadrant.com wrote:
>
> I'm proposing to address a part of that problem by allowing extension
> dependencies to be explicitly declared for functions and objects
> created either by a user or dynamically by the extension itself—things
> that need the extension to function, but aren't a part of it.

I didn't hear any further suggestions, so here's a patch for discussion.

1. This adds the 'x'/DEPENDENCY_AUTO_EXTENSION type.
2. This adds an 'ALTER FUNCTION … ADD DEPENDENT FUNCTION …' command.

I split up the two because we may want the new dependency type without
going to the trouble of adding a new command. Maybe extension authors
should just insert an 'x' row into pg_depend directly?
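
For instance, such a raw insert could look something like this (a
sketch only; the function and extension names are invented):

INSERT INTO pg_depend
        (classid, objid, objsubid, refclassid, refobjid, refobjsubid, deptype)
SELECT  'pg_proc'::regclass, 'my_func(int)'::regprocedure, 0,
        'pg_extension'::regclass, e.oid, 0, 'x'
FROM    pg_extension e
WHERE   e.extname = 'my_extension';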

I was inclined to implement it using ALTER FUNCTION, but AlterFunction()
is focused on altering the pg_proc entry for a function, so the new code
didn't fit. Ultimately, ExecAlterExtensionContentsStmt() was the closest
match, so that's where I did it.

Comments welcome. I'll add this patch to the CF.

-- Abhijit
From 9835f0990a015431393d608c8710d9effe301c9d Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Tue, 1 Mar 2016 06:44:28 +0530
Subject: Add a DEPENDENCY_AUTO_EXTENSION dependency type

This is useful for functions that require the extension to operate, but
do not belong to the extension per se and, in particular, should not be
ignored by pg_dump.
---
 doc/src/sgml/catalogs.sgml   | 13 +
 src/backend/catalog/dependency.c |  2 ++
 src/include/catalog/dependency.h |  9 -
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 951f59b..189b771 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2864,6 +2864,19 @@
   
  
 
+
+
+ DEPENDENCY_AUTO_EXTENSION (x)
+ 
+  
+   The dependent object is not a member of the extension that is the
+   referenced object (and so should not be ignored by pg_dump), but
+   cannot function without it and should be dropped when the
+   extension itself is. The dependent object may be dropped on its
+   own as well.
+  
+ 
+

 
Other dependency flavors might be needed in future.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..a284bed 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -587,6 +587,7 @@ findDependentObjects(const ObjectAddress *object,
 		{
 			case DEPENDENCY_NORMAL:
 			case DEPENDENCY_AUTO:
+			case DEPENDENCY_AUTO_EXTENSION:
 /* no problem */
 break;
 			case DEPENDENCY_INTERNAL:
@@ -786,6 +787,7 @@ findDependentObjects(const ObjectAddress *object,
 subflags = DEPFLAG_NORMAL;
 break;
 			case DEPENDENCY_AUTO:
+			case DEPENDENCY_AUTO_EXTENSION:
 subflags = DEPFLAG_AUTO;
 break;
 			case DEPENDENCY_INTERNAL:
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..380f74a 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -61,6 +61,12 @@
  * created only during initdb.  The fields for the dependent object
  * contain zeroes.
  *
+ * DEPENDENCY_AUTO_EXTENSION ('x'): the dependent object is not a member
+ * of the extension that is the referenced object (and so should not be
+ * ignored by pg_dump), but cannot function without it and should be
+ * dropped when the extension itself is. The dependent object may be
+ * dropped on its own as well.
+ *
  * Other dependency flavors may be needed in future.
  */
 
@@ -70,7 +76,8 @@ typedef enum DependencyType
 	DEPENDENCY_AUTO = 'a',
 	DEPENDENCY_INTERNAL = 'i',
 	DEPENDENCY_EXTENSION = 'e',
-	DEPENDENCY_PIN = 'p'
+	DEPENDENCY_PIN = 'p',
+	DEPENDENCY_AUTO_EXTENSION = 'x'
 } DependencyType;
 
 /*
-- 
2.1.4

From f06f59e8f7f25406510178fbf62b59dcfd59428c Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Tue, 1 Mar 2016 06:46:11 +0530
Subject: Add experimental 'ALTER EXTENSION … ADD DEPENDENT FUNCTION …' command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This uses the new 'x' dependency type.
---
 src/backend/commands/extension.c |  7 +--
 src/backend/nodes/copyfuncs.c|  1 +
 src/backend/nodes/equalfuncs.c   |  1 +
 src/backend/parser/gram.y| 39 ++-
 src/include/nodes/parsenodes.h   |  1 +
 src/include/parser/kwlist.h  |  1 +
 6 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9d84b79..c33dca6 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -2985,6 +2985,7 @@ ExecAlterExtensionContentsStmt(AlterExtensionContentsStmt *stmt,
 	ObjectAddress object;
 	Relation	relation;
 	Oid			oldExtension;
+	

Re: [HACKERS] create opclass documentation outdated

2016-02-29 Thread Tom Lane
Julien Rouhaud  writes:
> On 25/02/2016 23:12, Vik Fearing wrote:
>> On 02/25/2016 09:13 PM, Julien Rouhaud wrote:
>>> I just saw that the CREATE OPERATOR CLASS documentation doesn't
>>> mention that BRIN indexes also support the STORAGE parameter.

>> I think this is the wrong approach; that parenthetical list now 
>> represents a full 50% of the available AMs.  I submit it should be 
>> removed altogether.

> With further inspection, the "Interfacing Extensions To Indexes"
> (§35.14) documentation also has a list of AMs supporting the STORAGE
> parameter. I believe having at least one page referencing which AMs
> support this parameter could be helpful for people who want to
> implement a new opclass.

I think the problem here is more thoroughgoing, namely that neither
create_opclass.sgml nor xindex.sgml was touched at all for BRIN.

In create_opclass.sgml, not only do we have the list of AMs supporting
STORAGE, but there is also a paragraph describing which AMs do what
for input datatypes of FUNCTION members (around line 175).

xindex.sgml has that one list you mentioned, but there's not much
point in fixing that when BRIN indexes fail to be described either in
35.14.2 or 35.14.3, where they certainly should be.  And there are
other places throughout that chapter that say that such-and-such AMs
support each feature as it's discussed.

I'm inclined to feel that the right answer is to fix all these
omissions, not to decide that we're not maintaining this documentation
anymore.  Alvaro, I think this one's in your court.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove wal_level archive

2016-02-29 Thread Peter Eisentraut
On 2/8/16 7:34 AM, Michael Paquier wrote:
> Shouldn't backup.sgml be updated as well? Here is the portion that I
> am referring to:
> To enable WAL archiving, set the wal_level
> configuration parameter to archive or higher,
> archive_mode to on,
>
>  But minimal WAL does not contain enough information to reconstruct the
> -data from a base backup and the WAL logs, so archive or
> +data from a base backup and the WAL logs, so replica or
>  higher must be used to enable WAL archiving
>  () and streaming replication.
> 

Checked for leftovers again and fixed one.

> 
> -In hot_standby level, the same information is logged as
> -with archive, plus information needed to reconstruct
> -the status of running transactions from the WAL. To enable read-only
> As the paragraph about the difference between hot_standby and archive
> is removed, I think that it would be better to mention that setting
> wal_level to replica allows reconstructing data from a base backup and
> the WAL logs, *and* running read-only queries when hot_standby is
> enabled.

Well, I think that is really only of historical interest.  The
assumption is that, as long as hot_standby = on, you can run read-only
queries.  The WAL level is taken completely out of consideration,
because if you have replica at all, it's good enough.  That is part of
the point of this patch.

> 
> -   if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
> +   if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
> Upthread it was mentioned that switching to an approach where enum
> values are directly listed would be better. The target of an extra
> patch on top of this one?

I'm not sure what is meant by that.

> 
> -   if (wal_level < WAL_LEVEL_ARCHIVE)
> -   ereport(ERROR,
> -
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> -errmsg("replication slots can only be
> used if wal_level >= archive")));
> We should still forbid the creation of replication slots if wal_level = 
> minimal.

I think I took this out because you actually can't get to that check,
but I put it back in because it seems better for clarity.

From 574dd447b4a077267200d2ca9b8b4e185d4bb052 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 29 Feb 2016 20:01:54 -0500
Subject: [PATCH] Merge wal_level "archive" and "hot_standby" into new name
 "replica"

The distinction between "archive" and "hot_standby" existed only because
at the time "hot_standby" was added, there was some uncertainty about
stability.  This is now a long time ago.  We would like to move forward
with simplifying the replication configuration, but this distinction is
in the way, because a primary server cannot tell (without asking a
standby or predicting the future) which one of these would be the
appropriate level.

Pick a new name for the combined setting to make it clearer that it
covers all (non-logical) backup and replication uses.  The old values
are still accepted but are converted internally.
---
 doc/src/sgml/backup.sgml  |  4 ++--
 doc/src/sgml/config.sgml  | 30 +++
 doc/src/sgml/high-availability.sgml   |  2 +-
 doc/src/sgml/ref/alter_system.sgml|  2 +-
 doc/src/sgml/ref/pgupgrade.sgml   |  2 +-
 src/backend/access/rmgrdesc/xlogdesc.c|  5 +++--
 src/backend/access/transam/xact.c |  2 +-
 src/backend/access/transam/xlog.c | 20 --
 src/backend/access/transam/xlogfuncs.c|  2 +-
 src/backend/postmaster/postmaster.c   |  2 +-
 src/backend/replication/slot.c|  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |  2 +-
 src/bin/pg_controldata/pg_controldata.c   |  6 ++
 src/include/access/xlog.h | 11 +-
 src/include/catalog/pg_control.h  |  2 +-
 src/test/perl/PostgresNode.pm |  2 +-
 17 files changed, 44 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 7413666..9092cf8 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -592,7 +592,7 @@ Setting Up WAL Archiving
 

 To enable WAL archiving, set the 
-configuration parameter to archive or higher,
+configuration parameter to replica or higher,
  to on,
 and specify the shell command to use in the  configuration parameter.  In practice
@@ -1285,7 +1285,7 @@ Standalone Hot Backups
   If more flexibility in copying the backup files is needed, a lower
   level process can be used for standalone hot backups as well.
   To prepare for low level standalone hot backups, set wal_level to
-  archive or higher, archive_mode to
+  replica or higher, archive_mode to
   on, and set up an archive_command that 

Re: [HACKERS] remove wal_level archive

2016-02-29 Thread Peter Eisentraut
On 2/8/16 9:36 AM, David Steele wrote:
> -#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
> +#define XLogIsNeeded() (wal_level >= WAL_LEVEL_REPLICA)
> <...>
> -#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
> +#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
> 
> Since these are identical now shouldn't one be removed?  I searched the
> code and I couldn't find anything that looked dead (i.e. XLogIsNeeded()
> && !XLogStandbyInfoActive()) but it still seems like having both could
> cause confusion.

I think this should eventually be cleaned up, but it doesn't seem
necessary in the first patch.




Re: [HACKERS] snapshot too old, configured by time

2016-02-29 Thread Andres Freund
On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
> Basically, a connection needs to remain open and interleave
> commands with other connections, which the isolation tester does
> just fine; but it needs to do that using a custom postgresql.conf
> file, which TAP does just fine.  I haven't been able to see the
> right way to get a TAP test to set up a customized installation to
> run isolation tests against.  If I can get that working, I have
> additional tests I can drop into that.

Check contrib/test_decoding's makefile. It does just that with
isolationtester.

Andres




Re: [HACKERS][REVIEW]: Password identifiers, protocol aging and SCRAM protocol

2016-02-29 Thread Michael Paquier
On Mon, Feb 29, 2016 at 8:43 PM, Valery Popov  wrote:
> vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git branch

Thanks for the input!

> 0001-Add-facility-to-store-multiple-password-verifiers.patch:2547: trailing
> whitespace.
> warning: 1 line adds whitespace errors.
> 0003-Add-pg_auth_verifiers_sanitize.patch:87: indent with spaces.
> if (!superuser())
> warning: 1 line adds whitespace errors.

Argh, yes. Those two have slipped through my successive rebases, I
think. Will fix in my tree; I don't think it is worth sending the whole
series again just for that though.
-- 
Michael




Re: [HACKERS] snapshot too old, configured by time

2016-02-29 Thread Kevin Grittner
On Fri, Jan 8, 2016 at 5:22 PM, Alvaro Herrera  wrote:

>> People have said that issuing SQL commands directly from a TAP test
>> via DBD::Pg is not acceptable for a core feature, and (despite
>> assertions to the contrary) I see no way to test this feature with
>> existing testing mechanisms.  The bigger set of work here, if we
>> don't want this feature to go in without any testing scripts (which
>> is not acceptable IMO), is to enhance the isolation tester or
>> hybridize TAP testing with the isolation tester.
>
> Is it possible to use the PostgresNode stuff to test this?  If not,
> perhaps if you restate what additional capabilities you need we could
> look into adding them there.  I suspect that what you need is the
> ability to keep more than one session open and feed them commands;
> perhaps we could have the framework have a function that opens a psql
> process and returns a FD to which the test program can write, using the
> IPC::Run stuff (start / pump / finish).

Resubmitting for the March CF.

The main thing that changed is that I can now run all the
regression and isolation tests using installcheck with
old_snapshot_threshold = 0 and get a clean run.  That probably gets
better overall coverage than specific tests to demonstrate the
"snapshot too old" error, but of course we need those, too.  While
I can do that with hand-run psql sessions or through connectors
from different languages, I have not been able to wrangle the
testing tools we support through the build system into working for
this purpose.  (I had been hoping that the recent improvements to
the TAP testing libraries would give me the traction to get there,
but either it's still not there or my perl-fu is just too weak to
figure out how to use those features -- suggestions welcome.)

Basically, a connection needs to remain open and interleave
commands with other connections, which the isolation tester does
just fine; but it needs to do that using a custom postgresql.conf
file, which TAP does just fine.  I haven't been able to see the
right way to get a TAP test to set up a customized installation to
run isolation tests against.  If I can get that working, I have
additional tests I can drop into that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


snapshot-too-old-v4.patch
Description: invalid/octet-stream



Re: [HACKERS] pglogical - logical replication contrib module

2016-02-29 Thread Petr Jelinek

Hi

On 03/02/16 03:25, Steve Singer wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:tested, failed

Here is some more review

+- `pglogical.replication_set_add_table(set_name name, table_name regclass, 
synchronize boolean)`
+  Adds a table to replication set.
+
+  Parameters:
+  - `set_name` - name of the existing replication set
+  - `table_name` - name or OID of the table to be added to the set
+  - `synchronize` - if true, the table data is synchronized on all subscribers
+which are subscribed to given replication set, default false
+

The argument to this function is actually named "relation", not "table_name",
though we might want to update the function to name the argument table_name.

Also, we don't explain what 'synchronize' means. I first thought that a value
of false would mean that existing data won't be copied but any new changes will
be. A value of false actually seems to mean that nothing will happen with the
table until the synchronize function is manually called. We seem to be using
the word 'synchronize' in a different sense in different places, which I find
confusing (i.e. synchronize_data and synchronize_structure in create_subscription).



False should mean exactly what you thought it would; I will have to look 
at what the issue is there. Obviously UPDATEs or DELETEs won't really do 
anything when there is no data, but INSERTs should be replicated even 
with false.


But I agree we need to define synchronized better; as we discussed, we 
also want to change the status to replicated instead of synchronized. I am 
btw thinking that the default value for synchronizing the schema should be 
false in create_subscription.





*** a/contrib/pglogical/pglogical_sync.c
--- b/contrib/pglogical/pglogical_sync.c
+ static void
+ dump_structure(PGLogicalSubscription *sub, const char *snapshot)
+ {
+ 	char			pg_dump[MAXPGPATH];
+ 	uint32			version;
+ 	int				res;
+ 	StringInfoData	command;
+
+ 	if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
+ 		elog(ERROR, "pglogical subscriber init failed to find pg_dump relative to binary %s",
+ 			 my_exec_path);
+
+ 	if (version / 100 != PG_VERSION_NUM / 100)
+ 		elog(ERROR, "pglogical subscriber init found pg_dump with wrong major version %d.%d, expected %d.%d",
+ 			 version / 100 / 100, version / 100 % 100,
+ 			 PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
+
+ 	initStringInfo(&command);
+ #if PG_VERSION_NUM < 90500
+ 	appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
+ #else
+ 	appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",

1) I am not sure we can assume/require that the pg_dump binary be in the same 
location as the postgres binary.  I don't think we've ever required that 
client binaries (i.e. psql, pg_dump, pg_restore ...) be in the same directory 
as postgres.  pg_upgrade does require this, so maybe this isn't a problem in 
practice, but I thought I'd point it out. Ideally we wouldn't need to call an 
external program to get a schema dump at all, but turning pg_dump into a 
library is beyond the scope of this patch.



Well, for now I don't see that as a big issue, especially given that 
pg_dump needs to be the same version as the server. We can make it a GUC 
if needed, but that's not something that seems problematic so far. I agree 
the ideal solution would be to have a library, but that's something that 
will take much longer, I am afraid.




2) I don't think we can hard-code /tmp as the directory for the schema dump.  
I don't think that will work on most Windows systems, and even on a Unix 
system $TMPDIR might be set to something else.  Maybe writing this into 
pgsql_tmp would be a better choice.



Yeah, I turned that into a GUC.


Further down in
pglogical_sync_subscription(PGLogicalSubscription *sub)
+   switch (status)
+   {
+   /* Already synced, nothing to do except cleanup. */
+   case SYNC_STATUS_READY:
+   MemoryContextDelete(myctx);
+   return;
+   /* We can recover from crashes during these. */
+   case SYNC_STATUS_INIT:
+   case SYNC_STATUS_CATCHUP:
+   break;
+   default:
+   elog(ERROR,
+"subscriber %s initialization failed during nonrecoverable step 
(%c), please try the setup again",
+sub->name, status);
+   break;
+   }

I think the default case needs to do something to unregister the background 
worker.  We already discussed trying to get the error message to the user in 
a better way; either way there isn't any sense in this background worker being 
launched again if the error is nonrecoverable.



Agreed, for this specific case we can actually pretty easily put the 
error into some catalog and just disable the 

[HACKERS] amcheck (B-Tree integrity checking tool)

2016-02-29 Thread Peter Geoghegan
I was assigned an "action point" during the FOSDEM developer meeting:
"Post new version of btree consistency checker patch". I attach a new
WIP version of my consistency checker tool, amcheck. This patch is
proposed for 9.6, as an extension in contrib -- maybe we can still get
it in. This is the first time I've added any version of this to a
commitfest, although I've posted a couple of rough versions of this in
the past couple of years. The attached version has received a major
overhaul, and is primarily aimed at finding corruption in production
systems, although I think it will still have significant value for
debugging too. Maybe it can help with some of the B-Tree patches in
the final commitfest, for example. I also have some hope that it will
become a learning tool for people interested in how B-Tree indexes
work.

To recap, the extension adds some SQL-callable functions that verify
certain invariant conditions hold within some particular B-Tree index.
These are the conditions that index scans rely on always being true.
The tool's scope may eventually cover other AMs, including heapam, but
nbtree seems like the best place to start.

Note that no function currently checks that the index is consistent
with the heap, which would be very useful (that's probably how I'd
eventually target the heapam, actually).

Invariants
==========

nbtree invariants that the tool verifies with just an AccessShareLock
on the relation are:

* That all items are in the correct, opclass order on each page.

* That the page "high key", if any, actually bounds the items on the page.

* That the last item on a page is less than or equal to the first item
on the next page (the page to its right). The idea here is that the
key space spans multiple pages, not just one page, so it makes sense to
check the last item where we can.

With an ExclusiveLock + ShareLock, some additional invariants are verified:

* That child pages actually have their parent's downlink as a lower bound.

* Sane right links and left links at each level.

Obviously, this tool is all about distrusting the structure of a
B-Tree index. That being the case, it's not always totally clear where
to draw the line. I think I have the balance about right, though.

Interface
=========

There are only 2 SQL callable functions in the extension, which are
very similar:

bt_index_check(index regclass)

bt_index_parent_check(index regclass)

The latter is more thorough than the former -- it performs all checks,
including those checks that I mentioned required an ExclusiveLock. So,
bt_index_check() only requires an AccessShareLock.
bt_index_parent_check() requires an ExclusiveLock on the index
relation, and a ShareLock on its heap relation, almost like REINDEX.
bt_index_parent_check() performs verification that is a superset of
the verification performed by bt_index_check() -- mostly, the extra
verification/work is that it verifies downlinks against child pages.

Both functions raise an error in the event of observing that an
invariant in a B-Tree was violated, such as items being out of order
on a page. I've written extensive documentation, which goes into
practical aspects of using amcheck effectively. It doesn't go into
significant detail about every invariant that is checked, but gives a
good idea of roughly what checks are performed.

I could almost justify only having one function with an argument about
the downlink/child verification, but that would be a significant
footgun given the varying locking requirements that such a function
would have.

Locking
=======

We never rely on something like holding on to a buffer pin as an
interlock for correctness (the vacuum interlock thing isn't generally
necessary, because we don't look at the heap at all). We simply pin +
BT_READ lock a buffer, copy it into local memory allocated by
palloc(), and then immediately release the buffer lock and drop the
pin. This is the same in all instances. There is never more than one
buffer lock or pin held at a time.
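To illustrate, the copy-then-release pattern amounts to roughly the
following (a minimal sketch using the standard bufmgr calls; the function
name is mine, and the real code in the patch may differ in details):

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufmgr.h"

/*
 * Sketch: copy a btree page into backend-local memory, holding the pin
 * and share lock only for the duration of the memcpy().  All subsequent
 * verification runs against the local copy.
 */
static Page
palloc_btree_page_copy(Relation rel, BlockNumber blocknum)
{
	Buffer		buffer;
	Page		page = (Page) palloc(BLCKSZ);

	buffer = ReadBuffer(rel, blocknum);
	LockBuffer(buffer, BT_READ);
	memcpy(page, BufferGetPage(buffer), BLCKSZ);
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	ReleaseBuffer(buffer);

	return page;				/* caller must eventually pfree() this */
}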

We do, on the other hand, have a detailed rationale for why it's okay
that we don't have an ExclusiveLock on the index relation for checks
that span the key space of more than one page by following right links
to compare items across sibling pages. This isn't the same thing as
having an explicit interlock like a pin -- our interlock is one
against *recycling* by vacuum, which is based on recentGlobalXmin.
This rationale requires expert review.

Performance
===========

Trying to keep the tool as simple as possible, while still making it
do verification that is as useful as possible was my priority here,
not performance. Still, verification completes fairly quickly.
Certainly, it takes far less time than having to REINDEX the index,
and doesn't need too much memory. I think that in practice most
problems that can be detected by the B-Tree checker functions will be
detected with the lighter variant.

-- 
Peter Geoghegan
From 573810d8d3c994ce1a16ecffb2f5d208c0ff93e3 Mon Sep 17 00:00:00 2001

[HACKERS] pg_resetxlog reference page reorganization

2016-02-29 Thread Peter Eisentraut
The pg_resetxlog reference page has grown over the years into an
unnavigable jungle, so here is a patch that reorganizes it to be more in
the style of the other ref pages, with a normal options list.
From a9024195e9f7a0b47e592f39938bdc9743152a70 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 29 Feb 2016 18:48:34 -0500
Subject: [PATCH] doc: Reorganize pg_resetxlog reference page

The pg_resetxlog reference page didn't have a proper options list, only
running text listing the options and some explanations of them.  This
might have worked when there were only a few options, but the list has
grown over the releases, and now it's hard to find an option and its
associated explanation.  So write out the options list as on other
reference pages.
---
 doc/src/sgml/ref/pg_resetxlog.sgml | 222 -
 1 file changed, 144 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 1bcc5a7..fd9d0be 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -22,15 +22,9 @@
  
   
pg_resetxlog
-   -c xid,xid
-f
-n
-   -o oid
-   -x xid
-   -e xid_epoch
-   -m mxid,mxid
-   -O mxoff
-   -l xlogfile
+   option
-D datadir
   
  
@@ -76,78 +70,108 @@ Description
execute any data-modifying operations in the database before you dump,
as any such action is likely to make the corruption worse.
   
+ 
 
-  
-   The -o, -x, -e,
-   -m, -O,
-   -c
-   and -l
-   options allow the next OID, next transaction ID, next transaction ID's
-   epoch, next and oldest multitransaction ID, next multitransaction offset,
-   oldest and newest transaction IDs for which the commit time can be retrieved,
-   and WAL
-   starting address values to be set manually.  These are only needed when
-   pg_resetxlog is unable to determine appropriate values
-   by reading pg_control.  Safe values can be determined as
-   follows:
+ 
+  Options
 
-   
+  
+   
+-f
 
  
-  A safe value for the next transaction ID (-x)
-  can be determined by looking for the numerically largest
-  file name in the directory pg_clog under the data directory,
-  adding one,
-  and then multiplying by 1048576.  Note that the file names are in
-  hexadecimal.  It is usually easiest to specify the option value in
-  hexadecimal too. For example, if 0011 is the largest entry
-  in pg_clog, -x 0x120 will work (five
-  trailing zeroes provide the proper multiplier).
+  Force pg_resetxlog to proceed even if it cannot determine
+  valid data for pg_control, as explained above.
  
 
+   
 
+   
+-n
 
  
-  A safe value for the next multitransaction ID (first part of -m)
-  can be determined by looking for the numerically largest
-  file name in the directory pg_multixact/offsets under the
-  data directory, adding one, and then multiplying by 65536.
-  Conversely, a safe value for the oldest multitransaction ID (second part of
-  -m)
-  can be determined by looking for the numerically smallest
-  file name in the same directory and multiplying by 65536.
-  As above, the file names are in hexadecimal, so the easiest way to do
-  this is to specify the option value in hexadecimal and append four zeroes.
+  The -n (no operation) option instructs
+  pg_resetxlog to print the values reconstructed from
+  pg_control and values about to be changed, and then exit
+  without modifying anything. This is mainly a debugging tool, but can be
+  useful as a sanity check before allowing pg_resetxlog
+  to proceed for real.
  
 
+   
 
+   
+-V
+--version
+Display version information, then exit.
+   
+
+   
+-?
+--help
+Show help, then exit.
+   
+  
+
+  
+   The following options are only needed when
+   pg_resetxlog is unable to determine appropriate values
+   by reading pg_control.  Safe values can be determined as
+   described below.  For values that take numeric arguments, hexadecimal
+   values can be specified by using the prefix 0x.
+  
+
+  
+   
+-c xid,xid
 
  
-  A safe value for the next multitransaction offset (-O)
-  can be determined by looking for the numerically largest
-  file name in the directory pg_multixact/members under the
-  data directory, adding one, and then multiplying by 52352.  As above,
-  the file names are in hexadecimal.  There is no simple recipe such as
-  the ones above of appending zeroes.
+  Manually set the oldest and newest transaction IDs for which the commit
+  time can be retrieved.
  
-
 
-
  
   A safe value for the oldest transaction ID for which the commit time can
-  be retrieved (first part of -c) can be determined by looking
+  be retrieved (first part) can be determined by looking
   for the numerically smallest file 

Re: [HACKERS] psql completion for ids in multibyte string

2016-02-29 Thread Thomas Munro
On Fri, Feb 26, 2016 at 6:34 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, this is the second patch split out. This allows
> multibyte names to be completed in psql.
>
> At Fri, 06 Nov 2015 11:47:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20151106.114717.170526268.horiguchi.kyot...@lab.ntt.co.jp>
>> At Thu, 5 Nov 2015 18:32:59 +0900, Amit Langote 
>>  wrote in <563b224b.3020...@lab.ntt.co.jp>
>> > On 2015/11/05 18:10, Kyotaro HORIGUCHI wrote:
>> > > Hello. I don't know whether this is a bug fix or improvement,
>> >
>> > Would it be 50-50? :-)
>>
>> Yeah, honestly saying, I doubt that this is valuable but feel
>> uneasy to see some of the candidates vanish as completon proceeds
>> for no persuasive reason.

+1 from me, it's entirely reasonable to want to name database objects
in any human language and use auto-completion.  It's not working today
as you showed.

> The current version of tab-completion failed to complete names
> with multibyte characters.
>
> =# create table いろは (あ int);
> =# create table いこい (あ int);
> =# drop table 
> "いろは" hint_plan.   pg_toast.
> "いこい" information_schema.  pg_toast_temp_1.
> ALL IN TABLESPACEpg_catalog.  public.
> dbms_stats.  pg_temp_1.
> postgres=# alter table "い
> =# drop table "い
> =# drop table "い   /* No candidate offered */
>
> This is because _complete_from_query counts the string length in
> bytes, instead of characters. With this patch the candidates will
> appear.
>
> =# drop table "い
> "いこい"  "いろは"
> postgres=# drop table "い

The patch looks correct to me: it counts characters rather than bytes,
which is the right thing to do because the value is passed to substr()
which also works in characters rather than bytes.  I tested with
"éclair", and without the patch, tab completion doesn't work if you
press tab after 'é'.  With the patch it does.
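For what it's worth, the byte-versus-character distinction boils down to
something like this (a sketch in the spirit of the fix, using libpq's
PQmblen rather than the exact code in the patch):

#include <libpq-fe.h>

/* Count characters, not bytes, in a possibly-multibyte string. */
static int
string_length_in_chars(const char *s, int encoding)
{
	int			chars = 0;

	while (*s != '\0')
	{
		s += PQmblen(s, encoding);	/* advance by one whole character */
		chars++;
	}
	return chars;
}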

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-02-29 Thread Alvaro Herrera
Petr Jelinek wrote:

> I wonder if it would be acceptable to create new info flag for RM_SEQ_ID
> that would behave just like XLOG_SEQ_LOG but would be used only for the
> nontransactional updates (nextval) so that decoding could easily
> differentiate between transactional and non-transactional update of sequence
> and then just either call the callback immediately or add the change to
> reorder buffer based on that. The redo code could just have simple OR
> expression to behave same with both of the info flags.
> 
> Seems like simpler solution than building all the tracking code on the
> decoding side to me.

Given the mess in Craig's description, the new info flag sounds a much
more reasonable approach to me.
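Concretely, it might look something like this (a sketch only; XLOG_SEQ_LOG
is the existing record type, while the second constant and its value are
invented here for illustration):

/* in the sequence rmgr's header */
#define XLOG_SEQ_LOG				0x00
#define XLOG_SEQ_NONTRANSACTIONAL	0x10	/* hypothetical new info flag */

/* seq_redo() would then treat both flags identically: */
if (info != XLOG_SEQ_LOG && info != XLOG_SEQ_NONTRANSACTIONAL)
	elog(PANIC, "seq_redo: unknown op code %u", info);
/* ... existing redo logic, shared by both flags ... */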

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GCC 6 warning fixes

2016-02-29 Thread Thomas Munro
On Sat, Feb 20, 2016 at 5:14 PM, Peter Eisentraut  wrote:
> Here are three patches to fix new warnings in GCC 6.
>
> 0001 is apparently a typo.

Right, looks like it.  Builds and tests OK with this change (though I
didn't get any warning from GCC6.0.0 -Wall for this one).

> 0002 was just (my?) stupid code to begin with.

Right, it makes sense to define QL_HELP in just one translation unit
with external linkage.  Builds and works fine.  I got the 'defined but
not used' warning from GCC6 and it went away with this patch.

> 0003 is more of a workaround.  There could be other ways address this, too.

This way seems fine to me (you probably want the function to continue
to exist rather than, say, becoming a macro evaluating to false on
non-WIN32, if this gets backpatched).  I got this warning from GCC6
and it went away with this patch.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patch to install config/missing

2016-02-29 Thread Alvaro Herrera
Tom Lane wrote:

> Given our inability to come to a consensus on rejiggering the uses of
> "missing", I think maybe we should just apply the original patch and
> call it good.

For the record: Tom applied this patch as commit dccf8e9e608824ce15.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-02-29 Thread Petr Jelinek



On 29/02/16 03:23, Craig Ringer wrote:

On 17 December 2015 at 10:08, Craig Ringer > wrote:

On 15 December 2015 at 20:17, Andres Freund > wrote:


I think this is quite the wrong approach. You're calling the logical
decoding callback directly from decode.c, circumventing
reorderbuffer.c. While you don't need the actual reordering, you
*do*
need the snapshot integration.


Yeah. I thought I could get away without it because I could use
Form_pg_sequence.sequence_name to bypass any catalog lookups at all,
but per upthread that field should be ignored and can't be used.

As you since noticed you can't just
look into the sequence tuple and be happy with that, you need to do
pg_class lookups et al. Furhtermore callbacks can validly expect
to have
a snapshot set.


Good point. Just because I don't need one doesn't mean others won't,
and all the other callbacks do.

I'll have a read over reorderbuffer.c and see if I can integrate
this properly.

At the very least what you need to do is to SetupHistoricSnapshot(),
then lookup the actual identity of the sequence using
RelidByRelfilenode() and only then you can call the callback.


Yeah. That means it's safe for the callback to do a syscache lookup
for the sequence name by oid.


I've revisited this patch in an attempt to get a corrected version ready
for the next CF.

Turns out it's not that simple.


Sequence advances aren't part of a transaction (though they always
happen during one) and proceed whether the xact that happened to trigger
them commits or rolls back. So it doesn't make sense to add them to a
reorder buffer for an xact. We want to send them immediately, not
associate them with some arbitrary xact that just happened to use the last
value in the cache and that might not commit for ages.


But then when do we send it? If sent at the moment of decoding the
advance from WAL then it'll always be sent to the client between the
commit of one xact and the begin of another, because it'll be sent while
we're reading xlog and populating reorder buffers. For an advance of an
already-created sequence that's what we want, but CREATE
SEQUENCE, ALTER SEQUENCE etc also log RM_SEQ_ID XLOG_SEQ_LOG operations.
The xact that created the sequence isn't committed yet so sending the
advance to the downstream will lead to attempting to advance a sequence
that doesn't yet exist. Similarly ALTER SEQUENCE emits a new
XLOG_SEQ_LOG with the updated Form_pg_sequence. All fine for physical
replication but rather more challenging for logical replication since
sometimes we want the sequence change to be associated with a particular
xact and sometimes not.


So the reorder buffer has to keep track of sequences. It must check to
see if a catalog change is a sequence creation and if so mark the
sequence as pending, keeping a copy of the Form_pg_sequence that's
updated to the latest version as the xact proceeds and writes updates to
the sequence. On commit the sequence advance is replayed at the end of
the xact using the snapshot of the newly committed xact; on rollback
it's discarded since the sequence never became visible to anyone else.
We can safely assert that that sequence will not be updated by any other
xact until this one commits.


In reorderbuffer.c, there's a test for relation->rd_rel->relkind ==
RELKIND_SEQUENCE that skips changes to sequence relations. We'll want to
replicate sequence catalog updates as DDL via event triggers and deparse
so that's OK, but I think we'll need to make a note of sequence creation
here to mark new sequences as uncommitted.


If we see a sequence change that isn't for an uncommitted newly-created
sequence we make an immediate call through the reorder buffer to send
the sequence update info to the client. That's OK even if it's something
like ALTER SEQUENCE ... INCREMENT 10 RENAME TO othername; . The ALTER
... RENAME part gets sent with the xact that did it when/if it commits
since it's part of the pg_class tuple for the sequence; the INCREMENT
gets sent immediately since it's part of the Form_pg_sequence. That
matches what happens locally, and it means there's no need to keep track
of every sequence, only new ones.


On commit or rollback of an xact that creates a sequence we pop the
sequence oid from the ordered list of uncommitted sequence oids that
must be kept by the decoding session.
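The per-session bookkeeping could be as simple as an OID list (a rough
sketch with invented names; the real state would hang off the decoding
context rather than a static variable):

#include "postgres.h"
#include "nodes/pg_list.h"

static List *uncommitted_seq_oids = NIL;

/* remember a sequence created by a not-yet-committed xact */
static void
note_new_sequence(Oid seqoid)
{
	uncommitted_seq_oids = lappend_oid(uncommitted_seq_oids, seqoid);
}

/* is this sequence still invisible to other xacts? */
static bool
sequence_is_uncommitted(Oid seqoid)
{
	return list_member_oid(uncommitted_seq_oids, seqoid);
}

/* pop it at commit or rollback of the creating xact */
static void
forget_sequence(Oid seqoid)
{
	uncommitted_seq_oids = list_delete_oid(uncommitted_seq_oids, seqoid);
}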


If we land up decoding the same WAL again we might send sequence updates
that temporarily move a sequence backwards then advance it again when we
replay the newer updates. That's the same as hot standby and seems fine.
You obviously won't be able to safely get new values from sequences
that're replicated from an upstream on the downstream anyway - and I
anticipate that logical decoding receivers will probably want 

Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-29 Thread Alvaro Herrera
Michael Paquier wrote:

> 9) I have no logical explanation for why I am seeing all those
> things now.

Happens all the time ...

Pushed, thanks.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Issue with NULLS LAST, with postgres_fdw sort pushdown

2016-02-29 Thread Rajkumar Raghuwanshi
Hi,

I am testing the postgres_fdw sort pushdown feature for PostgreSQL 9.6, and
I observed the issue below.

Observation: If the nulls last option is given with the order by clause as
'desc nulls last', the remote query does not include NULLS LAST and gives a
wrong result on 9.6, while 9.5 gives the proper result.

For testing, I have a table "fdw_sort_test" in the foreign server, for which
a postgres_fdw foreign table is created in the local server.

db2=# select * from fdw_sort_test ;
 id | name
----+------
  1 | xyz
  3 |
  2 | abc
  4 | pqr
(4 rows)

on version 9.6:

db1=# select * from fdw_sort_test order by name desc nulls last;
 id | name
----+------
  3 |
  1 | xyz
  4 | pqr
  2 | abc
(4 rows)

db1=# explain verbose select * from fdw_sort_test order by name desc nulls last;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Foreign Scan on public.fdw_sort_test  (cost=100.00..129.95 rows=561 width=122)
   Output: id, name
   Remote SQL: SELECT id, name FROM public.fdw_sort_test ORDER BY name DESC
(3 rows)

on version 9.5:

db1=# select * from fdw_sort_test order by name desc nulls last;
 id | name
----+------
  1 | xyz
  4 | pqr
  2 | abc
  3 |
(4 rows)

db1=# explain verbose select * from fdw_sort_test order by name desc nulls last;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Sort  (cost=152.44..153.85 rows=561 width=122)
   Output: id, name
   Sort Key: fdw_sort_test.name DESC NULLS LAST
   ->  Foreign Scan on public.fdw_sort_test  (cost=100.00..126.83 rows=561 width=122)
         Output: id, name
         Remote SQL: SELECT id, name FROM public.fdw_sort_test

Steps to reproduce:

--connect to psql
\c postgres postgres
--create role and database db1, will act as local server
create role db1 password 'db1' superuser login;
create database db1 owner=db1;
grant all on database db1 to db1;

--create role and database db2, will act as foreign server
create role db2 password 'db2' superuser login;
create database db2 owner=db2;
grant all on database db2 to db2;

--connect to db2 and create a table
\c db2 db2
create table fdw_sort_test (id integer, name varchar(50));
insert into fdw_sort_test values (1,'xyz');
insert into fdw_sort_test values (3,null);
insert into fdw_sort_test values (2,'abc');
insert into fdw_sort_test values (4,'pqr');

--connect to db1 and create postgres_fdw
\c db1 db1
create extension postgres_fdw;
create server db2_link_server foreign data wrapper postgres_fdw options
(host 'db2_machine_ip', dbname 'db2', port 'db_machine_port_no');
create user mapping for db1 server db2_link_server options (user 'db2',
password 'db2');

--create a foreign table
create foreign table fdw_sort_test (id integer, name varchar(50)) server
db2_link_server;

--run the below query and checkout the output
select * from fdw_sort_test order by name desc nulls last;

--check the explain plan
explain verbose select * from fdw_sort_test order by name desc nulls last;

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] get current log file

2016-02-29 Thread Armor
Yes, if we cannot find a perfect solution, we need to wait.
Actually, the customer needs a unified interface to access the status of
the database, so we implemented it.


--
Jerry Yu
https://github.com/scarbrofair


 




------ Original message ------
From: Robert Haas
Date: Fri, Feb 26, 2016 07:33 PM
To: Armor
Cc: Tom Lane; Euler Taveira; Alvaro Herrera; pgsql-hackers
Subject: Re: [HACKERS] get current log file



On Fri, Feb 26, 2016 at 8:31 AM, Armor  wrote:
> I think I know what you are concerned about. Maybe I did not explain my
> solution very clearly.
> (i) Use a variable named last_syslogger_file_time to replace
> first_syslogger_file_time in syslogger.c. When the postmaster initializes
> the logger process, last_syslogger_file_time will be assigned the time
> stamp at which the logger starts, and then the child process for the
> logger is forked. Later the logger will create a log file based on
> last_syslogger_file_time, and last_syslogger_file_time in the postmaster
> process will be inherited by the other auxiliary processes.
> (ii) When the pgstat process initializes, it will read
> last_syslogger_file_time from the stats file of the previous run (because
> the pgstat process writes it to the stats file). The pgstat process also
> gets last_syslogger_file_time inherited from the postmaster; if the
> inherited version of last_syslogger_file_time is larger than the one read
> from the stats file, it means the logger created a new log file, so it is
> used as the latest value. Otherwise it means the pgstat process crashed
> before, so it needs to use the value from the stats file as the latest.
> (iii) When the logger rotates a log file, it will assign the time stamp to
> last_syslogger_file_time and send it to the pgstat process, which will
> write last_syslogger_file_time to the stats file so it can be read by
> other backends.
> (iv) Add a stat function named pg_stat_get_log_file_name; when the user
> calls it, it will read last_syslogger_file_time from the stats file,
> construct the log file name based on the log file name format and
> last_syslogger_file_time, and finally return the log file name.
>
> However, there is a risk in this solution: if the logger creates a new log
> file and then tries to send the new last_syslogger_file_time to the pgstat
> process, and the pgstat process crashes at that moment, the new pgstat
> process cannot get the latest last_syslogger_file_time. However, I think
> this is a corner case.

I don't think we're going to accept this feature if it might fail in
corner cases.  And that design seems awfully complex.

The obvious way to implement this, to me at least, seems to be for the
syslogger to write a file someplace in the data directory containing
the name of the current log file.  When it switches log files, it
rewrites that file.  When you want to know what the current logfile
is, you read that file.
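That is, something along these lines in the syslogger (a sketch with
invented file and function names, not a worked-out patch):

#include <stdio.h>

/* Rewrite the "current logfile" marker file after each rotation. */
static void
update_current_logfile_marker(const char *logfile_path)
{
	FILE	   *fh = fopen("current_logfile.tmp", "w");

	if (fh == NULL)
		return;					/* best effort only, in this sketch */
	fprintf(fh, "%s\n", logfile_path);
	fclose(fh);
	/* rename() is atomic, so readers never see a half-written file */
	rename("current_logfile.tmp", "current_logfile");
}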

But there's one thing I'm slightly baffled about: why would you
actually need this?  I mean, it seems like a good idea to set
log_filename to a pattern that makes the name of the current logfile
pretty well predictable.  If not, maybe you should just fix that.
Also, if not on Windows, if you do get confused about which logfile is
active, you could just use lsof on the log_directory to figure out
which file the syslogger has open.  I just can't really remember
having a problem with this, and I'm wondering why someone would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Minor typo in syncrep.c

2016-02-29 Thread Alvaro Herrera
Amit Langote wrote:
> On 2016/02/03 17:50, Amit Langote wrote:
> > Attached patch removes an extraneous word in the comment above
> 
> I kept reading and found one place in a comment within the function where
> a word is most probably missing.  Attached fixes it.

Pushed both, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: Generic WAL logical messages

2016-02-29 Thread Petr Jelinek

Hi,

Attached is the newest version of the patch.

I removed the registry, renamed 'send' to 'emit', and documented the 
callback parameters properly. I also added a test to ddl.sql for the 
serialization and deserialization (and of course found a bug there), and 
in general fixed all the stuff Andres reported.
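For anyone following along, an output plugin's message callback would look
roughly like this (a sketch matching the callback parameters in the patch
below; test_decoding's actual implementation may differ):

#include "postgres.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"

static void
my_message_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
			  XLogRecPtr lsn, bool transactional,
			  const char *prefix, Size sz, const char *message)
{
	OutputPluginPrepareWrite(ctx, true);
	appendStringInfo(ctx->out, "message: prefix: %s, sz: %d, content: ",
					 prefix, (int) sz);
	appendBinaryStringInfo(ctx->out, message, sz);
	OutputPluginWrite(ctx, true);
}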


(see more inline)

On 28/02/16 22:55, Andres Freund wrote:






+void
+ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
+ bool transactional, const 
char *prefix, Size msg_sz,
+ const char *msg)
+{
+   ReorderBufferTXN *txn = NULL;
+
+   if (transactional)
+   {
+   ReorderBufferChange *change;
+
+   txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+   Assert(xid != InvalidTransactionId);
+   Assert(txn != NULL);
+
+   change = ReorderBufferGetChange(rb);
+   change->action = REORDER_BUFFER_CHANGE_MESSAGE;
+   change->data.msg.transactional = true;
+   change->data.msg.prefix = pstrdup(prefix);
+   change->data.msg.message_size = msg_sz;
+   change->data.msg.message = palloc(msg_sz);
+   memcpy(change->data.msg.message, msg, msg_sz);
+
+   ReorderBufferQueueChange(rb, xid, lsn, change);
+   }
+   else
+   {
+   rb->message(rb, txn, lsn, transactional, prefix, msg_sz, msg);
+   }
+}



This approach prohibits catalog access when processing a nontransactional
message, as there's no snapshot set up.



Hmm, I do see usefulness in having a snapshot, although I wonder if that
does not kill the point of non-transactional messages.


I don't see how it would? It'd obviously have to be the catalog/historic
snapshot a transaction would have had if it started in that moment in
the original WAL stream?



The question then, though, is which snapshot the message should see;
the base_snapshot of the transaction?


Well, there'll not be a transaction, but something like snapbuild.c's
->snapshot ought to do the trick.



OK, I added an interface which returns either the existing snapshot or makes 
a new one; that seems like the most reasonable thing to do to me.





That would mean we'd have to call SnapBuildProcessChange for
non-transactional messages which we currently avoid. Alternatively we
could probably invent a lighter version of that interface that would
just make sure builder->snapshot is valid and if not then build it


I think the latter is probably the direction we should go in.



I am honestly not sure if that's a win or not.


I think it'll be confusing (bug inducing) if there's no snapshot for
non-transactional messages but there is for transactional ones, and it'll
severely limit the usefulness of the interface.



No no, I meant I am not sure whether a special interface is a win over just 
using SnapBuildProcessChange() in practice.



--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 2f037a757d9cec09f04457d82cdd1256b8255b78 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 24 Feb 2016 17:02:36 +0100
Subject: [PATCH] Logical Decoding Messages

---
 contrib/test_decoding/Makefile  |  2 +-
 contrib/test_decoding/expected/ddl.out  | 21 --
 contrib/test_decoding/expected/messages.out | 56 
 contrib/test_decoding/sql/ddl.sql   |  3 +-
 contrib/test_decoding/sql/messages.sql  | 17 +
 contrib/test_decoding/test_decoding.c   | 19 ++
 doc/src/sgml/func.sgml  | 45 +
 doc/src/sgml/logicaldecoding.sgml   | 37 +++
 src/backend/access/rmgrdesc/Makefile|  4 +-
 src/backend/access/rmgrdesc/logicalmsgdesc.c| 41 
 src/backend/access/transam/rmgr.c   |  1 +
 src/backend/replication/logical/Makefile|  2 +-
 src/backend/replication/logical/decode.c| 49 ++
 src/backend/replication/logical/logical.c   | 38 +++
 src/backend/replication/logical/logicalfuncs.c  | 27 
 src/backend/replication/logical/message.c   | 87 +
 src/backend/replication/logical/reorderbuffer.c | 67 +++
 src/backend/replication/logical/snapbuild.c | 19 ++
 src/bin/pg_xlogdump/rmgrdesc.c  |  1 +
 src/include/access/rmgrlist.h   |  1 +
 src/include/catalog/pg_proc.h   |  4 ++
 src/include/replication/logicalfuncs.h  |  2 +
 src/include/replication/message.h   | 41 
 src/include/replication/output_plugin.h | 13 
 src/include/replication/reorderbuffer.h | 21 ++
 src/include/replication/snapbuild.h |  2 +
 26 files changed, 608 insertions(+), 12 deletions(-)
 create mode 100644 

Re: [HACKERS] [COMMITTERS] pgsql: Add isolationtester spec for old heapam.c bug

2016-02-29 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> guaibasaurus is not a particularly slow machine, and it's not using any
> >> special build flags AFAICT.  So I'm not sure what to make of this case,
> >> except that it proves the timing problem can manifest on normal builds.
> 
> > Hmm, I suppose I could fix this by using three different advisory locks
> > rather than a single one.  (My assumption is that the timing dependency
> > is the order in which the backends are awakened when the advisory lock
> > is released.)  I would release the locks one by one rather than all
> > together.
> 
> Sounds plausible.

Pushed.  We'll see how they behave now.

> You would probably need several seconds' pg_sleep() in between the
> lock releases to ensure that even on slow/overloaded machines, there's
> enough time for all wakened backends to do what they're supposed to
> do.

I don't really like sleeps in tests, because instead of 0.01 seconds in
the normal case it now takes 10.01 seconds.  But fixing it is more
trouble than it's probably worth, so I added 5s sleeps.

If someone wants to get rid of that idle time (i.e. have a mechanism
that causes it to stop sleeping when the update is done) I would be
happy to give it a look.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Publish autovacuum informations

2016-02-29 Thread Julien Rouhaud
On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
> 
> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
> > wrote:
>>
>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
>> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge  
>> > >>:
>> >
>> > 2015-01-05 17:40 GMT+01:00 Robert Haas  
>> > >>:
>> >
>> > On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane
> 
>> > >> wrote:
>> > > I'd be all right with putting the data structure
> declarations in a file
>> > > named something like autovacuum_private.h, especially if
> it carried an
>> > > annotation that "if you depend on this, don't be surprised
> if we break
>> > > your code in future".
>> >
>> > Works for me.  I am not in general surprised when we do
> things that
>> > break my code, or anyway, the code that I'm responsible for
>> > maintaining.  But I think it makes sense to segregate this
> into a
>> > separate header file so that we are clear that it is only
>> > exposed for
>> > the benefit of extension authors, not so that other things in
>> > the core
>> > system can touch it.
>> >
>> >
>> > I'm fine with that too. I'll try to find some time to work on that.
>> >
>> >
>> > So I took a look at this this week. I discovered, with the help of a
>> > coworker, that I can already use the AutoVacuumShmem pointer and read
>> > the struct. Unfortunately, it doesn't give me as much details as I would
>> > have liked. The list of databases and tables aren't in shared memory.
>> > They are local to the process that uses them. Putting them in shared
>> > memory (if at all possible) would imply a much bigger patch than I was
>> > willing to write right now.
>> >
>> > Thanks anyway for the help.
>> >
>> >
>>
>> Sorry to revive such an old thread.
>>
>> I think some hooks in the autovacuum could be enough to have good
>> insight without exposing private structure.
>>
> 
> Interesting idea...
> 

Thanks for looking at it!

> 
>> Please find attached a patch that adds some hooks to the autovacuum, and
>> as an example a quick proof of concept extension that use them and allow
>> to see what are the autovacuum worker todo list, skipped tables and so on.
>>
>> I'm not really sure about which information should be provided, so I'm
>> open to any suggestion to improve this.
>>
> 
> I had a look at the patch and it compiles without warnings and without
> regressions.
> 
> But something goes wrong when installing the extension:
> 
> [...]
> /usr/bin/install -c -m 644  '/data/home/fabrizio/pgsql/share/doc/extension/'
> /usr/bin/install: missing destination file operand after
> ‘/data/home/fabrizio/pgsql/share/doc/extension/’
> Try '/usr/bin/install --help' for more information.
> make: *** [install] Error 1
> 

Oops, I'm not really sure what I removed that I shouldn't have. I attached
v2 of the extension with a "standard" Makefile, which I just tested and
works fine.

> Regards,
> 
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
>>> Timbira: http://www.timbira.com.br
>>> Blog: http://fabriziomello.github.io
>>> Linkedin: http://br.linkedin.com/in/fabriziomello
>>> Twitter: http://twitter.com/fabriziomello
>>> Github: http://github.com/fabriziomello


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


pg_stat_autovacuum_v2.tgz
Description: application/compressed-tar



Re: [HACKERS] Publish autovacuum informations

2016-02-29 Thread Fabrízio de Royes Mello
On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud 
wrote:
>
> On 04/06/2015 22:10, Guillaume Lelarge wrote:
> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge  > >:
> >
> > 2015-01-05 17:40 GMT+01:00 Robert Haas  > >:
> >
> > On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane  > > wrote:
> > > I'd be all right with putting the data structure declarations
in a file
> > > named something like autovacuum_private.h, especially if it
carried an
> > > annotation that "if you depend on this, don't be surprised if
we break
> > > your code in future".
> >
> > Works for me.  I am not in general surprised when we do things
that
> > break my code, or anyway, the code that I'm responsible for
> > maintaining.  But I think it makes sense to segregate this into
a
> > separate header file so that we are clear that it is only
> > exposed for
> > the benefit of extension authors, not so that other things in
> > the core
> > system can touch it.
> >
> >
> > I'm fine with that too. I'll try to find some time to work on that.
> >
> >
> > So I took a look at this this week. I discovered, with the help of a
> > coworker, that I can already use the AutoVacuumShmem pointer and read
> > the struct. Unfortunately, it doesn't give me as much details as I would
> > have liked. The list of databases and tables aren't in shared memory.
> > They are local to the process that uses them. Putting them in shared
> > memory (if at all possible) would imply a much bigger patch than I was
> > willing to write right now.
> >
> > Thanks anyway for the help.
> >
> >
>
> Sorry to revive such an old thread.
>
> I think some hooks in the autovacuum could be enough to have good
> insight without exposing private structure.
>

Interesting idea...


> Please find attached a patch that adds some hooks to the autovacuum, and
> as an example a quick proof of concept extension that use them and allow
> to see what are the autovacuum worker todo list, skipped tables and so on.
>
> I'm not really sure about which information should be provided, so I'm
> open to any suggestion to improve this.
>

I had a look at the patch and it compiles without warnings and without
regressions.

But something goes wrong when installing the extension:

fabrizio@bagual:~/Downloads/pg_stat_autovacuum
$ pg_config
BINDIR = /data/home/fabrizio/pgsql/bin
DOCDIR = /data/home/fabrizio/pgsql/share/doc
HTMLDIR = /data/home/fabrizio/pgsql/share/doc
INCLUDEDIR = /data/home/fabrizio/pgsql/include
PKGINCLUDEDIR = /data/home/fabrizio/pgsql/include
INCLUDEDIR-SERVER = /data/home/fabrizio/pgsql/include/server
LIBDIR = /data/home/fabrizio/pgsql/lib
PKGLIBDIR = /data/home/fabrizio/pgsql/lib
LOCALEDIR = /data/home/fabrizio/pgsql/share/locale
MANDIR = /data/home/fabrizio/pgsql/share/man
SHAREDIR = /data/home/fabrizio/pgsql/share
SYSCONFDIR = /data/home/fabrizio/pgsql/etc
PGXS = /data/home/fabrizio/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/home/fabrizio/pgsql' '--enable-cassert'
'--enable-coverage' '--enable-tap-tests' '--enable-depend'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-fprofile-arcs -ftest-coverage
CFLAGS_SL = -fpic
LDFLAGS = -L../../src/common -Wl,--as-needed
-Wl,-rpath,'/home/fabrizio/pgsql/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.6devel

fabrizio@bagual:~/Downloads/pg_stat_autovacuum
$ make USE_PGXS=1 install
/bin/mkdir -p '/data/home/fabrizio/pgsql/lib'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/extension'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/extension'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/doc/extension'
/usr/bin/install -c -m 755  pg_stat_autovacuum.so
'/data/home/fabrizio/pgsql/lib/pg_stat_autovacuum.so'
/usr/bin/install -c -m 644 .//pg_stat_autovacuum.control
'/data/home/fabrizio/pgsql/share/extension/'
/usr/bin/install -c -m 644 .//pg_stat_autovacuum--0.0.1.sql
'/data/home/fabrizio/pgsql/share/extension/'
/usr/bin/install -c -m 644  '/data/home/fabrizio/pgsql/share/doc/extension/'
/usr/bin/install: missing destination file operand after
‘/data/home/fabrizio/pgsql/share/doc/extension/’
Try '/usr/bin/install --help' for more information.
make: *** [install] Error 1

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-02-29 Thread Petr Jelinek

On 29/02/16 18:19, Dmitry Dolgov wrote:

 > I'd strongly prefer the jsonb_array_insert naming though
 > I don't think it's a good idea to use set when this is used on
object, I think that we should throw error in that case.

Well, I thought it's wrong to introduce different functions and
behaviour patterns for objects and arrays, because it's the one data
type after all. But it's just my opinion of course.



The problem I see with that logic is that, because we don't keep the order 
of keys in a jsonb object, the "insert" in the name would have misleading 
implications from the user's point of view. Also, it does not do anything 
for objects that "set" doesn't already do, so why have two interfaces for 
doing the same thing.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

2016-02-29 Thread Alvaro Herrera
A customer of ours was hit by this recently and I'd like to get it fixed
for good.

Robert Haas wrote:
> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby  wrote:
> > In any case, I don't think it would be terribly difficult to allow a bit
> > more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
> > some 1GB limits there too.
> 
> The point is, those limits are there on purpose.  Changing things
> arbitrarily wouldn't be hard, but doing it in a principled way is
> likely to require some thought.  For example, in the COPY OUT case,
> presumably what's happening is that we palloc a chunk for each
> individual datum, and then palloc a buffer for the whole row.

Right.  There's a serious problem here already, and users are being
bitten by it in existing releases.  I think we should provide a
non-invasive fix for back-branches which we could apply as a bug fix.
Here's a proposed patch for the localized fix; it just adds a flag to
StringInfo allowing the string to grow beyond the 1GB limit, but only
for strings which are specifically marked.  That way we avoid having to
audit all the StringInfo-using code.  In this patch, only the COPY path
is allowed to grow beyond 1GB, which is enough to close the current
problem -- or at least my test case for it.

If others can try this patch to ensure it enables pg_dump to work on
their databases, it would be great.

(In this patch there's a known buglet which is that the UINT64_FORMAT
patched in the error message doesn't allow for translatability.)

Like Robert, I don't like this approach for the long term, however.  I
think it would be saner to have CopySendData not keep a single gigantic
string in memory; it would be better to get the bytes out to the client
sooner than end-of-row.  To avoid causing a performance hit, we would
only flush when the size of the output buffer is about to reach the
allocation limit (MaxAllocSize); so for regular-sized tuples, we would
only do it at end-of-row, keeping the current behavior.  I don't have a
patch for this yet; I'm going to try that next.
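To make the opt-in nature concrete, a COPY-path caller would do something
like this (a sketch using the API added by the patch below, and assuming
data and datalen hold the row image):

	StringInfoData buf;

	initStringInfo(&buf);
	allowLongStringInfo(&buf);	/* permit growth beyond MaxAllocSize */
	appendBinaryStringInfo(&buf, data, datalen);	/* may now pass 1GB */
	/* ... hand buf.data off to the client, then pfree(buf.data) ... */

Note that, per the patch, resetStringInfo() clears the flag again, so the
opt-in has to be repeated for each row.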

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3eba9ef..fcc4fe6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -445,6 +445,15 @@ SendCopyEnd(CopyState cstate)
 	}
 }
 
+/*
+ * Prepare for output
+ */
+static void
+CopyStartSend(CopyState cstate)
+{
+	allowLongStringInfo(cstate->fe_msgbuf);
+}
+
 /*--
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
@@ -1865,6 +1874,8 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+	CopyStartSend(cstate);
+
 	if (cstate->binary)
 	{
 		/* Binary per-tuple header */
diff --git a/src/backend/lib/stringinfo.c b/src/backend/lib/stringinfo.c
index 7d03090..8c08eb7 100644
--- a/src/backend/lib/stringinfo.c
+++ b/src/backend/lib/stringinfo.c
@@ -47,12 +47,24 @@ initStringInfo(StringInfo str)
 {
 	int			size = 1024;	/* initial default buffer size */
 
-	str->data = (char *) palloc(size);
+	str->data = (char *) palloc(size);	/* no need for "huge" at this point */
 	str->maxlen = size;
+	str->allowlong = false;
 	resetStringInfo(str);
 }
 
 /*
+ * allowLongStringInfo
+ *
+ * Mark the StringInfo as a candidate for a "huge" allocation
+ */
+void
+allowLongStringInfo(StringInfo str)
+{
+	str->allowlong = true;
+}
+
+/*
  * resetStringInfo
  *
  * Reset the StringInfo: the data buffer remains valid, but its
@@ -64,6 +76,7 @@ resetStringInfo(StringInfo str)
 	str->data[0] = '\0';
 	str->len = 0;
 	str->cursor = 0;
+	str->allowlong = false;
 }
 
 /*
@@ -244,7 +257,8 @@ appendBinaryStringInfo(StringInfo str, const char *data, int datalen)
 void
 enlargeStringInfo(StringInfo str, int needed)
 {
-	int			newlen;
+	int64		newlen;
+	Size		limit;
 
 	/*
 	 * Guard against out-of-range "needed" values.  Without this, we can get
@@ -252,16 +266,19 @@ enlargeStringInfo(StringInfo str, int needed)
 	 */
 	if (needed < 0)/* should not happen */
 		elog(ERROR, "invalid string enlargement request size: %d", needed);
-	if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
+
+	/* choose the proper limit and verify this allocation wouldn't exceed it */
+	limit = str->allowlong ? MaxAllocHugeSize : MaxAllocSize;
+	if (((Size) needed) >= (limit - (Size) str->len))
 		ereport(ERROR,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("out of memory"),
- errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
+ errdetail("Cannot enlarge string buffer containing "INT64_FORMAT" bytes by %d more bytes.",
 		   str->len, needed)));
 
 	needed += str->len + 1;		/* total space required now */
 
-	/* 

Re: [HACKERS] Publish autovacuum informations

2016-02-29 Thread Julien Rouhaud
On 04/06/2015 22:10, Guillaume Lelarge wrote:
> 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge  >:
> 
> 2015-01-05 17:40 GMT+01:00 Robert Haas  >:
> 
> On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane  > wrote:
> > I'd be all right with putting the data structure declarations in a 
> file
> > named something like autovacuum_private.h, especially if it carried 
> an
> > annotation that "if you depend on this, don't be surprised if we 
> break
> > your code in future".
> 
> Works for me.  I am not in general surprised when we do things that
> break my code, or anyway, the code that I'm responsible for
> maintaining.  But I think it makes sense to segregate this into a
> separate header file so that we are clear that it is only
> exposed for
> the benefit of extension authors, not so that other things in
> the core
> system can touch it.
> 
> 
> I'm fine with that too. I'll try to find some time to work on that.
> 
> 
> So I took a look at this this week. I discovered, with the help of a
> coworker, that I can already use the AutoVacuumShmem pointer and read
> the struct. Unfortunately, it doesn't give me as much details as I would
> have liked. The list of databases and tables aren't in shared memory.
> They are local to the process that uses them. Putting them in shared
> memory (if at all possible) would imply a much bigger patch than I was
> willing to write right now.
> 
> Thanks anyway for the help.
> 
> 

Sorry to revive such an old thread.

I think some hooks in the autovacuum could be enough to give good
insight without exposing private structures.

Please find attached a patch that adds some hooks to the autovacuum and,
as an example, a quick proof-of-concept extension that uses them to show
each autovacuum worker's to-do list, skipped tables, and so on.

I'm not really sure which information should be provided, so I'm
open to any suggestions for improving this.
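
For instance, an extension could use one of these hooks roughly like this
(a hypothetical sketch; the hook signature is inferred from the call sites
in the patch below and may not match it exactly):

#include "postgres.h"
#include "postmaster/autovacuum.h"

PG_MODULE_MAGIC;

static autovacuum_end_table_hook_type prev_end_table_hook = NULL;

/* Record which tables each worker finished or skipped; this sketch only
 * logs it, but shared memory would let a view expose it. */
static void
my_end_table_hook(int pid, Oid dbid, Oid relid, bool processed)
{
	if (prev_end_table_hook)
		prev_end_table_hook(pid, dbid, relid, processed);

	elog(DEBUG1, "autovacuum worker %d: table %u %s",
		 pid, relid, processed ? "processed" : "skipped");
}

void
_PG_init(void)
{
	prev_end_table_hook = autovacuum_end_table_hook;
	autovacuum_end_table_hook = my_end_table_hook;
}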
-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index e2859df..e219b78 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -123,6 +123,12 @@ int autovacuum_vac_cost_limit;
 
 int Log_autovacuum_min_duration = -1;
 
+/* Hook for plugins to get informations */
+autovacuum_list_tables_hook_type autovacuum_list_tables_hook = NULL;
+autovacuum_begin_table_hook_type autovacuum_begin_table_hook = NULL;
+autovacuum_end_table_hook_type autovacuum_end_table_hook = NULL;
+autovacuum_database_finished_hook_type autovacuum_database_finished_hook = NULL;
+
 /* how long to keep pgstat data in the launcher, in milliseconds */
 #define STATS_READ_DELAY 1000
 
@@ -2178,6 +2184,13 @@ do_autovacuum(void)
											  ALLOCSET_DEFAULT_MINSIZE,
											  ALLOCSET_DEFAULT_MAXSIZE);
 
+	/* inform potential extension about the table we plan to clean */
+	if (autovacuum_list_tables_hook)
+	{
+		(*autovacuum_list_tables_hook) (MyWorkerInfo->wi_proc->pid,
+										MyDatabaseId, table_oids);
+	}
+
 	/*
 	 * Perform operations on collected tables.
 	 */
@@ -2243,6 +2256,12 @@ do_autovacuum(void)
 		if (skipit)
 		{
 			LWLockRelease(AutovacuumScheduleLock);
+
+			/* warn potential extension we won't work on this table */
+			if (autovacuum_end_table_hook)
+				(*autovacuum_end_table_hook) (MyWorkerInfo->wi_proc->pid,
+											  MyDatabaseId, relid, false);
+
 			continue;
 		}
 
@@ -2263,6 +2282,12 @@ do_autovacuum(void)
 		{
 			/* someone else vacuumed the table, or it went away */
 			LWLockRelease(AutovacuumScheduleLock);
+
+			/* warn potential extension we won't work on this table */
+			if (autovacuum_end_table_hook)
+				(*autovacuum_end_table_hook) (MyWorkerInfo->wi_proc->pid,
+											  MyDatabaseId, relid, false);
+
 			continue;
 		}
 
@@ -2316,6 +2341,12 @@ do_autovacuum(void)
 		if (!tab->at_relname || !tab->at_nspname || !tab->at_datname)
 			goto deleted;
 
+		/* warn potential extension we're starting to work on this table */
+		if

Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-02-29 Thread Teodor Sigaev

Thank you for the review!


I'd like to see comments too! but more so in the code. :) I've had a look over
this, and it seems like a great area in which we could improve on, and your
reported performance improvements are certainly very interesting too. However
I'm finding the code rather hard to follow, which might be a combination of my
lack of familiarity with the index code, but more likely it's the lack of

I've added comments and fixed the bugs I found.



comments to explain what's going on. Let's just take 1 function as an example:

Here there's not a single comment, so I'm just going to try to work out what's
going on based on the code.

+static void
+compileScanKeys(IndexScanDesc scan)
+{
+    GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
+    int        *stack,
+                stackPos = -1,
+                i;
+
+    if (scan->numberOfKeys <= 1 || so->useExec == false)
+        return;
+
+    Assert(scan->numberOfKeys >= 3);

Why can numberOfKeys never be 2? I looked at what calls this and I can't really
Because here the keys actually form an expression. An expression can contain
one node, or three or more, but never exactly two: an AND/OR operation plus
its two arguments (for example, "a OR b" is stored as three nodes).



work it out. I'm really also not sure what useExec means as there's no comment
Fixed. If useExec == false then the ScanKeys are implicitly ANDed and stored
in a plain array.



in that struct member, and what if numberOfKeys == 1 and useExec == false, won't
this Assert() fail? If that's not a possible situation then why not?

fixed





+ScanKey key = scan->keyData + i;
Is there a reason not to use keyData[i]; ?

That's the same as ScanKey key = &scan->keyData[i];
I prefer the first form as clearer, but I could be wrong - there are other
places in the code where pointer arithmetic is used.



+    if (stackPos >= 0 && (key->sk_flags & (SK_OR | SK_AND)))
+    {
+        Assert(stackPos >= 1 && stackPos < scan->numberOfKeys);
stackPos >= 1? This seems unnecessary and confusing as the if test surely makes
that impossible.




+
+        so->leftArgs[i] = stack[stackPos - 1];
Something is broken here as stackPos can be 0 (going by the if() not the
Assert()), therefore that's stack[-1].

fixed


stackPos is initialised to -1, so this appears to always skip the first element
of the keyData array. If that's really the intention, then wouldn't it be better
to just make the initial condition of the for() look i = 1 ?

done


I'd like to review more, but it feels like a job that's more difficult than it
needs to be due to lack of comments.

Would it be possible to update the patch to try and explain things a little 
better?

I hope I've made it cleaner.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


index_or-2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Reduce lock levels others reloptions in ALTER TABLE

2016-02-29 Thread Fabrízio de Royes Mello
On Mon, Feb 29, 2016 at 2:58 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
> Hi all,
>
> Some time ago we added [1] the infrastructure to allow different lock
levels for relation options.
>
> So per discussion [2] the attached patch reduce lock levels down to
ShareUpdateExclusiveLock for:
> - fastupdate
> - fillfactor
> - gin_pending_list_limit
> - seq_page_cost
> - random_page_cost
> - n_distinct
> - n_distinct_inherited
> - buffering
>
> Att,
>
>
> [1]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=47167b7907a802ed39b179c8780b76359468f076
> [2]
http://www.postgresql.org/message-id/20150731022857.gc11...@alap3.anarazel.de
>

Added to the next commitfest:

https://commitfest.postgresql.org/9/558/

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


[HACKERS] Reduce lock levels others reloptions in ALTER TABLE

2016-02-29 Thread Fabrízio de Royes Mello
Hi all,

Some time ago we added [1] the infrastructure to allow different lock
levels for relation options.

So per discussion [2] the attached patch reduces lock levels down to
ShareUpdateExclusiveLock for:
- fastupdate
- fillfactor
- gin_pending_list_limit
- seq_page_cost
- random_page_cost
- n_distinct
- n_distinct_inherited
- buffering

Att,


[1]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=47167b7907a802ed39b179c8780b76359468f076
[2]
http://www.postgresql.org/message-id/20150731022857.gc11...@alap3.anarazel.de


-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 86b9ae1..8128dd4 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -76,7 +76,7 @@ static relopt_bool boolRelOpts[] =
 			"fastupdate",
 			"Enables \"fast update\" feature for this GIN index",
 			RELOPT_KIND_GIN,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		true
 	},
@@ -100,7 +100,7 @@ static relopt_int intRelOpts[] =
 			"fillfactor",
 			"Packs table pages only to this percentage",
 			RELOPT_KIND_HEAP,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
 	},
@@ -109,7 +109,7 @@ static relopt_int intRelOpts[] =
 			"fillfactor",
 			"Packs btree index pages only to this percentage",
 			RELOPT_KIND_BTREE,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
 	},
@@ -118,7 +118,7 @@ static relopt_int intRelOpts[] =
 			"fillfactor",
 			"Packs hash index pages only to this percentage",
 			RELOPT_KIND_HASH,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
 	},
@@ -127,7 +127,7 @@ static relopt_int intRelOpts[] =
 			"fillfactor",
 			"Packs gist index pages only to this percentage",
 			RELOPT_KIND_GIST,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
 	},
@@ -136,7 +136,7 @@ static relopt_int intRelOpts[] =
 			"fillfactor",
 			"Packs spgist index pages only to this percentage",
 			RELOPT_KIND_SPGIST,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100
 	},
@@ -250,7 +250,7 @@ static relopt_int intRelOpts[] =
 			"gin_pending_list_limit",
 			"Maximum size of the pending list for this GIN index, in kilobytes.",
 			RELOPT_KIND_GIN,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		-1, 64, MAX_KILOBYTES
 	},
@@ -297,7 +297,7 @@ static relopt_real realRelOpts[] =
 			"seq_page_cost",
 			"Sets the planner's estimate of the cost of a sequentially fetched disk page.",
 			RELOPT_KIND_TABLESPACE,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		-1, 0.0, DBL_MAX
 	},
@@ -306,7 +306,7 @@ static relopt_real realRelOpts[] =
 			"random_page_cost",
 			"Sets the planner's estimate of the cost of a nonsequentially fetched disk page.",
 			RELOPT_KIND_TABLESPACE,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		-1, 0.0, DBL_MAX
 	},
@@ -315,7 +315,7 @@ static relopt_real realRelOpts[] =
 			"n_distinct",
 			"Sets the planner's estimate of the number of distinct values appearing in a column (excluding child relations).",
 			RELOPT_KIND_ATTRIBUTE,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		0, -1.0, DBL_MAX
 	},
@@ -324,7 +324,7 @@ static relopt_real realRelOpts[] =
 			"n_distinct_inherited",
 			"Sets the planner's estimate of the number of distinct values appearing in a column (including child relations).",
 			RELOPT_KIND_ATTRIBUTE,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		0, -1.0, DBL_MAX
 	},
@@ -339,7 +339,7 @@ static relopt_string stringRelOpts[] =
 			"buffering",
 			"Enables buffering build for this GiST index",
 			RELOPT_KIND_GIST,
-			AccessExclusiveLock
+			ShareUpdateExclusiveLock
 		},
 		4,
 		false,
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 7c88ddc..3232cda 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2065,19 +2065,19 @@ select * from my_locks order by 1;
 commit;
 begin; alter table alterlock set (fillfactor = 100);
 select * from my_locks order by 1;
-  relname  |    max_lockmode     
------------+---------------------
- alterlock | AccessExclusiveLock
- pg_toast  | AccessExclusiveLock
+  relname  |       max_lockmode       
+-----------+--------------------------
+ alterlock | ShareUpdateExclusiveLock
+ pg_toast  | ShareUpdateExclusiveLock
 (2 rows)
 
 commit;
 begin; alter table alterlock reset (fillfactor);
 select * 

Re: [HACKERS] WIP: Upper planner pathification

2016-02-29 Thread Tom Lane
Robert Haas  writes:
> I'll abstain from the question of whether this patch is too late in
> coming (but the answer is probably "yes") and instead volunteer to
> review it.

OK, I've put it into the commitfest.  Thanks for volunteering!

regards, tom lane




Re: [HACKERS] WIP: Upper planner pathification

2016-02-29 Thread Robert Haas
On Sun, Feb 28, 2016 at 3:03 PM, Tom Lane  wrote:
> So, where to go from here?  I'm acutely aware that we're hard up against
> the final 9.6 commitfest, and that we discourage major patches arriving
> so late in a devel cycle.  But I simply couldn't get this done any faster.
> I don't really want to hold it over for the 9.7 devel cycle.  It's been
> enough trouble maintaining this patch in the face of conflicting commits
> over the last year or so (it's probably still got bugs related to parallel
> query...), and there definitely are conflicting patches in the upcoming
> 'fest.  And the lack of this infrastructure is blocking progress on FDWs
> and some other things.
>
> So I'd really like to get this into 9.6.  I'm happy to put it into the
> March commitfest if someone will volunteer to review it.

I'll abstain from the question of whether this patch is too late in
coming (but the answer is probably "yes") and instead volunteer to
review it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-02-29 Thread Robert Haas
On Fri, Feb 26, 2016 at 11:37 PM, Amit Kapila  wrote:
> On Sat, Feb 27, 2016 at 10:03 AM, Amit Kapila 
> wrote:
>>
>> Here, we can see that there is a gain of ~15% to ~38% at higher client
>> count.
>>
>> The attached document (perf_write_clogcontrollock_data_v6.ods) contains
>> data, mainly focussing on single client performance.  The data is for
>> multiple runs on different machines, so I thought it is better to present in
>> form of document rather than dumping everything in e-mail.  Do let me know
>> if there is any confusion in understanding/interpreting the data.
>
> Forgot to mention that all these tests have been done by reverting
> commit-ac1d794.

OK, that seems better.  But I have a question: if we don't really need
to make this optimization apply only when everything is on the same
page, then why even try?  If we didn't try, we wouldn't need the
all_trans_same_page flag, which would reduce the amount of code
change.  Would that hurt anything? Taking it even further, we could
remove the check from TransactionGroupUpdateXidStatus too.  I'd be
curious to know whether that set of changes would improve performance
or regress it.  Or maybe it does nothing, in which case perhaps
simpler is better.

All things being equal, it's probably better if the cases where
transactions from different pages get into the list together is
something that is more or less expected rather than a
once-in-a-blue-moon scenario - that way, if any bugs exist, we'll find
them.  The downside of that is that we could increase latency for the
leader that way - doing other work on the same page shouldn't hurt
much but different pages is a bigger hit.  But that hit might be
trivial enough not to be worth worrying about.

+   /*
+* Now that we've released the lock, go back and wake everybody up.  We
+* don't do this under the lock so as to keep lock hold times to a
+* minimum.  The system calls we need to perform to wake other processes
+* up are probably much slower than the simple memory writes
we did while
+* holding the lock.
+*/

This comment was true in the place that you cut-and-pasted it from,
but it's not true here, since we potentially need to read from disk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-29 Thread Robert Haas
On Sat, Feb 27, 2016 at 7:05 PM, Noah Misch  wrote:
> On Fri, Feb 26, 2016 at 04:16:58PM +0530, Robert Haas wrote:
>> Committed these patches after revising the comment you wrote and
>> adding documentation.
>
> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
> fails the stats test, like Tom witnessed:
> http://www.postgresql.org/message-id/30385.1456077...@sss.pgh.pa.us

Thank you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-02-29 Thread Dmitry Dolgov
> I'd strongly prefer the jsonb_array_insert naming though
> I don't think it's a good idea to use set when this is used on object, I
> think that we should throw error in that case.

Well, I thought it would be wrong to introduce different functions and
behaviour patterns for objects and arrays, because it's a single data type
after all. But that's just my opinion, of course.

> Also this patch needs documentation.

I've added new version in attachments, thanks.


jsonb_insert_v2.patch
Description: Binary data



Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-29 Thread Pavel Stehule
> 0003 is the most controversial. It removes the ability to pass message
> as keyword argument. My reasoning was that keyword arguments are
> usually optional and configure extra aspects of the function call
> while message is required and fundamental so therefore it should be
> positional. If you allow it as keyword as well, you have to deal with
> the ambiguity of writing plpy.info('a message', message='a keyword arg
> message, does this overwrite the first one or what?').
>

I thought about it before, and I prefer the variant with the possibility to
pass message as a keyword parameter. The advantage of this solution is that
a dictionary value can simply be used as the parameter, with the possibility
to set all fields.

We can check for a collision and raise an error. The same technique is used
in plpgsql:

postgres=# do $$ begin raise warning 'kuku' using message='NAZDAR'; end; $$;
ERROR:  RAISE option already specified: MESSAGE
CONTEXT:  PL/pgSQL function inline_code_block line 1 at RAISE
postgres=#
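
In PL/Python's C implementation the equivalent check could be sketched like
this (a hypothetical sketch; the variable names are invented for
illustration and this is not the submitted patch):

/* Reject "message" given both positionally and as a keyword, instead
 * of letting one value silently overwrite the other (sketch only). */
if (positional_message != NULL && kw_message != NULL)
	PLy_elog(ERROR, "argument \"message\" given by name and position");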

What do you think?

Pavel


>
> For the code with my patches on top on I ran the PL/Python tests for
> 2.4, 2.5, 2.6, 2.7 and 3.5. Everything passed.
>
> Can you have a look at the patches, fold the ones you agree with on
> top of yours and send the final version? With that I think this will
> be Ready for Committer.
>


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-29 Thread Catalin Iacob
On 2/26/16, Pavel Stehule  wrote:
> Sending updated version

I did some changes on top of your last version as that was easier than
commenting about them, see attached.

0001 and 0005 are comment changes.

0002 is really needed, without it the tests fail on Python2.4.

0004 removes more code related to the unused position.

0003 is the most controversial. It removes the ability to pass message
as keyword argument. My reasoning was that keyword arguments are
usually optional and configure extra aspects of the function call
while message is required and fundamental so therefore it should be
positional. If you allow it as keyword as well, you have to deal with
the ambiguity of writing plpy.info('a message', message='a keyword arg
message, does this overwrite the first one or what?').

For the code with my patches on top, I ran the PL/Python tests for
2.4, 2.5, 2.6, 2.7 and 3.5. Everything passed.

Can you have a look at the patches, fold the ones you agree with on
top of yours and send the final version? With that I think this will
be Ready for Committer.


0001-Comment-improvements.patch
Description: binary/octet-stream


0002-Fix-tests-for-Python-2.4.patch
Description: binary/octet-stream


0003-Pass-message-only-as-positional-argument.patch
Description: binary/octet-stream


0004-Get-rid-of-setting-unused-position.patch
Description: binary/octet-stream


0005-Improve-comments.patch
Description: binary/octet-stream



Re: [HACKERS] Convert pltcl from strings to objects

2016-02-29 Thread Tom Lane
Jim Nasby  writes:
> On 2/25/16 9:30 AM, Alvaro Herrera wrote:
>> Refcounting the prodesc would let it live until the cursor's closed,
>> then free it.

> I'm also not sure how the reference would get decremented... via 
> ResourceOwner somehow?

plpgsql already has a similar mechanism (see PLpgSQL_function.use_count)
which you could probably copy.  But I'd advise that this is a separate
matter to be addressed in a separate patch; it has little to do with the
nominal subject matter of this patch.

regards, tom lane




Re: [HACKERS] Typo fix

2016-02-29 Thread Tom Lane
Amit Langote  writes:
> Attached fixes a typo:

Pushed, thanks.

regards, tom lane




Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-29 Thread Artur Zakirov

Hello, Dmitry

This is my initial review for you patch. Below are my comments.

Introduction


This patch introduces a new operator and new functions.
New operator:
- ??
New functions:
- phraseto_tsquery([ config regconfig , ] query text)
- setweight(tsquery, "char")
- tsquery_phrase(query1 tsquery, query2 tsquery)
- tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)

New regression tests are included in the patch. Information about new 
features is provided in the documentation.


Initial Run
---

The patch applies correctly to HEAD. Regression tests pass successfully, 
without errors. The patch seems to work as you expected.


Performance
---

I have not tested possible performance issues yet. Maybe later I will 
prepare some data to test performance.


Coding
--

I agree with others that there is a lack of comments for the new functions. 
Most of them have no comments at all.



../pg_patches/phrase_search.patch:1086: trailing whitespace.
 * QI_VALSTOP nodes should be cleaned and
../pg_patches/phrase_search.patch:1124: trailing whitespace.
if (priority < parentPriority)
../pg_patches/phrase_search.patch:1140: trailing whitespace.
if (priority < parentPriority)
../pg_patches/phrase_search.patch:1591: space before tab in indent.
/*  (a|b) ? c => (a?c) | (b?c) */
../pg_patches/phrase_search.patch:1603: space before tab in indent.
/*  !a ? b=> b & !(a?b) */


This is the output of git apply; there is trailing whitespace in the code.


+typedef struct MorphOpaque
+{
+   Oid cfg_id;
+   int qoperator;  /* query operator */
+} MorphOpaque;


Maybe you should move the structure definition to the top of to_tsany.c.


+   pushValue(state,
+             prs.words[count].word,
+             prs.words[count].len,
+             weight,
+             ((prs.words[count].flags & TSL_PREFIX) || prefix) ?
+             true : false);


There is no need for a ternary operator here. You can write:

pushValue(state,
  prs.words[count].word,
  prs.words[count].len,
  weight,
  (prs.words[count].flags & TSL_PREFIX) || prefix);


 /*
+ * Wrapper of check condition function for TS_execute.
+ * We are using the fact GIN_FALSE = 0 and GIN_MAYBE state
+ * should be treated as true, so, any non-zero value should be
+ * converted to boolean true.
+ */
+static bool
+checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
+{
+   return !!checkcondition_gin_internal((GinChkVal *) checkval, val, data);
+}


Maybe write like this?

static bool
checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
	return checkcondition_gin_internal((GinChkVal *) checkval, val, data) != GIN_FALSE;
}

Then you don't need the comment above the function.


+#define PRINT_PRIORITY(x)  ( (((QueryOperator*)(x))->oper == OP_NOT) ? 5 : QO_PRIORITY(x) )


What does "5" mean here? You could define a named constant near:

#define OP_OR   1
#define OP_AND  2
#define OP_NOT  3
#define OP_PHRASE   4


+Datum
+tsquery_setweight(PG_FUNCTION_ARGS)
+{
+   TSQuery in = PG_GETARG_TSQUERY(0);
+   charcw = PG_GETARG_CHAR(1);
+   TSQuery out;
+   QueryItem  *item;
+   int w = 0;
+
+   switch (cw)
+   {
+   case 'A':
+   case 'a':
+   w = 3;
+   break;
+   case 'B':
+   case 'b':
+   w = 2;
+   break;
+   case 'C':
+   case 'c':
+   w = 1;
+   break;
+   case 'D':
+   case 'd':
+   w = 0;
+   break;
+   default:
+   /* internal error */
+   elog(ERROR, "unrecognized weight: %d", cw);
+   }
+
+   out = (TSQuery) palloc(VARSIZE(in));
+   memcpy(out, in, VARSIZE(in));
+   item = GETQUERY(out);
+
+   while(item - GETQUERY(out) < out->size)   
+   {
+   if (item->type == QI_VAL)
+   item->qoperand.weight |= (1 << w);
+
+   item++;
+   }
+
+   PG_FREE_IF_COPY(in, 0);
+   PG_RETURN_POINTER(out);
+}


This function duplicates a piece of tsvector_setweight() from 
tsvector_op.c. You could factor that out into a new function.



+/*
+ * Check if datatype is the specified 

Re: [HACKERS] WIP: Covering + unique indexes.

2016-02-29 Thread Anastasia Lubennikova

25.02.2016 21:39, Jeff Janes:

As promised, here's the new version of the patch "including_columns_4.0".
I fixed all issues except some points mentioned below.

Thanks for the update patch.  I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'


Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key
(a,b) including (c);
ERROR:  syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key
using index foobar_a_b_c_idx;

The description of the table's index knows to include the including column:

jjanes=# \d foobar
 Table "public.foobar"
  Column |  Type   | Modifiers
+-+---
  a  | integer | not null
  b  | integer | not null
  c  | integer |
Indexes:
 "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)


Since the machinery appears to all be in place to have primary keys
with INCLUDING columns, it would be nice if the syntax for adding
primary keys allowed one to implement them directly.

Is this something or future expansion, or could it be added at the
same time as the main patch?


Good point.
At a quick glance, this looks easy to implement. The only problem is 
that there are many places in the code which must be updated.
I'll try to do it, and if there are difficulties, it's fine with me 
to delay this feature to future work.


I found one more thing to do: pg_dump does not handle included columns 
yet. I will fix it in the next version of the patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Compilation broken when OPTIMIZER_DEBUG is set

2016-02-29 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Feb 29, 2016 at 8:44 PM, salvador fandino  wrote:
>> allpaths.c: In function ‘debug_print_rel’:
>> allpaths.c:2943:50: error: ‘RelOptInfo {aka struct RelOptInfo}’ has no
>> member named ‘width’

> Indeed. The width is now part of rel->reltarget.width. See for example
> the attached while I bumped on this email..

Pushed, thanks.

regards, tom lane




[HACKERS] WAL log only necessary part of 2PC GID

2016-02-29 Thread Pavan Deolasee
Hello Hackers,

The maximum size of the GID, used as a 2PC identifier, is currently defined
as 200 bytes (see src/backend/access/transam/twophase.c). The actual GIDs
used by applications, though, may be much smaller than that. So IMO, instead
of WAL-logging the entire 200 bytes during PREPARE TRANSACTION, we should
WAL-log just strlen(gid) bytes.

The attached patch does that. The changes are limited to twophase.c, and
some simple crash recovery tests seem to work OK. In terms of performance,
a quick test shows a marginal improvement in tps using the script that Stas
Kelvich used for his work on speeding up two-phase transactions. The only
change I made is to keep :scale unchanged, because increasing :scale in
every iteration results in only a handful of updates (not sure why Stas had
that in his original script)

\set naccounts 10 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
:from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
:to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25%
with default settings (except full_page_writes, which is turned off).

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because
of HOT pruning and page defragmentation. If we just look at the WAL
overhead caused by 2PC, the decline is somewhere close to 50%. I took
numbers using simple 1PC for reference and to understand the overhead of
2PC.

HEAD (1PC): 382 bytes / transaction
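
The core of the idea can be sketched like this (a sketch of the approach,
not the attached patch itself; gidlen as a field of the 2PC record header
is an assumption made for illustration):

/* Record only the bytes the GID actually uses, plus its length,
 * instead of a fixed 200-byte buffer (hypothetical field name). */
hdr.gidlen = strlen(gid) + 1;        /* include the trailing '\0' */
save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
save_state_data(gid, hdr.gidlen);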

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


reduce_gid_wal.patch
Description: Binary data



Re: [HACKERS] WIP: Upper planner pathification

2016-02-29 Thread Simon Riggs
On 28 February 2016 at 20:03, Tom Lane  wrote:


> So, where to go from here?  I'm acutely aware that we're hard up against
> the final 9.6 commitfest, and that we discourage major patches arriving
> so late in a devel cycle.  But I simply couldn't get this done any faster.
> I don't really want to hold it over for the 9.7 devel cycle.  It's been
> enough trouble maintaining this patch in the face of conflicting commits
> over the last year or so (it's probably still got bugs related to parallel
> query...), and there definitely are conflicting patches in the upcoming
> 'fest.  And the lack of this infrastructure is blocking progress on FDWs
> and some other things.
>

Thanks for working on this; it is important.

I'm disappointed to see you do this because of FDWs, with the "some other
things" like parallel aggregation not getting a mention by name.

While I wouldn't mind seeing this go in, what worries me is the multiple
other patches that now need to be rewritten to exploit this and since some
aren't mentioned would it be reasonable to imagine those other things won't
be prioritised for this release? Or will we be deciding to elongate the
integration phase to cope with this? Delay or favour forks, which should we
choose?

Anyway, glad to see you will now experience the problems of maintaining
large patches across multiple releases and/or the difficulty of arguing in
favour of patches that still require work going in at the last minute. Not
with relish, just so that understanding isn't limited to the usual suspects
of feature-crime.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-29 Thread Amit Kapila
On Thu, Feb 25, 2016 at 2:54 AM, Peter Eisentraut  wrote:
>
> Could you enhance the documentation about the difference between "wait
> event type name" and "wait event name" (examples?)?
>

I am planning to add the possible values for each wait event type and wait
event, and will add a few examples as well.  Let me know if you want to see
anything else covered in the documentation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-29 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>
> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
wrote:
> >> I wouldn't bother tinkering with it at this point.  The value isn't
> >> going to be recorded on disk anywhere, so it will be easy to change
> >> the way it's computed in the future if we ever need to do that.
> >>
> >
> > Okay. Find the rebased patch attached with this mail.  I have moved
> > this patch to upcoming CF.
>
> I would call the functions pgstat_report_wait_start() and
> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
> pgstat_report_end_waiting().
>
> I think pgstat_get_wait_event_type should not return HWLock, a term
> that appears nowhere in our source tree at present.  How about just
> "Lock"?
>
> I think the wait event types should be documented - and the wait
> events too, perhaps.
>

By the above, do you mean that we should document the name of each wait
event type and wait event?  Documenting wait event names is okay, but we
have approximately 65~70 wait events (counting individual LWLocks,
tranches, locks, etc.).  If we want to document all the events, then I think
we can have a separate table with columns (wait event name, description)
just below pg_stat_activity, and link to that table from the wait_event row
of the pg_stat_activity table.  Does that match your thinking, or do you
have something else in mind?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Compilation broken when OPTIMIZER_DEBUG is set

2016-02-29 Thread Michael Paquier
On Mon, Feb 29, 2016 at 8:44 PM, salvador fandino  wrote:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
> -Wendif-labels -Wmissing-format-attribute -Wformat-security
> -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O0
> -DOPTIMIZER_DEBUG -I../../../../src/include -D_GNU_SOURCE   -c -o allpaths.o
> allpaths.c
> allpaths.c: In function ‘debug_print_rel’:
> allpaths.c:2943:50: error: ‘RelOptInfo {aka struct RelOptInfo}’ has no
> member named ‘width’
>   printf("): rows=%.0f width=%d\n", rel->rows, rel->width);
>   ^
>
> git bisect points to commit 19a541143a09c067ec8cac77ec6a64eb5b1b662b "Add an
> explicit representation of the output targetlist to Paths."

Indeed. The width is now part of rel->reltarget.width. See for example
the attached, which I put together when I bumped into this email.
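
Presumably the fix is the obvious one-liner (a guess from the error message;
the attached patch is authoritative):

printf("): rows=%.0f width=%d\n", rel->rows, rel->reltarget.width);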
-- 
Michael


optimizer-debug-fix.patch
Description: binary/octet-stream



[HACKERS] Reworks of CustomScan serialization/deserialization

2016-02-29 Thread Kouhei Kaigai
Hello,

I'd like to adjust a few parts of the custom-scan interface prior to the v9.6 freeze.

The major point is the serialization/deserialization mechanism.
Currently, an extension has to give a LibraryName and SymbolName to
reproduce the same CustomScanMethods on the background worker process side.
That is indeed sufficient information to pull in the table of function
pointers.

On the other hand, we now have a different mechanism to wrap private
information - extensible nodes. It requires extensions to register their
ExtensibleNodeMethods, identified by name, usually at _PG_init() time.
That is also a reasonable way to reproduce the same objects on the
background worker side.

However, a mixture of two different ways is not good. My preference is
what extensible nodes do rather than what custom scans currently do.
The attached patch allows an extension to register its CustomScanMethods
once; readfuncs.c can then pull the table by CustomName in string form.
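
Usage would then presumably look something like this (a sketch based on the
description above; the registration function name and exact struct layout
are assumptions, not necessarily what the attached patch does):

/* Register the methods table once, e.g. at library load time, so the
 * deserializer can later look it up again by its CustomName string. */
static CustomScanMethods my_scan_methods = {
	"MyCustomScan",				/* CustomName, used as the lookup key */
	CreateMyCustomScanState		/* extension's node-instantiation callback */
};

void
_PG_init(void)
{
	RegisterCustomScanMethods(&my_scan_methods);
}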


The minor point is the header file location of the CustomMethods
declarations. They are currently declared in relation.h, plannodes.h and
execnodes.h. Those files are very primitive, so we put these lines:

  struct ParallelContext; /* avoid including parallel.h here */
  struct shm_toc; /* avoid including shm_toc.h here */
  struct ExplainState;/* avoid including explain.h here */

to avoid inclusion of other headers here.

It seems to me CustomMethods should be moved somewhere more appropriate,
like fdwapi.h is for FDWs. If we put "struct CustomMethods;" in these
primitive header files instead, it will work.

I'm not 100% certain whether "nodes/custom-apis.h" is the best location,
but we seem to need somewhere other than the primitive header files to
put these declarations.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


pgsql-v9.6-custom-scan-serialization-reworks.1.patch
Description: pgsql-v9.6-custom-scan-serialization-reworks.1.patch



Re: [HACKERS] WIP: Access method extendability

2016-02-29 Thread Michael Paquier
On Mon, Feb 29, 2016 at 7:42 PM, Alexander Korotkov
 wrote:
> On Fri, Feb 19, 2016 at 4:08 AM, Michael Paquier 
> wrote:
>> This is basically a copy of RewindTest.pm. This is far from generic.
>> If this patch gets committed first I would suggest to wait for the
>> more-generic routines that would be added in PostgresNode.pm and then
>> come back to it.
>
>
> Yes, that's it. Now, with committed changes to PostgresNode.pm, I get rid of
> separate WALTest.pm.

The test is cleaner in this shape, thanks.

+ # Run few queries on both master and stanbdy and check their results match.
s/stanbdy/standby
-- 
Michael




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-29 Thread Alexander Korotkov
On Sat, Feb 27, 2016 at 2:44 AM, Andres Freund  wrote:

> On 2016-02-02 13:12:50 +0300, Alexander Korotkov wrote:
> > On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund 
> wrote:
> >
> > > On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> > > > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
> > > > a.korot...@postgrespro.ru> wrote:
> > > > >> Client      Base     Patch
> > > > >>      1     19744     19382
> > > > >>      8    125923    126395
> > > > >> 3231393151
> > > > >>     64    387339    496830
> > > > >>    128    306412    350610
> > > > >>
> > > > >> Shared Buffer= 512MB
> > > > >> max_connections=150
> > > > >> Scale Factor=300
> > > > >>
> > > > >> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> > > > >>
> > > > >> Client      Base     Patch
> > > > >>      1     17169     16454
> > > > >>      8    108547    105559
> > > > >>     32    241619    262818
> > > > >>     64    206868    233606
> > > > >>    128    137084    217013
> > >
> > > So, there's a small regression on low client counts. That's worth
> > > addressing.
> > >
> >
> > Interesting. I'll try to reproduce it.
>
> Any progress here?
>

I didn't reproduce the regression. I had access to a multicore machine, but
saw neither the regression at low client counts nor the improvements at high
client counts. In the attached patch, the spinlock delay is exposed in
s_lock.h and used in LockBufHdr().
Dilip, could you try this version of the patch? Could you also run perf or
another profiler in the case that shows the regression? It would be nice to
compare profiles with and without the patch; we could probably find the
cause of the regression.
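
For readers following along, the shape being benchmarked is roughly this (a
rough sketch of a CAS/atomic-based buffer-header lock with spin back-off;
BufferDesc.state, BM_LOCKED and the SpinDelayStatus API are assumptions for
illustration and may differ from the actual patch):

/* Acquire the buffer-header lock by atomically setting BM_LOCKED,
 * backing off (and eventually sleeping) while someone else holds it. */
static uint32
LockBufHdr(BufferDesc *desc)
{
	SpinDelayStatus delayStatus;
	uint32		old_state;

	init_spin_delay(&delayStatus);

	for (;;)
	{
		/* fetch_or returns the value before our OR */
		old_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
		if (!(old_state & BM_LOCKED))
			break;						/* we own the lock now */
		perform_spin_delay(&delayStatus);	/* exposed from s_lock.h */
	}
	finish_spin_delay(&delayStatus);
	return old_state | BM_LOCKED;
}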

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-3.patch
Description: Binary data



[HACKERS] Compilation broken when OPTIMIZER_DEBUG is set

2016-02-29 Thread salvador fandino
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O0 -DOPTIMIZER_DEBUG -I../../../../src/include -D_GNU_SOURCE   -c -o
allpaths.o allpaths.c
allpaths.c: In function ‘debug_print_rel’:
allpaths.c:2943:50: error: ‘RelOptInfo {aka struct RelOptInfo}’ has no
member named ‘width’
  printf("): rows=%.0f width=%d\n", rel->rows, rel->width);
  ^

git bisect points to commit 19a541143a09c067ec8cac77ec6a64eb5b1b662b "Add
an explicit representation of the output targetlist to Paths."


Re: [HACKERS][REVIEW]: Password identifiers, protocol aging and SCRAM protocol

2016-02-29 Thread Valery Popov

Hi, Michael



23.02.2016 10:17, Michael Paquier пишет:

Attached is a set of patches implementing a couple of things that have
been discussed, so let's roll in.

Those 4 patches are aimed at putting in-core basics for the concept I
call password protocol aging, which is a way to allow multiple
password protocols to be defined in Postgres, and aimed at easing
administration as well as retirement of outdated protocols, which is
something that is not doable now in Postgres.

The second set of patch 0005~0008 introduces a new protocol, SCRAM.
9) 0009 is the SCRAM authentication itself

The theme of password checking is interesting to me, and I can review some
of the features for the CF. I think that reviewing all the suggested
features will require a lot of time. Is it possible to make a subset of the
patches concerning only password strength and its aging?

The patches you have applied are not independent. They have to be applied
consecutively, one by one; thus patch 0009 can't be applied without git
errors before 0001. Under those conditions, all patches were successfully
applied and compiled, and all tests passed.

If you want to focus on the password protocol aging, you could just
have a look at 0001~0004.

OK, I will review patches 0001-0004, for starting.


Below are the results of compiling and testing.

I've got the latest version of the sources from 
git://git.postgresql.org/git/postgresql.git.


vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git branch
* master

Then I've applied patches 0001-0004 with two warnings:
vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git apply 
0001-Add-facility-to-store-multiple-password-verifiers.patch
0001-Add-facility-to-store-multiple-password-verifiers.patch:2547: 
trailing whitespace.

warning: 1 line adds whitespace errors.
vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git apply 
0002-Introduce-password_protocols.patch
vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git apply 
0003-Add-pg_auth_verifiers_sanitize.patch

0003-Add-pg_auth_verifiers_sanitize.patch:87: indent with spaces.
if (!superuser())
warning: 1 line adds whitespace errors.
vpopov@vpopov-Ubuntu:~/Projects/pwdtest/postgresql$ git apply 
0004-Remove-password-verifiers-for-unsupported-protocols-.patch
The compilation with option ./configure --enable-debug --enable-nls 
--enable-cassert  --enable-tap-tests --with-perl

was successful.
Regression tests and all TAP-tests also passed successfully.

I've also applied patches 0005-0008 to a clean source directory, with no 
warnings.
vpopov@vpopov-Ubuntu:~/Projects/pwdtest2/postgresql$ git apply 
0005-Move-sha1.c-to-src-common.patch
vpopov@vpopov-Ubuntu:~/Projects/pwdtest2/postgresql$ git apply 
0006-Refactor-sendAuthRequest.patch
vpopov@vpopov-Ubuntu:~/Projects/pwdtest2/postgresql$ git apply 
0007-Refactor-RandomSalt-to-handle-salts-of-different-len.patch
vpopov@vpopov-Ubuntu:~/Projects/pwdtest2/postgresql$ git apply 
0008-Move-encoding-routines-to-src-common.patch
The compilation with option ./configure --enable-debug --enable-nls 
--enable-cassert  --enable-tap-tests --with-perl

was successful.
Regression and the TAP-tests also passed successfully.

The patch 0009 depends on all previous patches 0001-0008: first we need 
to apply patches 0001-0008, then 0009.

Then, all patches were successfully compiled.
All tests passed.

--
Regards,
Valery Popov
Postgres Professional http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-29 Thread Michael Paquier
On Mon, Feb 29, 2016 at 6:05 PM, Thomas Munro
 wrote:
> "All changes made by the transaction become visible to others ..." --
> which others?  But I backed out, that succinct account of COMMIT is 20
> years old, and in any case visibility is tied to committing, not
> specifically to the COMMIT command.  But perhaps this patch really
> should include something there that refers back to the causal reads
> section.

Luckily enough, read uncommitted behaves like read-committed in PG,
making this true :)
-- 
Michael




[HACKERS] Confusing with commit time usage in logical decoding

2016-02-29 Thread Artur Zakirov

Hello,

I read this message 
http://www.postgresql.org/message-id/56d4197e.9050...@informatik.uni-kl.de


Is this a bug or a typo? In DecodeCommit() in decode.c instead of:

if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
{
origin_lsn = parsed->origin_lsn;
commit_time = parsed->origin_timestamp;
}

should be:

if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
{
origin_lsn = parsed->origin_lsn;
commit_time = parsed->origin_timestamp;
}
else
commit_time = parsed->xact_time;

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-29 Thread Kyotaro HORIGUCHI
Sorry, I misread the previous patch. It actually worked.


At Sun, 28 Feb 2016 04:04:37 +0900, Masahiko Sawada  
wrote in 
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.

Thank you for the new patch. The parser looks to work mostly as
expected, but the following warnings appeared during the build.

> In file included from syncgroup_gram.y:138:0:
> syncgroup_scanner.l:23:12: warning: ‘xd_size’ defined but not used 
> [-Wunused-variable]
>  static int xd_size; /* actual size of xd_string */
> ^
> syncgroup_scanner.l:24:12: warning: ‘xd_len’ defined but not used 
> [-Wunused-variable]
>  static int xd_len; /* string length of xd_string  */


Some random comments follow.


Comments for the lexer part.

===
> +node_name[^\ \,\[\]]

This accepts 'abc^Id' as a name, which is wrong behavior (though such
application names are not allowed anyway; if you are relying on that,
I'd like to see a comment saying so). And the excessive escaping makes
it a bit hard to read. The pattern can be written more precisely as
follows (though I don't know whether it is generally easier to read):

| node_name [\x20-\x7f]{-}[ \[\],]

===
The pattern name {node_name} makes me a bit uneasy. node_name_cont or
name_chars would be preferable.

===
> [1-9][0-9]* {

I see no necessity to inhibit 0-prefixed integers as NUM. Would
you mind allowing [0-9]+ there?

===
addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
char ychar) require different character types. Is there any reason
for that?

===
I personally don't like the addlit*string() machinery for such simple
syntax, but it is acceptable enough for me. However, it uses StringInfo
to hold double-quoted names, which pallocs a 1024-byte memory chunk for
every double-quoted name. The chunks pile up uncollected until the
current memory context is deleted or reset (it is deleted just after
config file processing finishes). In addition, setting s_s_names runs
the parser twice. That seems too greedy to me; a static
char[NAMEDATALEN], handled the way v12 did it, seems sufficient, with
no palloc/repalloc.


Comments for parser part.

===
The rule "result" in syncgruop_gram.y sets malloced chunk to
SyncRepStandbys ignoring exiting content so repetitive setting to
the gud s_s_names causes a memory leak. Using
SyncRepClearStandbyGroupList would be enough.

===
The meaning of SyncGroupNode.type seems obscure. The member seems
to be consulted to decide how to treat the node, but the following
code breaks that assumption.

> group_node->type = SYNC_REP_GROUP_GROUP | SYNC_REP_GROUP_MAIN;

It seems to me that *_MAIN is equivalent to *_GROUP &&
sync_method == *_PRIORITY. If so, *_MAIN is useless. The reader of a
SyncGroupNode need not know whether it came from the traditional
s_s_names format or the new one.

===
Bare names in s_s_names are down-cased and double-quoted ones are
not. The parser in this patch does neither.

===
xd_stringdup() doesn't make a copy of the string, contrary to what its
name suggests. That's error-prone.

===
I find the name SyncGroupName.wait_num unintuitive. How about
sync_num, sync_member_num or sync_standby_num? If the last is
preferable, .members should also become .standbys.


Comment for the quorum commit body part.
===
I am quite uncomfortable with the existence of
WalSnd.sync_standby_priority. It represented the priority in the
old linear s_s_names format, but nested groups, or even a
single-level quorum list, obviously don't fit it. Can we get rid
of sync_standby_priority, even if we only implement at-most-n
priority for now?

===
The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
have specific code for each prioritizing method (priority, quorum,
nested, and so on). Is there any reason to use it as a callback of
SyncGroupNode?



Others - random comments
===
SyncRepClearStandbyGroupList is defined in syncrep.c but the
other related functions are defined in syncgroup_gram.y. It would
be better to place them together.

===
SyncRepStandbys is meant to be multilevel, and the struct naturally
allows that, but SyncRepClearStandbyGroupList assumes a single
level. Either make the function free multilevel structures, or
explicitly forbid multilevel ones with an assertion.

===
-   errdetail("The transaction has already committed locally, but might 
not have been replicated to the standby.")));
+   errdetail("The transaction has already committed locally, but might 
not have been replicated to the standby(s).")));

The message doesn't mention a specific number of standbys, so just
using the plural seems enough to me. Besides, the message should
describe the situation more precisely. Word-smithing is left to
anyone else :)

+   errdetail("The transaction has already committed locally, but might 
not have been replicated to some of the required standbys.")));

===
+ * 

Re: [HACKERS] Relation extension scalability

2016-02-29 Thread Dilip Kumar
On Wed, Feb 10, 2016 at 7:06 PM, Dilip Kumar  wrote:

>
I have tested the relation extension patch from various aspects; the
performance results and other statistical data are explained in this mail.

Test 1: Identify whether the heavyweight lock is the problem, or the actual
context switches
1. I converted the RelationExtensionLock to a simple LWLock and tested with
a single relation. Results are as below.

This is a simple script that copies 1 record, 4 bytes in size, per
transaction:
 Client    Base    LWLock    Multi-extend by 50 blocks
      1     155       156      160
      2     282       276      284
      4     248       319      428
      8     161       267      675
     16     143       241      889

LWLock performance is better than base; the obvious reason may be that we
save some instructions by converting to an LWLock, but it doesn't scale any
better than the base code.


Test 2: Identify whether the improvement in the multi-extend case is because
of avoiding the context switch, or some other factor, like reusing blocks
between backends by putting them in the FSM.

1. Test by just extending multiple blocks and reusing them in the extending
backend itself (don't put them in the FSM).
Insert 1K-sized records; the data doesn't fit in shared buffers (512MB
shared buffers).


 Client    Base    Extend 800 blocks (self use)    Extend 1000 blocks
      1     117       131                            118
      2     111       203                            140
      3      51       242                            178
      4      51       231                            190
      5      52       259                            224
      6      51       263                            243
      7      43       253                            254
      8      43       240                            254
     16      40       190                            243

We can see the same improvement when the backend uses the blocks itself,
which shows that sharing the blocks between backends was not the win;
avoiding the context switch was the major win.

2. Tested the number of ProcSleeps during the run.
This is the same simple script, copying 1 record of 4 bytes per transaction.

             BASE CODE                  PATCH (MULTI EXTEND BY 10 BLOCKS)
 Client    TPS    ProcSleep count        TPS    ProcSleep count
      2    280        457,506            311         62,641
      3    235      1,098,701            358        141,624
      4    216      1,155,735            368        188,173

What we can see in the above test is that in the base code performance
degrades after 2 threads, while the ProcSleep count increases enormously.

Compared to that, in the patch, extending 10 blocks at a time reduces
ProcSleep to roughly 1/8th, and performance scales steadily.

The ProcSleep test for the insert case (data not fitting in shared buffers,
with big 1024-byte records) is currently running; once I get the data I
will post it.

Posting the rebased version and moving the entry to the next CF.

Open points:
1. After getting the lock, recheck the FSM in case some other backend has
already added extra blocks, and reuse them.
2. Is it a good idea to have a user-level parameter for extend_by_blocks,
or can we try some approach that internally identifies how many blocks are
needed and adds only that many? That would make it more flexible.
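
For open point 1, the multi-block extension being measured can be sketched
roughly as follows (a sketch under assumptions: extraBlocks comes from the
extend_by_blocks reloption, relation and bistate are the arguments of
RelationGetBufferForTuple, and production code would also need to
initialize each page before advertising it):

/* Extend by several blocks while holding the extension lock, and
 * publish the extra pages in the FSM so that other backends can
 * reuse them without waiting on the lock (rough sketch only). */
while (extraBlocks-- > 0)
{
	Buffer		buffer = ReadBufferBI(relation, P_NEW, bistate);
	BlockNumber blockNum = BufferGetBlockNumber(buffer);

	ReleaseBuffer(buffer);
	RecordPageWithFreeSpace(relation, blockNum,
							BLCKSZ - SizeOfPageHeaderData);
}
FreeSpaceMapVacuum(relation);	/* make the new FSM entries visible */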


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 86b9ae1..78e81dd 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -268,6 +268,16 @@ static relopt_int intRelOpts[] =
 #endif
 	},
 
+	{
+		{
+			"extend_by_blocks",
+			"Number of blocks to be added to relation in every extend call",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		1, 1, 1
+	},
+
 	/* list terminator */
 	{{NULL}}
 };
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"user_catalog_table", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, user_catalog_table)}
+		offsetof(StdRdOptions, user_catalog_table)},
+		{"extend_by_blocks", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, extend_by_blocks)}
 	};
 
 	options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..eb3ce17 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -238,6 +238,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 otherBlock;
 	bool		needLock;
+	int			extraBlocks;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -443,25 +444,50 @@ RelationGetBufferForTuple(Relation relation, 
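
The hio.c hunk above is truncated; a rough sketch of the multi-block
extension loop it implements (reconstructed from the description above,
not copied from the patch text):

/* sketch only: with the extension lock held, add extraBlocks pages in
 * one go and publish each of them in the FSM for any backend to reuse */
while (extraBlocks-- > 0)
{
	Buffer		extraBuffer = ReadBufferBI(relation, P_NEW, bistate);
	BlockNumber	extraBlockNum = BufferGetBlockNumber(extraBuffer);
	Page		extraPage = BufferGetPage(extraBuffer);
	Size		freespace;

	LockBuffer(extraBuffer, BUFFER_LOCK_EXCLUSIVE);
	PageInit(extraPage, BufferGetPageSize(extraBuffer), 0);
	freespace = PageGetHeapFreeSpace(extraPage);
	MarkBufferDirty(extraBuffer);
	UnlockReleaseBuffer(extraBuffer);

	/* make the empty page visible to other backends */
	RecordPageWithFreeSpace(relation, extraBlockNum, freespace);
}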

[HACKERS] A trivial fix on extensiblenode

2016-02-29 Thread Kouhei Kaigai
Hello,

RegisterExtensibleNodeMethods() initializes its hash table
with keysize=NAMEDATALEN, instead of EXTNODENAME_MAX_LEN.

The attached patch fixes it.
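
For readers who don't open the attachment: the fix is essentially one
line in the hash setup of RegisterExtensibleNodeMethods (sketched from
the description above, not copied from the attachment; ExtensibleNodeEntry
is the entry struct in extensible.c):

HASHCTL		ctl;

memset(&ctl, 0, sizeof(ctl));
ctl.keysize = EXTNODENAME_MAX_LEN;		/* was: NAMEDATALEN */
ctl.entrysize = sizeof(ExtensibleNodeEntry);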

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Attachment: pgsql-v9.6-trivial-fix-extensiblenode.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-29 Thread Thomas Munro
On Mon, Feb 29, 2016 at 9:05 PM, Amit Langote
 wrote:
>
> Hi Thomas,
>
> On 2016/02/29 15:20, Thomas Munro wrote:
>> Thanks for looking at the patch!  Here is a new version with the
>> following changes:
>>
>> 1.  Some draft user documentation has been added, as requested.
>
> Just to clarify, in:
>
> + servers.  A transaction that is run with causal_reads set
> + to on is guaranteed either to see the effects of all
> + completed transactions run on the primary with the setting on, or to
> + receive an error "standby is not available for causal reads".
>
> "A transaction that is run" means "A transaction that is run on a
> standby", right?

Well, it could be any server, standby or primary.  Of course standbys
are the interesting case, since it was already true that if you run
two sequential transactions on the primary, the second can see the
effects of the first; but I like the idea of a general rule that
applies anywhere, allowing you not to care which server it is.

> By the way, is there some discussion in our existing documentation to
> refer to about causal consistency in the single-node case?  I don't
> know, but maybe that would help ease readers into the new feature.
> Grepping the existing source tree doesn't reveal the term "causal", so
> maybe even a single line in the patch mentioning "single node operation
> trivially implies (or does it?) causal consistency" would help.  Thoughts?

Hmm.  Where should such a thing go?  I probably haven't introduced the
term well enough.  I thought for a moment about putting something
here:

http://www.postgresql.org/docs/devel/static/sql-commit.html

"All changes made by the transaction become visible to others ..." --
which others?  But I backed out, that succinct account of COMMIT is 20
years old, and in any case visibility is tied to committing, not
specifically to the COMMIT command.  But perhaps this patch really
should include something there that refers back to the causal reads
section.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-29 Thread Amit Langote

Hi Thomas,

On 2016/02/29 15:20, Thomas Munro wrote:
> Thanks for looking at the patch!  Here is a new version with the
> following changes:
> 
> 1.  Some draft user documentation has been added, as requested.

Just to clarify, in:

+ servers.  A transaction that is run with causal_reads set
+ to on is guaranteed either to see the effects of all
+ completed transactions run on the primary with the setting on, or to
+ receive an error "standby is not available for causal reads".

"A transaction that is run" means "A transaction that is run on a
standby", right?  By the way, is there some discussion in our existing
documentation to refer to about causal consistency in single node case?  I
don't know maybe that will help ease into the new feature.  Grepping the
existing source tree doesn't reveal the term "causal", so maybe even a
single line in the patch mentioning "single node operation trivially
implies (or does it?) causal consistency" would help.  Thoughts?

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers