date:20170614

2017-06-14 19:56 GMT+02:00 Andres Freund :

> On 2017-06-12 10:32:57 -0400, Tom Lane wrote:
> > "Daniel Verite"  writes:
> > > PGC_POSTMASTER implies that it's an instance-wide setting.
> > > Is is intentional? I can understand that it's more secure for this not
> to
> > > be changeable in an existing session, but it's also much less usable
> if you
> > > can't set it per-database and per-user.
> > > Maybe it should be PGC_SUSET ?
> >
> > Bearing in mind that I'm not really for this at all...
>
> FWIW, I agree that this isn't something we should do.  For one the GUC
> would really have to be GUC_REPORT, which'll cost everyone, and will
> break things like pgbouncer.   I also don't think it's a good solution to
> the problem at hand - there *are* cases where application
> *intentionally* use PQexec() with multiple statements, namely when
> aggregate latency is an issue. Since it's an application writer's choice
> whether to use it, it seems to make not that much sense to have a
> serverside guc - it can't really be sensible set.   If you want to do
> something here, you should probably work on convincing ORM etc. writers
> to use PQexecParams().
>

sometimes you are without possibility to check a control what application
does. The tools on server side is one possibility.

Regards

Pavel


>
> Greetings,
>
> Andres Freund
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] v10beta pg_catalog diagrams

2017-06-15 5:06 GMT+02:00 Andres Freund :

> On 2017-06-14 20:02:27 -0700, Andres Freund wrote:
> > On June 14, 2017 7:53:05 PM PDT, Pavel Stehule 
> wrote:
> > >2017-06-14 19:49 GMT+02:00 Andres Freund :
> > >> > These constraints can slowdown creating/dropping database objects -
> > >> mainly
> > >> > temp tables.
> > >>
> > >> How so?
> > >>
> > >
> > >execution RI triggers
> >
> > Those would obviously bit be fired, given Peter's description?
>
> Gah, stupid autocorrect.  *not* be fired.  Unless you mean something
> else than the speed of catalog manipulations themselves, in which case I
> still don't understand.
>

No, just this. I missed it.

Regards

Pavel


>
> - Andres
>

Re: [HACKERS] improve release-note for pg_current_logfile()

2017-06-14 Thread Tatsuo Ishii

>> Add function pg_current_logfile() to read syslog's current stderr and csvlog 
>> output file names (Gilles Darold)
> 
> This confused me because "syslog" is one of method for logging as well
> as stderr and csvlog. I guess it is intended syslogger, but I think that
> "logging collector" is more convenient for users because this is used in
> the pg_current_logfile() documentation.

His proposal is changing the line above to:

Add function pg_current_logfile() to read logging collector's current stderr 
and csvlog output file names (Gilles Darold)

Looks reasonable fix to me. If there's no objection, I will commit
this.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] v10beta pg_catalog diagrams

On 2017-06-14 20:02:27 -0700, Andres Freund wrote:
> On June 14, 2017 7:53:05 PM PDT, Pavel Stehule  
> wrote:
> >2017-06-14 19:49 GMT+02:00 Andres Freund :
> >> > These constraints can slowdown creating/dropping database objects -
> >> mainly
> >> > temp tables.
> >>
> >> How so?
> >>
> >
> >execution RI triggers
> 
> Those would obviously bit be fired, given Peter's description?

Gah, stupid autocorrect.  *not* be fired.  Unless you mean something
else than the speed of catalog manipulations themselves, in which case I
still don't understand.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] v10beta pg_catalog diagrams

2017-06-15 5:02 GMT+02:00 Andres Freund :

>
>
> On June 14, 2017 7:53:05 PM PDT, Pavel Stehule 
> wrote:
> >2017-06-14 19:49 GMT+02:00 Andres Freund :
> >
> >> On 2017-06-14 06:05:24 +0200, Pavel Stehule wrote:
> >> > 2017-06-14 5:53 GMT+02:00 Peter Eisentraut <
> >> peter.eisentr...@2ndquadrant.com
> >> > >:
> >> >
> >> > > On 6/13/17 17:08, Andres Freund wrote:
> >> > > > I wondered before if we shouldn't introduce "information only"
> >> > > > unenforced foreign key constraints for the catalogs.  We kind
> >of
> >> > > > manually do that via oidjoins, it'd be nicer if we'd a function
> >> > > > rechecking fkeys, and the fkeys were in the catalog...
> >> > >
> >> > > I don't see why we couldn't just add a full complement of primary
> >and
> >> > > foreign key constraints (and unique constraints and perhaps some
> >check
> >> > > constraints).  The argument is that they wouldn't normally do
> >anything,
> >> > > but they would help with documentation and browsing tools, and
> >they
> >> > > wouldn't hurt anything.
> >>
> >> Well, unique constraints are a bit more complicated because they rely
> >on
> >> an index, and we wouldn't e.g. maintain indexes with WHERE clauses or
> >> other expressions correctly.  I'd be a bit wary of declaring such
> >> indexes as actually being fully valid, because we have planner logic
> >> that does planning based on various constraints now, it'd certainly
> >be
> >> annoying if some "re-check constraint" type queries would actually
> >have
> >> their joins optimized away or such...
> >>
> >> > These constraints can slowdown creating/dropping database objects -
> >> mainly
> >> > temp tables.
> >>
> >> How so?
> >>
> >
> >execution RI triggers
>
> Those would obviously bit be fired, given Peter's description?
>

ok


>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>

Re: [HACKERS] v10beta pg_catalog diagrams



On June 14, 2017 7:53:05 PM PDT, Pavel Stehule  wrote:
>2017-06-14 19:49 GMT+02:00 Andres Freund :
>
>> On 2017-06-14 06:05:24 +0200, Pavel Stehule wrote:
>> > 2017-06-14 5:53 GMT+02:00 Peter Eisentraut <
>> peter.eisentr...@2ndquadrant.com
>> > >:
>> >
>> > > On 6/13/17 17:08, Andres Freund wrote:
>> > > > I wondered before if we shouldn't introduce "information only"
>> > > > unenforced foreign key constraints for the catalogs.  We kind
>of
>> > > > manually do that via oidjoins, it'd be nicer if we'd a function
>> > > > rechecking fkeys, and the fkeys were in the catalog...
>> > >
>> > > I don't see why we couldn't just add a full complement of primary
>and
>> > > foreign key constraints (and unique constraints and perhaps some
>check
>> > > constraints).  The argument is that they wouldn't normally do
>anything,
>> > > but they would help with documentation and browsing tools, and
>they
>> > > wouldn't hurt anything.
>>
>> Well, unique constraints are a bit more complicated because they rely
>on
>> an index, and we wouldn't e.g. maintain indexes with WHERE clauses or
>> other expressions correctly.  I'd be a bit wary of declaring such
>> indexes as actually being fully valid, because we have planner logic
>> that does planning based on various constraints now, it'd certainly
>be
>> annoying if some "re-check constraint" type queries would actually
>have
>> their joins optimized away or such...
>>
>> > These constraints can slowdown creating/dropping database objects -
>> mainly
>> > temp tables.
>>
>> How so?
>>
>
>execution RI triggers

Those would obviously bit be fired, given Peter's description?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assorted leaks of PGresults

2017-06-14 Thread Michael Paquier

On Thu, Jun 15, 2017 at 9:36 AM, Tom Lane  wrote:
> Coverity complained that libpqwalreceiver.c's libpqrcv_PQexec()
> potentially leaks a PGresult.  It's right: if PQconsumeInput() fails after
> we've already collected at least one PGresult, the function just returns
> NULL without remembering to free last_result.  That's easy enough to fix,
> just insert "PQclear(lastResult)" before the return.  I do not think this
> is a significant leak, because the walreceiver process would just exit
> anyway after the failure.  But we ought to fix it in case somebody copies
> the logic into someplace less forgiving.

Indeed.

> In fact ... I looked through other callers of PQconsumeInput() to see if
> we had the same wrong coding pattern elsewhere, and we do, in two places
> in postgres_fdw/connection.c.  One of those is new as of last week, in
> commit ae9bfc5d6.

It may be actually the reason why coverity complained about the one in
libpqwalreceiver.c. I have noticed a couple of times in the past that
similar code patterns are more likely to be checked again if one of
them is added or changed. Impossible to say if that's true, but I
suspect that there is some underground intelligence to improve failure
detection, or at least re-check them.

> What's worse, in those cases we have code inside the
> loop that might throw elog(ERROR), resulting in a permanently leaked
> PGresult --- and since the backend process would keep going, this could
> possibly be repeated and build up to a leak that amounted to something
> significant.  We need to have PG_TRY blocks in both these places that
> ensure that any held PGresult gets freed before we exit the functions.
>
> Further review showed an additional leak introduced by ae9bfc5d6, to wit
> that pgfdw_exec_cleanup_query() just forgets to clear the PGresult
> returned by pgfdw_get_cleanup_result() in its normal non-error path.
>
> In short, I think we need the attached patch in HEAD.  It needs to be
> back-patched too, though the code seems to be a bit different in the
> back branches.

I had a look at this patch, checking the surroundings and I am not
noticing any other leaks in postgres_fdw or other modules.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] v10beta pg_catalog diagrams

2017-06-14 19:49 GMT+02:00 Andres Freund :

> On 2017-06-14 06:05:24 +0200, Pavel Stehule wrote:
> > 2017-06-14 5:53 GMT+02:00 Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com
> > >:
> >
> > > On 6/13/17 17:08, Andres Freund wrote:
> > > > I wondered before if we shouldn't introduce "information only"
> > > > unenforced foreign key constraints for the catalogs.  We kind of
> > > > manually do that via oidjoins, it'd be nicer if we'd a function
> > > > rechecking fkeys, and the fkeys were in the catalog...
> > >
> > > I don't see why we couldn't just add a full complement of primary and
> > > foreign key constraints (and unique constraints and perhaps some check
> > > constraints).  The argument is that they wouldn't normally do anything,
> > > but they would help with documentation and browsing tools, and they
> > > wouldn't hurt anything.
>
> Well, unique constraints are a bit more complicated because they rely on
> an index, and we wouldn't e.g. maintain indexes with WHERE clauses or
> other expressions correctly.  I'd be a bit wary of declaring such
> indexes as actually being fully valid, because we have planner logic
> that does planning based on various constraints now, it'd certainly be
> annoying if some "re-check constraint" type queries would actually have
> their joins optimized away or such...
>
> > These constraints can slowdown creating/dropping database objects -
> mainly
> > temp tables.
>
> How so?
>

execution RI triggers

Regards

Pavel


>
> - Andres
>

Re: [HACKERS] Update comments in nodeModifyTable.c

2017-06-14 Thread Etsuro Fujita


On 2017/06/07 0:30, Robert Haas wrote:

On Mon, Jun 5, 2017 at 4:45 AM, Etsuro Fujita
 wrote:

While working on [1], I noticed that the comment in ExecModifyTable:

  * Foreign table updates have a wholerow attribute when the
  * relation has an AFTER ROW trigger.

is not 100% correct because a foreign table has a wholerow attrubute when
the relation has an AFTER ROW or BEFORE ROW trigger (see
rewriteTargetListUD).  So I'd propose s/an AFTER ROW trigger/a row-level
trigger/.  Attached is a patch for that.


That seems better, but looking at rewriteTargetListUD, it seems that
the actual rule is that this happens when there is a row-level on
either UPDATE or DELETE.  If there is only a row-level trigger on
INSERT, then it is not done.  Perhaps we should try to include that
detail in the comment as well.


Agreed, but I think it's better to add that detail to this comment in 
ExecInitModifyTable:


 * Initialize the junk filter(s) if needed.  INSERT queries need a 
filter

 * if there are any junk attrs in the tlist.  UPDATE and DELETE always
 * need a filter, since there's always a junk 'ctid' or 'wholerow'
 * attribute present --- no need to look first.

I'd also like to propose to update the third sentence in this comment, 
since there isn't necessarily a ctid or wholerow in the UPDATE/DELETE 
tlist when the result relation is a foreign table.


Attached is an updated version of the patch.

Best regards,
Etsuro Fujita
diff --git a/src/backend/executor/nodeModifyTable.c 
b/src/backend/executor/nodeModifyTable.c
index cf555fe..07bc870 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1554,7 +1554,7 @@ ExecModifyTable(ModifyTableState *node)
 * the old relation tuple.
 *
 * Foreign table updates have a wholerow 
attribute when the
-* relation has an AFTER ROW trigger.  Note 
that the wholerow
+* relation has a row-level trigger.  Note that 
the wholerow
 * attribute does not carry system columns.  
Foreign table
 * triggers miss seeing those, except that we 
know enough here
 * to set t_tableOid.  Quite separately from 
this, the FDW may
@@ -2025,7 +2025,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
 * Initialize the junk filter(s) if needed.  INSERT queries need a 
filter
 * if there are any junk attrs in the tlist.  UPDATE and DELETE always
 * need a filter, since there's always a junk 'ctid' or 'wholerow'
-* attribute present --- no need to look first.
+* attribute present if not foreign table, and if foreign table, there
+* are always junk attributes present the FDW needs to identify the 
exact
+* row to update or delete --- no need to look first.  For foreign 
tables,
+* there's also a wholerow attribute when the relation has a row-level
+* trigger on UPDATE/DELETE but not on INSERT.
 *
 * If there are multiple result relations, each one needs its own junk
 * filter.  Note multiple rels are only possible for UPDATE/DELETE, so 
we
@@ -2093,7 +2097,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
else if (relkind == 
RELKIND_FOREIGN_TABLE)
{
/*
-* When there is an AFTER 
trigger, there should be a
+* When there is a row-level 
trigger, there should be a
 * wholerow attribute.
 */
j->jf_junkAttNo = 
ExecFindJunkAttribute(j, "wholerow");

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] type of release note of PG10

2017-06-14 Thread Tatsuo Ishii

> Hi,
> 
> I found a typo in the PG10 release note and attached is a patch
> to fix it.

Fix committed. Thanks!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something is rotten in publication drop

2017-06-14 Thread Noah Misch

On Fri, Jun 09, 2017 at 11:45:58AM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 6/8/17 23:53, Tom Lane wrote:
> >> ! ERROR:  publication "addr_pub" does not exist
> 
> > The \d+ command attempts to print out any publications that the relation
> > is part of.  To find the publications it is part of, it runs this query:
> 
> > "SELECT pub.pubname\n"
> > " FROM pg_catalog.pg_publication pub,\n"
> > "  pg_catalog.pg_get_publication_tables(pub.pubname)\n"
> > "WHERE relid = '%s'\n"
> > "ORDER BY 1;",
> 
> > pg_get_publication_tables() calls GetPublicationByName(), which throws
> > this error.
> 
> > So I suppose that if a publication is dropped between the time
> > pg_publication is read and the function is called, you could get this error.
> 
> Yeah, I'd suspected as much but not tracked it down yet.
> 
> > How could we improve that?
> 
> What we've done in many comparable situations is to allow a
> catalog-probing function to return NULL instead of failing
> when handed an OID or other identifier that it can't locate.
> Here it seems like pg_get_publication_tables() needs to use
> missing_ok = TRUE and then return zero rows for a null result.
> Arguably, a nonexistent publication publishes no tables, so
> it's not clear that's not the Right Thing anyway.
> 
> BTW, isn't the above command a hugely inefficient way of finding
> the publications for the target rel?  Unless you've got a rather
> small number of rather restricted publications, seems like it's
> going to take a long time.  Maybe we don't care too much about
> manual invocations of \d+, but I bet somebody will carp if there's
> not a better way to find this out.  Maybe a better answer is to
> define a more suitable function pg_publications_for_table(relid)
> and let it have the no-error-for-bad-OID behavior.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-14 Thread Noah Misch

On Tue, Jun 13, 2017 at 11:16:54AM +0900, Michael Paquier wrote:
> On Thu, Jun 8, 2017 at 1:15 PM, Michael Paquier
>  wrote:
> > 3) In walreceiver.c's pg_stat_get_wal_receiver's:
> > - Launch pg_stat_get_wal_receiver and take a checkpoint on it.
> > - Pass the lookups of pid and ready_to_display.
> > - Make the WAL receiver stop.
> > - The view reports inconsistent data. If the WAL receiver data was
> > just initialized, the caller would get back PID values or similar 0.
> > Particularly a WAL receiver marked with !ready_to_display could show
> > data to the caller. That's not cool.
> 
> This can actually leak password information to any user who is a
> member of the group DEFAULT_ROLE_READ_ALL_STATS, though the window to
> put hands on this password information is very small, it can be
> possible if the WAL receiver gets down and restarted for a reason or
> another during a maintenance window for example:
> 1) The user queries pg_stat_wal_receiver, for example take a
> breakpoint just after the check on walrcv->ready_to_display.
> 2) Restart the primary once, forcing the standby to spawn a new WAL receiver.
> 3) Breakpoint on the WAL receiver process, with something like that to help:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -243,6 +243,7 @@ WalReceiverMain(void)
> 
> /* Fetch information required to start streaming */
> walrcv->ready_to_display = false;
> +   pg_usleep(1000L); /* 10s */
> strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
> strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN);
> startpoint = walrcv->receiveStart;
> 4) continue the session querying pg_stat_wal_receiver, at this stage
> it has information with the WAL receiver data set as
> !ready_to_display. If the connection string includes a password, this
> becomes visible as well.
> 
> If queried at high frequency, pg_stat_wal_receiver may show up some
> information. Postgres 9.6 includes this leak as well, but there is no real
> leak non-superusers will just see NULL fields for this view. In Postgres
> 10 though, any member of this group for statistics can see leaking
> information. Based on that, this is an open item, so I have added it back
> now to the list, moved from the section for older bugs.

Formally, the causative commit is the one that removed the superuser() test,
namely 25fff40.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Simon,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017 weekly progress reports (week 2)

2017-06-14 Thread Alvaro Herrera

Shubham Barai wrote:
> Hi,
> 
> I have made some changes in tests and pushed them to my branch.
> 
> Thanks for helping me out with testing.
> 
> Now, current head produces false positives but, with my patch, it doesn't.
> 
> Here is the link for updated tests:
> https://github.com/shubhambaraiss/postgres/commit/2c02685a50a2b30654beb5c52542a57a46219c39

Nice.  Please provide patches, so that it can be considered for commit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-14 Thread Masahiko Sawada

On Thu, Jun 15, 2017 at 7:35 AM, Petr Jelinek
 wrote:
> On 13/06/17 21:49, Peter Eisentraut wrote:
>> On 6/13/17 02:33, Noah Misch wrote:
 Steps to reproduce -
 X cluster -> create 100 tables , publish all tables  (create publication 
 pub
 for all tables);
 Y Cluster -> create 100 tables ,create subscription(create subscription sub
 connection 'user=centos host=localhost' publication pub;
 Y cluster ->drop subscription - drop subscription sub;

 check the log file on Y cluster.

 Sometime , i have seen this error on psql prompt and drop subscription
 operation got failed at first attempt.

 postgres=# drop subscription sub;
 ERROR:  tuple concurrently updated
 postgres=# drop subscription sub;
 NOTICE:  dropped replication slot "sub" on publisher
 DROP SUBSCRIPTION
>>>
>>> [Action required within three days.  This is a generic notification.]
>>
>> It's being worked on.  Let's see by Thursday.
>>
>
> Attached fixes it (it was mostly about order of calls). I also split the
> SetSubscriptionRelState into 2 separate interface while I was changing
> it, because now that the update_only bool was added it has become quite
> strange to have single interface for what is basically two separate
> functions.
>
> There are still couple of remaining issues from this thread though.
> Namely the AccessExclusiveLock of the pg_subscription catalog which is
> not very pretty, but we need a way to block launcher from accessing the
> subscription which is being dropped and make sure it will not start new
> workers for it afterwards. Question is how however as by the time
> launcher can lock individual subscription it is already processing it.
> So it looks to me like we'd need to reread the catalog with new snapshot
> after the lock was acquired which seems bit wasteful (I wonder if we
> could just AcceptInvalidationMessages and refetch from syscache). Any
> better ideas?
>
> Other related problem is locking of subscriptions during operations on
> them, especially AlterSubscription seems like it should lock the
> subscription itself. I did that in 0002.
>

Thank you for the patch! Sorry I don't have a time for it today but
I'll review these patches tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Assorted leaks of PGresults

Coverity complained that libpqwalreceiver.c's libpqrcv_PQexec()
potentially leaks a PGresult.  It's right: if PQconsumeInput() fails after
we've already collected at least one PGresult, the function just returns
NULL without remembering to free last_result.  That's easy enough to fix,
just insert "PQclear(lastResult)" before the return.  I do not think this
is a significant leak, because the walreceiver process would just exit
anyway after the failure.  But we ought to fix it in case somebody copies
the logic into someplace less forgiving.

In fact ... I looked through other callers of PQconsumeInput() to see if
we had the same wrong coding pattern elsewhere, and we do, in two places
in postgres_fdw/connection.c.  One of those is new as of last week, in
commit ae9bfc5d6.  What's worse, in those cases we have code inside the
loop that might throw elog(ERROR), resulting in a permanently leaked
PGresult --- and since the backend process would keep going, this could
possibly be repeated and build up to a leak that amounted to something
significant.  We need to have PG_TRY blocks in both these places that
ensure that any held PGresult gets freed before we exit the functions.

Further review showed an additional leak introduced by ae9bfc5d6, to wit
that pgfdw_exec_cleanup_query() just forgets to clear the PGresult
returned by pgfdw_get_cleanup_result() in its normal non-error path.

In short, I think we need the attached patch in HEAD.  It needs to be
back-patched too, though the code seems to be a bit different in the
back branches.

regards, tom lane

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 1b691fb..9818d27 100644
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
*** pgfdw_exec_query(PGconn *conn, const cha
*** 485,491 
   *
   * This function offers quick responsiveness by checking for any interruptions.
   *
!  * This function emulates the PQexec()'s behavior of returning the last result
   * when there are many.
   *
   * Caller is responsible for the error handling on the result.
--- 485,491 
   *
   * This function offers quick responsiveness by checking for any interruptions.
   *
!  * This function emulates PQexec()'s behavior of returning the last result
   * when there are many.
   *
   * Caller is responsible for the error handling on the result.
*** pgfdw_exec_query(PGconn *conn, const cha
*** 493,532 
  PGresult *
  pgfdw_get_result(PGconn *conn, const char *query)
  {
! 	PGresult   *last_res = NULL;
  
! 	for (;;)
  	{
! 		PGresult   *res;
! 
! 		while (PQisBusy(conn))
  		{
! 			int			wc;
  
! 			/* Sleep until there's something to do */
! 			wc = WaitLatchOrSocket(MyLatch,
!    WL_LATCH_SET | WL_SOCKET_READABLE,
!    PQsocket(conn),
!    -1L, PG_WAIT_EXTENSION);
! 			ResetLatch(MyLatch);
  
! 			CHECK_FOR_INTERRUPTS();
  
! 			/* Data available in socket */
! 			if (wc & WL_SOCKET_READABLE)
! 			{
! if (!PQconsumeInput(conn))
! 	pgfdw_report_error(ERROR, NULL, conn, false, query);
  			}
- 		}
  
! 		res = PQgetResult(conn);
! 		if (res == NULL)
! 			break;/* query is complete */
  
  		PQclear(last_res);
! 		last_res = res;
  	}
  
  	return last_res;
  }
--- 493,542 
  PGresult *
  pgfdw_get_result(PGconn *conn, const char *query)
  {
! 	PGresult   *volatile last_res = NULL;
  
! 	/* In what follows, do not leak any PGresults on an error. */
! 	PG_TRY();
  	{
! 		for (;;)
  		{
! 			PGresult   *res;
  
! 			while (PQisBusy(conn))
! 			{
! int			wc;
  
! /* Sleep until there's something to do */
! wc = WaitLatchOrSocket(MyLatch,
! 	   WL_LATCH_SET | WL_SOCKET_READABLE,
! 	   PQsocket(conn),
! 	   -1L, PG_WAIT_EXTENSION);
! ResetLatch(MyLatch);
  
! CHECK_FOR_INTERRUPTS();
! 
! /* Data available in socket? */
! if (wc & WL_SOCKET_READABLE)
! {
! 	if (!PQconsumeInput(conn))
! 		pgfdw_report_error(ERROR, NULL, conn, false, query);
! }
  			}
  
! 			res = PQgetResult(conn);
! 			if (res == NULL)
! break;			/* query is complete */
  
+ 			PQclear(last_res);
+ 			last_res = res;
+ 		}
+ 	}
+ 	PG_CATCH();
+ 	{
  		PQclear(last_res);
! 		PG_RE_THROW();
  	}
+ 	PG_END_TRY();
  
  	return last_res;
  }
*** pgfdw_exec_cleanup_query(PGconn *conn, c
*** 1006,1011 
--- 1016,1022 
  		pgfdw_report_error(WARNING, result, conn, true, query);
  		return ignore_errors;
  	}
+ 	PQclear(result);
  
  	return true;
  }
*** pgfdw_exec_cleanup_query(PGconn *conn, c
*** 1028,1083 
  static bool
  pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime, PGresult **result)
  {
! 	PGresult   *last_res = NULL;
  
! 	for (;;)
  	{
! 		PGresult   *res;
! 
! 		while (PQisBusy(conn))
  		{
! 			int			wc;
! 			TimestampTz now = GetCurrentTimestamp();
! 			long		secs;
! 			int			microsecs;
! 			long		cur_timeout;
  
! 			/* If timeout has expi

Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-14 Thread Mark Kirkwood


On 15/06/17 11:10, Tom Lane wrote:


Jeff Janes  writes:

On Sat, Jun 10, 2017 at 7:42 AM, Tom Lane  wrote:

In the second place, this really fails to respond to what I'd call
the main usability problem with \dRp+, which is that the all-tables
property is likely to lead to an unreadably bulky list of affected tables.
What I'd say the patch ought to do is *replace* \dRp+'s list of affected
tables with a notation like "(all tables)" when puballtables is true.

I'd considered that, but I find the pager does a fine job of dealing with
the bulkiness of the list.

Have you tried it with a few tens of thousands of tables?  Even if your
pager makes it work comfortably, others might find it less satisfactory.


I thought it might be a good idea to not only
point out that it is all tables, but also remind people of exactly what
tables those are currently (in case it had slipped their mind that all
tables will include table from other schemas not in their search_path, for
example)

I'm not really buying that.  If they don't know what "all tables" means,
a voluminous list isn't likely to help much.

I was hoping we'd get some more votes in this thread, but it seems like
we've only got three, and by my count two of them are for just printing
"all tables".




I'd certainly prefer to see 'all tables' - in addition to being more 
compact, it also reflects more correctly how the publication was defined.


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-14 Thread David G. Johnston

On Wed, Jun 14, 2017 at 4:10 PM, Tom Lane  wrote:

> I was hoping we'd get some more votes in this thread, but it seems like
> we've only got three, and by my count two of them are for just printing
> "all tables".

The following looks right - given a publication it would nice to know if
its for all tables or not.

  \dRp
!List of publications
! Name|  Owner   | All Tables | Inserts |
Updates | Deletes
!
+--++-+-+-
!  testpib_ins_trunct | regress_publication_user | f  | t   | f
  | f
!  testpub_default| regress_publication_user | f  | t   | t
  | t

This [I couldn't find a regression diff entry where "All Tables" is true :(
...]

  \dRp+ testpub3
!Publication testpub3
!  All Tables | Inserts | Updates | Deletes
! +-+-+-
!  f  | t   | t   | t
  Tables:
  "public.testpub_tbl3"
  "public.testpub_tbl3a"

I agree with Tom and Masahiko Sawada, if "All Tables" is false we continue
to show "Tables:\n\t[tables]" but when "All Tables" is true we'd write
something like "Tables: All Tables in Database".  The user can query the
database if they wish to know the names of all those tables exactly.

I suppose we could go further here, say by simplifying "public.tbl1,
public.tbl2", when public only contains two tables, to "public.*".  Or
consider never listing more than some small number of rows and provide a
"show all" option (\dRp++ or just a function/view) that would list every
single table.  But I would go with the default "+" behavior being to show
table names when the listing of tables is fixed and to say "All Tables in
Database" when it is dynamic.  In "+" mode that makes the "All Tables"
boolean redundant though I'd keep it around for consistency.

David J.

Re: [HACKERS] subscription worker signalling wal writer too much

On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund  wrote:
> 
> > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> > wrote:
> > >
> > > > If I publish a pgbench workload and subscribe to it, the subscription
> > > > worker is signalling the wal writer thousands of times a second, once
> > for
> > > > every async commit.  This has a noticeable performance cost.
> > > >
> > >
> > > I've used a local variable to avoid waking up the wal writer more than
> > once
> > > for the same page boundary.  This reduces the number of wake-ups by about
> > > 7/8.
> >
> > Maybe I'm missing something here, but isn't that going to reduce our
> > guarantees about when asynchronously committed xacts are flushed out?
> > You can easily fit a number of commits into the same page...   As this
> > isn't specific to logical-rep, I don't think that's ok.
> >
> 
> The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
> this changes that. (Also, it isn't really a guarantee, the fsync can take
> many seconds to complete once we do initiate it, and there is absolutely
> nothing we can do about that, other than do the fsync synchronously in the
> first place).

Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
this afaics would allow wal writer to go into sleep mode with half a
page filled, and it'd not be woken up again until the page is filled.
No?


> > Have you chased down why there's that many wakeups?  Normally I'd have
> > expected that a number of the SetLatch() calls get consolidated
> > together, but I guess walwriter is "too quick" in waking up and
> > resetting the latch?

> I'll have to dig into that some more.  The 7/8 reduction I cited was just
> in calls to SetLatch from that part of the code, I didn't measure whether
> the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> determined that reduction, so it wasn't truly wake ups I measured.  Actual
> wake ups were measured only indirectly via the impact on performance.  I'll
> need to figure out how to instrument that without distorting the
> performance too much in the process..

I'd suspect that just measuring the number of kill() calls should be
doable, if measured via perf or something like hta.t


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes

On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund  wrote:

> On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> wrote:
> >
> > > If I publish a pgbench workload and subscribe to it, the subscription
> > > worker is signalling the wal writer thousands of times a second, once
> for
> > > every async commit.  This has a noticeable performance cost.
> > >
> >
> > I've used a local variable to avoid waking up the wal writer more than
> once
> > for the same page boundary.  This reduces the number of wake-ups by about
> > 7/8.
>
> Maybe I'm missing something here, but isn't that going to reduce our
> guarantees about when asynchronously committed xacts are flushed out?
> You can easily fit a number of commits into the same page...   As this
> isn't specific to logical-rep, I don't think that's ok.
>

The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
this changes that. (Also, it isn't really a guarantee, the fsync can take
many seconds to complete once we do initiate it, and there is absolutely
nothing we can do about that, other than do the fsync synchronously in the
first place).

The reason for kicking the wal writer at page boundaries is so that hint
bits can get set earlier than they otherwise could. But I don't think
kicking it multiple times per page boundary can help in that effort.

>
> Have you chased down why there's that many wakeups?  Normally I'd have
> expected that a number of the SetLatch() calls get consolidated
> together, but I guess walwriter is "too quick" in waking up and
> resetting the latch?
>

I'll have to dig into that some more.  The 7/8 reduction I cited was just
in calls to SetLatch from that part of the code, I didn't measure whether
the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
determined that reduction, so it wasn't truly wake ups I measured.  Actual
wake ups were measured only indirectly via the impact on performance.  I'll
need to figure out how to instrument that without distorting the
performance too much in the process..

Cheers,

Jeff

Re: [HACKERS] Document bug regarding read only transactions

2017-06-14 Thread Tatsuo Ishii

> It used to be true.  Tom changed it in commit
> 05d8a561ff85db1545f5768fe8d8dc9d99ad2ef7, back in 2010.

Thank you for the info. For a record, I will add it to the commit
message.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication: \dRp+ and "for all tables"

Jeff Janes  writes:
> On Sat, Jun 10, 2017 at 7:42 AM, Tom Lane  wrote:
>> In the second place, this really fails to respond to what I'd call
>> the main usability problem with \dRp+, which is that the all-tables
>> property is likely to lead to an unreadably bulky list of affected tables.
>> What I'd say the patch ought to do is *replace* \dRp+'s list of affected
>> tables with a notation like "(all tables)" when puballtables is true.

> I'd considered that, but I find the pager does a fine job of dealing with
> the bulkiness of the list.

Have you tried it with a few tens of thousands of tables?  Even if your
pager makes it work comfortably, others might find it less satisfactory.

> I thought it might be a good idea to not only
> point out that it is all tables, but also remind people of exactly what
> tables those are currently (in case it had slipped their mind that all
> tables will include table from other schemas not in their search_path, for
> example)

I'm not really buying that.  If they don't know what "all tables" means,
a voluminous list isn't likely to help much.

I was hoping we'd get some more votes in this thread, but it seems like
we've only got three, and by my count two of them are for just printing
"all tables".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-14 Thread Andrew Gierth

> "Andrew" == Andrew Gierth  writes:

 Andrew> I will post a further status update before 23:59 BST on 14th
 Andrew> Jun.

Unfortunately I've been delayed over the past couple of days, but I have
Thomas' latest patchset in hand and will be working on it over the rest
of the week. Status update by 23:59 BST on Sun 18th, by which time I
hope to have everything finalized (all three issues, not just the
inheritance one).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-14 Thread Petr Jelinek

On 14/06/17 20:57, Andres Freund wrote:
> Hi,
> 
> On 2017-06-13 00:50:20 -0700, Andres Freund wrote:
>> Just to be clear: The patch, after the first point above (which I did),
>> looks ok.  I'm just looking for comments.
> 
> And pushed.
> 

Thanks!

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-14 Thread Petr Jelinek

On 13/06/17 21:49, Peter Eisentraut wrote:
> On 6/13/17 02:33, Noah Misch wrote:
>>> Steps to reproduce -
>>> X cluster -> create 100 tables , publish all tables  (create publication pub
>>> for all tables);
>>> Y Cluster -> create 100 tables ,create subscription(create subscription sub
>>> connection 'user=centos host=localhost' publication pub;
>>> Y cluster ->drop subscription - drop subscription sub;
>>>
>>> check the log file on Y cluster.
>>>
>>> Sometime , i have seen this error on psql prompt and drop subscription
>>> operation got failed at first attempt.
>>>
>>> postgres=# drop subscription sub;
>>> ERROR:  tuple concurrently updated
>>> postgres=# drop subscription sub;
>>> NOTICE:  dropped replication slot "sub" on publisher
>>> DROP SUBSCRIPTION
>>
>> [Action required within three days.  This is a generic notification.]
> 
> It's being worked on.  Let's see by Thursday.
> 

Attached fixes it (it was mostly about order of calls). I also split the
SetSubscriptionRelState into 2 separate interface while I was changing
it, because now that the update_only bool was added it has become quite
strange to have single interface for what is basically two separate
functions.

There are still couple of remaining issues from this thread though.
Namely the AccessExclusiveLock of the pg_subscription catalog which is
not very pretty, but we need a way to block launcher from accessing the
subscription which is being dropped and make sure it will not start new
workers for it afterwards. Question is how however as by the time
launcher can lock individual subscription it is already processing it.
So it looks to me like we'd need to reread the catalog with new snapshot
after the lock was acquired which seems bit wasteful (I wonder if we
could just AcceptInvalidationMessages and refetch from syscache). Any
better ideas?

Other related problem is locking of subscriptions during operations on
them, especially AlterSubscription seems like it should lock the
subscription itself. I did that in 0002.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 005eeb820f4e0528513744136582c4489e2429e3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 14 Jun 2017 08:14:20 +0200
Subject: [PATCH 2/2] Lock subscription when altering it

---
 src/backend/commands/subscriptioncmds.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 14c8f3f..e0ec8ea 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -642,6 +642,13 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
 	   stmt->subname);
 
 	subid = HeapTupleGetOid(tup);
+
+	/*
+	 * Lock the subscription so nobody else can do anything with it (including
+	 * the replication workers).
+	 */
+	LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);
+
 	sub = GetSubscription(subid, false);
 
 	/* Form a new tuple. */
-- 
2.7.4

From 9011698ae800e0f45f960e91f6b16eab15634fac Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 13 Jun 2017 19:26:51 +0200
Subject: [PATCH 1/2] Improve the pg_subscription_rel handling

Split the SetSubscriptionRelState into separate Add and Update
functions, removing the unsafe upsert logic as callers are supposed to
know whether they are updating or adding new row.

Reorder the code in the above mentioned functions to avoid "tuple
updated concurrently" warnings.
---
 src/backend/catalog/pg_subscription.c   | 131 +++-
 src/backend/commands/subscriptioncmds.c |  14 +--
 src/backend/replication/logical/tablesync.c |  33 ---
 src/include/catalog/pg_subscription_rel.h   |   6 +-
 4 files changed, 98 insertions(+), 86 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c5b2541..fd19675 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -225,24 +225,15 @@ textarray_to_stringlist(ArrayType *textarray)
 }
 
 /*
- * Set the state of a subscription table.
- *
- * If update_only is true and the record for given table doesn't exist, do
- * nothing.  This can be used to avoid inserting a new record that was deleted
- * by someone else.  Generally, subscription DDL commands should use false,
- * workers should use true.
- *
- * The insert-or-update logic in this function is not concurrency safe so it
- * might raise an error in rare circumstances.  But if we took a stronger lock
- * such as ShareRowExclusiveLock, we would risk more deadlocks.
+ * Add new state record for a subscription table.
  */
 Oid
-SetSubscriptionRelState(Oid subid, Oid relid, char state,
-		XLogRecPtr sublsn, bool update_only)
+AddSubscriptionRelState(Oid subid, Oid relid, char state,
+		XLogRecPtr sublsn)
 {
 	Relation	rel;
 	HeapTuple	tup;
-	Oid			subrelid = InvalidOid;
+	Oid			subrelid;
 	bool		nulls[Natts_p

Re: [HACKERS] memory fields from getrusage()

2017-06-14 Thread Justin Pryzby

On Tue, Jun 13, 2017 at 12:16:00PM -0400, Robert Haas wrote:
> It might be worth adding platform-specific code for common platforms.

All I care (which linux happily/happens to support) is maxrss; I was probably
originally interested in this while digging into an issue with hash agg.

I think it's fine to show zeros for unsupported fields; that's what getusage(2)
and time(1) do after all.

pryzbyj@pryzbyj:~$ sh -c 'command time -v find /dev' 2>&1 >/dev/null |grep -Fw 0
User time (seconds): 0.00
System time (seconds): 0.00
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0

> it would be a good idea to install code specific to Linux that
> displays all and only those values that are meaningful on Linux, and
> (less importantly) similarly for macOS.  Linux is such a common
> platform that reporting bogus zero values and omitting other fields
> that are actually meaningful does not seem like a very good plan.

That has the issue that it varies not just by OS but also by OS version.  For
example PG already shows context switches and FS in/out puts, but they're
nonzero only since linux 2.6 (yes, 2.4 is ancient and unsupported but still).

   ru_nvcsw (since Linux 2.6)
   ru_inblock (since Linux 2.6.22)

..and other fields are "currently unused", but maybe supported in the past or
future(?)
   ru_ixrss (unmaintained)
  This field is currently unused on Linux.

Are you thinking of something like this, maybe hidden away in a separate file
somewhere?

#if defined(__linux__) || defined(BSD)
appendStringInfo(&str, "!\t%ld max resident, %ld shared, %ld unshared 
data, %ld unshared stack (kB)\n", r.ru_maxrss, r.ru_ixrss, r.ru_idrss, 
r.ru_isrss);
#elif defined(__darwin__)
appendStringInfo(&str, "!\t%ld max resident, %ld shared, %ld unshared 
data, %ld unshared stack (kB)\n", r.ru_maxrss/1024, r.ru_ixrss/1024, 
r.ru_idrss/1024, r.ru_isrss/1024);
#endif /* __linux__ */

Justin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

2017-06-14 Thread Christian Ullrich


* I wrote:


If there is interest in fixing this issue that is apparently limited
to VS 2013, I will try and produce a proper patch.


Patch.

Perhaps surprisingly, the bug is in the failing test cases themselves, 
not ecpg. The CRT has two modes for its locale implementation: When 
called in a "global" mode thread, setlocale() affects all global mode 
threads; when called from a per-thread mode thread, it affects that 
thread only. [1]


In the threaded test cases, many global mode threads call setlocale() 
simultaneously, getting in each other's way. The fix is to switch the 
worker threads to per-thread mode first.


Without this patch, I see crashes in approximately 25 percent of runs 
(four crashes in 16 cycles of vcregress ecpgcheck, ~10 in 50 runs of 
thread.exe alone). With the patch, I have no crashes in 100 ecpgcheck runs.


On the other hand, this affects only the buildfarm VM I mentioned 
earlier. I have another VM where it does not ever crash even without the 
patch -- such are the joys of multithreading. Both of them should have 
plenty of cores, both physical and virtual.



There are some alternatives to fixing it this way, but I think this is 
the best approach:


- Selecting per-thread mode in ecpglib takes the choice away from the
  developer who might want shared locale.

- Adding locking around setlocale() is difficult because ecpglib already
  uses a wrapper around the CRT function, provided by libpgport.

  These test cases are the only consumers of the wrapper that have any
  concept of multithreading. Supporting it in libpgport for the sole
  benefit of threaded ECPG applications on Windows does not seem to
  be a good idea, and re-wrapping the wrapper in ecpglib is not only
  beyond my abilities to write, but is also going to be unmaintainable.

- Adding locking around every setlocale() call in ecpglib is just ugly.


While working on this, I also noticed that there seem to be two separate 
partial implementations of a pthread synchronization emulation for 
Win32. One is in ecpglib, uses mutexes and provides 
PTHREAD_MUTEX_INITIALIZER and pthread_once(), the other has the header 
in src/port and the implementation in libpq, uses critical sections and 
does not cover either feature.


Should the two be merged at some point?


[1] https://msdn.microsoft.com/en-us/library/ms235302(v=vs.120).aspx

--
Christian
From 5dee698f4cef684a320ced59b19cd4fea61319fb Mon Sep 17 00:00:00 2001
From: Christian Ullrich 
Date: Wed, 14 Jun 2017 22:18:18 +0200
Subject: [PATCH] Make setlocale() aware of multithreading to avoid crash.

---
 src/interfaces/ecpg/test/expected/thread-alloc.c   | 39 +++--
 .../ecpg/test/expected/thread-descriptor.c | 19 +++---
 src/interfaces/ecpg/test/expected/thread-prep.c| 67 --
 src/interfaces/ecpg/test/expected/thread-thread.c  | 60 ++-
 .../ecpg/test/expected/thread-thread_implicit.c| 60 ++-
 src/interfaces/ecpg/test/thread/alloc.pgc  |  5 ++
 src/interfaces/ecpg/test/thread/descriptor.pgc |  5 ++
 src/interfaces/ecpg/test/thread/prep.pgc   |  5 ++
 src/interfaces/ecpg/test/thread/thread.pgc |  6 ++
 .../ecpg/test/thread/thread_implicit.pgc   |  6 ++
 10 files changed, 163 insertions(+), 109 deletions(-)

diff --git a/src/interfaces/ecpg/test/expected/thread-alloc.c 
b/src/interfaces/ecpg/test/expected/thread-alloc.c
index 49f1cd1..1312580 100644
--- a/src/interfaces/ecpg/test/expected/thread-alloc.c
+++ b/src/interfaces/ecpg/test/expected/thread-alloc.c
@@ -22,6 +22,7 @@ main(void)
 #define WIN32_LEAN_AND_MEAN
 #include 
 #include 
+#include 
 #else
 #include 
 #endif
@@ -99,7 +100,7 @@ struct sqlca_t *ECPGget_sqlca(void);
 
 #endif
 
-#line 24 "alloc.pgc"
+#line 25 "alloc.pgc"
 
 
 #line 1 "regression.h"
@@ -109,14 +110,14 @@ struct sqlca_t *ECPGget_sqlca(void);
 
 
 
-#line 25 "alloc.pgc"
+#line 26 "alloc.pgc"
 
 
 /* exec sql whenever sqlerror  sqlprint ; */
-#line 27 "alloc.pgc"
+#line 28 "alloc.pgc"
 
 /* exec sql whenever not found  sqlprint ; */
-#line 28 "alloc.pgc"
+#line 29 "alloc.pgc"
 
 
 #ifdef WIN32
@@ -127,59 +128,63 @@ static void* fn(void* arg)
 {
int i;
 
+#ifdef WIN32
+   _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
+#endif
+
/* exec sql begin declare section */
  
 
   

-#line 39 "alloc.pgc"
+#line 44 "alloc.pgc"
  int value ;
  
-#line 40 "alloc.pgc"
+#line 45 "alloc.pgc"
  char name [ 100 ] ;
  
-#line 41 "alloc.pgc"
+#line 46 "alloc.pgc"
  char ** r = NULL ;
 /* exec sql end declare section */
-#line 42 "alloc.pgc"
+#line 47 "alloc.pgc"
 
 
value = (long)arg;
sprintf(name, "Connection: %d", value);
 
{ ECPGconnect(__LINE__, 0, "ecpg1_regression" , NULL, NULL , name, 0); 
-#line 47 "alloc.pgc"
+#line 52 "alloc.pgc"
 
 if (sqlca.sqlcode < 0) sqlprint();}
-#line 47 "alloc.pgc"
+#line 52 "alloc.pgc"
 
{ ECPGsetcommit(__LINE__, "on", NULL);
-#line 48 "al

Re: [HACKERS] subscription worker signalling wal writer too much

On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes  wrote:
> 
> > If I publish a pgbench workload and subscribe to it, the subscription
> > worker is signalling the wal writer thousands of times a second, once for
> > every async commit.  This has a noticeable performance cost.
> >
> 
> I've used a local variable to avoid waking up the wal writer more than once
> for the same page boundary.  This reduces the number of wake-ups by about
> 7/8.

Maybe I'm missing something here, but isn't that going to reduce our
guarantees about when asynchronously committed xacts are flushed out?
You can easily fit a number of commits into the same page...   As this
isn't specific to logical-rep, I don't think that's ok.

Have you chased down why there's that many wakeups?  Normally I'd have
expected that a number of the SetLatch() calls get consolidated
together, but I guess walwriter is "too quick" in waking up and
resetting the latch?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes

On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes  wrote:

> If I publish a pgbench workload and subscribe to it, the subscription
> worker is signalling the wal writer thousands of times a second, once for
> every async commit.  This has a noticeable performance cost.
>

I've used a local variable to avoid waking up the wal writer more than once
for the same page boundary.  This reduces the number of wake-ups by about
7/8.

I'm testing it by doing 1e6 transactions over 8 clients while replication
is in effect, then waiting for the logical replica to catch up.  This cycle
takes 183.1 seconds in HEAD, and 162.4 seconds with the attached patch.
N=14, p-value for difference of the means 6e-17.

If I suppress all wake-ups just to see what would happen, it further
reduces the runtime to 153.7.

Cheers,

Jeff

wake_wal_writer_less_aggressively.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

I wrote:
> But surely the silent treatment should only apply to DSM_OP_CREATE?

Oh ... scratch that, it *does* only apply to DSM_OP_CREATE.

The lack of any other message before the 'could not map' failure must,
then, mean that dsm_attach() couldn't find an entry in shared memory
that it wanted to attach to.  But how could that happen?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

Robert Haas  writes:
> On Wed, Jun 14, 2017 at 3:33 PM, Tom Lane  wrote:
>> So the first problem here is the lack of supporting information for the
>> 'could not map' failure.

> Hmm.  I think I believed at the time I wrote dsm_attach() that
> somebody might want to try to soldier on after failing to map a DSM,
> but that doesn't seem very likely  any more.

Well, if they do, they shouldn't be passing elevel == ERROR.

>> AFAICS, this must mean either that dsm_attach()
>> returned without ever calling dsm_impl_op() at all, or that
>> dsm_impl_op()'s Windows codepath encountered ERROR_ALREADY_EXISTS or
>> ERROR_ACCESS_DENIED.  It's far from clear why those cases should be
>> treated as a silent fail.

> There's a good reason for that, though.  See
> 419113dfdc4c729f6c763cc30a9b02ee68a7da94.

But surely the silent treatment should only apply to DSM_OP_CREATE?
We're not going to retry anything else.

>> It's even less clear why dsm_attach's early
>> exit cases don't produce any messages.  But since we're left not knowing
>> what happened, the messaging design here is clearly inadequate.

> I don't know what you mean by this.  The function only has one
> early-exit case, the comment for which I quoted above.

OK, s/cases/case/, but the problem remains.  We don't know what happened.
We cannot have more than one case in this code where nothing gets logged.

>> It's not very clear how that happened, but my
>> bet is that the postmaster incremented parallel_terminate_count more than
>> once while cleaning up after the crashed worker.  It looks to me like
>> there's nothing stopping BackgroundWorkerStateChange from incrementing it
>> and then the eventual ForgetBackgroundWorker call from incrementing it
>> again.  I haven't traced through things to identify why this might only
>> occur in a worker-failure scenario, but surely we want to make sure that
>> the counter increment happens once and only once per worker.

> Yeah -- if that can happen, it's definitely a bug.

My first thought about fixing it is that we should remove that code from
BackgroundWorkerStateChange altogether.  The parallel_terminate_count
increment should happen in, and only in, ForgetBackgroundWorker.  There
seems little reason to risk bugs by trying to do it a bit earlier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

2017-06-14 Thread Stephen Frost

Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 6/13/17 18:11, Stephen Frost wrote:
> >> Let's see a proposal in those terms then.  How easy can you make it,
> >> compared to existing OS-level solutions, and will that justify the
> >> maintenance overhead?
> > From the original post on this thread, which included a WIP patch:
> > 
> > --
> > Usage
> > =
> > 
> > Set up database like so:
> > 
> > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
> >  export PGENCRYPTIONKEY
> >  initdb -k -K pgcrypto $PGDATA )
> > 
> > Start PostgreSQL:
> > 
> > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
> >  export PGENCRYPTIONKEY
> >  postgres $PGDATA )
> > --
> 
> Relying on environment variables is clearly pretty crappy.  So if that's
> the proposal, then I think it needs to be better.

I don't believe that was ever intended to be the final solution, I was
just pointing out that it's what the WIP patch did.

The discussion had moved into having a command called which provided the
key on stdout, as I recall, allowing it to be whatever the user wished,
including binary of any kind.

If you have other suggestions, I'm sure they would be well received.  As
to the question of complexity, it certainly looks like it'll probably be
quite straight-forward for users to use.

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] WIP: Data at rest encryption

2017-06-14 Thread Peter Eisentraut

On 6/13/17 18:11, Stephen Frost wrote:
>> Let's see a proposal in those terms then.  How easy can you make it,
>> compared to existing OS-level solutions, and will that justify the
>> maintenance overhead?
> From the original post on this thread, which included a WIP patch:
> 
> --
> Usage
> =
> 
> Set up database like so:
> 
> (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
>  export PGENCRYPTIONKEY
>  initdb -k -K pgcrypto $PGDATA )
> 
> Start PostgreSQL:
> 
> (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
>  export PGENCRYPTIONKEY
>  postgres $PGDATA )
> --

Relying on environment variables is clearly pretty crappy.  So if that's
the proposal, then I think it needs to be better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

2017-06-14 Thread Peter Eisentraut

On 6/14/17 05:04, Aleksander Alekseev wrote:
> A few companies that hired system administrators that are too
> lazy to read two or three man pages is not a reason to re-implement file
> system encryption (or compression, or mirroring if that matters) in any
> open source RDBMS.

To be fair, we did implement our own compression (TOAST) and mirroring
(compared to, say, DRBD) because there were clear advantages in
simplicity and performance.  Checksumming is another area where we moved
forward in spite of arguments that the file system should do it.  So we
will need to see the whole picture.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Document bug regarding read only transactions

On Tue, Jun 13, 2017 at 10:08 PM, Tatsuo Ishii  wrote:
> In 
> https://www.postgresql.org/docs/10/static/hot-standby.html#hot-standby-users
>
> It is explained that read only transactions (not in standby) allow to
> update sequences.
>
> In normal operation, read-only transactions are allowed to
> update sequences and to use LISTEN, UNLISTEN, and
> NOTIFY, so Hot Standby sessions operate under slightly tighter
> restrictions than ordinary read-only sessions.  It is possible that some
> of these restrictions might be loosened in a future release.
>
> This is plain wrong.

It used to be true.  Tom changed it in commit
05d8a561ff85db1545f5768fe8d8dc9d99ad2ef7, back in 2010.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

On Tue, Jun 13, 2017 at 4:56 PM, Thomas Munro
 wrote:
> On Wed, Jun 14, 2017 at 4:22 AM, Robert Haas  wrote:
>> I'm just trying to understand your comments so that I can have an
>> intelligent opinion about what we should do from here.  Given that the
>> replan wouldn't happen anyway, there seems to be no reason to tinker
>> with the location of enrtuples for v10 -- which is exactly what both
>> of us already said, but I was confused about your comments about
>> enrtuples getting changed.  I think that I am no longer confused.
>>
>> We still need to review and commit the minimal cleanup patch Thomas
>> posted upthread (v1, not v2).  I think I might go do that unless
>> somebody else is feeling the urge to whack it around before commit.
>
> OK, if we're keeping enrtuples in RangeTblEntry for v10 then we'd
> better address Noah's original complaint that it's missing from
> _{copy,equal,out,read}RangeTblEntry() .  Here's a patch for that, to
> apply on top of the -v1 patch above.

Committed both of those together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-14 Thread Nikolay Shaplov

В письме от 8 июня 2017 19:56:02 пользователь Fabien COELHO написал:

> > So this should be fixed in both expr_scanner_get_substring cases, and keep
> > last symbol only if it is not "\n".
> 
> Indeed, this is a mess. The code assumes all stuff is a line ending with
> '\n', but this is not always the case.
> 
> Here is a v7 which chomps the final newline only if there is one.

Sorry, but I still have problems with new solution...

First is function name. "expr_scanner_get_line" does not deal with any line at 
all... it gets substring and "chomps" it.

As for me, I would do expr_scanner_chomp_substring(PsqlScanState, int, int&);
that changes end_offset as desired... And use it instead of end_offset = 
expr_scanner_offset(sstate) - 1;



The second issue: you are removing all trailing \n and \r. I think you should 
remove only one \n at the end of the string, and one \r before \n if there was 
one.
I do not think there will be any practical difference between my and yours 
solutions here, but mine is more neat, I think... I do not have enough 
imagination to think about if "\n\r\r\r\r\r\n" case can happen, and what will 
happen of we remove them all... So I suggest to remove only the last one.


-- 
Do code for fun. Can do it for money (Perl & C/C++ ~10h/week)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

On Wed, Jun 14, 2017 at 3:33 PM, Tom Lane  wrote:
> So the first problem here is the lack of supporting information for the
> 'could not map' failure.

Hmm.  I think I believed at the time I wrote dsm_attach() that
somebody might want to try to soldier on after failing to map a DSM,
but that doesn't seem very likely  any more.  That theory is supported
by this comment:

/*
 * If we didn't find the handle we're looking for in the control segment,
 * it probably means that everyone else who had it mapped, including the
 * original creator, died before we got to this point. It's up to the
 * caller to decide what to do about that.
 */

But in practice, every current caller handles a failure here by bailing out.

> AFAICS, this must mean either that dsm_attach()
> returned without ever calling dsm_impl_op() at all, or that
> dsm_impl_op()'s Windows codepath encountered ERROR_ALREADY_EXISTS or
> ERROR_ACCESS_DENIED.  It's far from clear why those cases should be
> treated as a silent fail.

There's a good reason for that, though.  See
419113dfdc4c729f6c763cc30a9b02ee68a7da94.

> It's even less clear why dsm_attach's early
> exit cases don't produce any messages.  But since we're left not knowing
> what happened, the messaging design here is clearly inadequate.

I don't know what you mean by this.  The function only has one
early-exit case, the comment for which I quoted above.

> The subsequent assertion crash came from here:
>
> /*
>  * If this is a parallel worker, check whether there are already too many
>  * parallel workers; if so, don't register another one.  Our view of
>  * parallel_terminate_count may be slightly stale, but that doesn't really
>  * matter: we would have gotten the same result if we'd arrived here
>  * slightly earlier anyway.  There's no help for it, either, since the
>  * postmaster must not take locks; a memory barrier wouldn't guarantee
>  * anything useful.
>  */
> if (parallel && (BackgroundWorkerData->parallel_register_count -
>  BackgroundWorkerData->parallel_terminate_count) >=
> max_parallel_workers)
> {
> Assert(BackgroundWorkerData->parallel_register_count -
>BackgroundWorkerData->parallel_terminate_count <=
>MAX_PARALLEL_WORKER_LIMIT);
> LWLockRelease(BackgroundWorkerLock);
> return false;
> }
>
> What I suspect happened is that parallel_register_count was less than
> parallel_terminate_count; since those counters are declared as uint32,
> the negative difference would have been treated as a large unsigned value,
> triggering the assertion.

Right, and that's exactly the point of having that assertion.

> It's not very clear how that happened, but my
> bet is that the postmaster incremented parallel_terminate_count more than
> once while cleaning up after the crashed worker.  It looks to me like
> there's nothing stopping BackgroundWorkerStateChange from incrementing it
> and then the eventual ForgetBackgroundWorker call from incrementing it
> again.  I haven't traced through things to identify why this might only
> occur in a worker-failure scenario, but surely we want to make sure that
> the counter increment happens once and only once per worker.

Yeah -- if that can happen, it's definitely a bug.

> I'm also noting that ForgetBackgroundWorker is lacking a memory barrier
> between incrementing parallel_terminate_count and marking the slot free.
> Maybe it doesn't need one, but since there is one in
> BackgroundWorkerStateChange, it looks weird to not have it here.

I noticed that the other day and thought about mentioning it, but
didn't quite muster up the energy.  It seems unlikely to me to be the
cause of any of the problems we're seeing, but I think it can't hurt
to fix the omission.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
 wrote:
> Here are the details of the patches in attached zip.
> 0001. refactoring existing ATExecAttachPartition  code so that it can be
> used for
> default partitioning as well
> 0002. support for default partition with the restriction of preventing
> addition
> of any new partition after default partition.
> 0003. extend default partitioning support to allow addition of new
> partitions.
> 0004. extend default partitioning validation code to reuse the refactored
> code
> in patch 0001.

I think the core ideas of this patch are pretty solid now.  It's come
a long way in the last month.  High-level comments:

- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
for ranges.

Other stuff I noticed:

- The regression tests don't seem to check that the scan-skipping
logic works as expected.  We have regression tests for that case for
attaching regular partitions, and it seems like it would be worth
testing the default-partition case as well.

- check_default_allows_bound() assumes that if
canSkipPartConstraintValidation() fails for the default partition, it
will also fail for every subpartition of the default partition.  That
is, once we commit to scanning one child partition, we're committed to
scanning them all.  In practice, that's probably not a huge
limitation, but if it's not too much code, we could keep the top-level
check but also check each partitioning individually as we reach it,
and skip the scan for any individual partitions for which the
constraint can be proven.  For example, suppose the top-level table is
list-partitioned with a partition for each of the most common values,
and then we range-partition the default partition.

- The changes to the regression test results in 0004 make the error
messages slightly worse.  The old message names the default partition,
whereas the new one does not.  Maybe that's worth avoiding.

Specific comments:

+ * Also, invalidate the parent's and a sibling default partition's relcache,
+ * so that the next rebuild will load the new partition's info into parent's
+ * partition descriptor and default partition constraints(which are dependent
+ * on other partition bounds) are built anew.

I find this a bit unclear, and it also repeats the comment further
down.  Maybe something like: Also, invalidate the parent's relcache
entry, so that the next rebuild will load he new partition's info into
its partition descriptor.  If there is a default partition, we must
invalidate its relcache entry as well.

+/*
+ * The default partition constraints depend upon the partition bounds of
+ * other partitions. Adding a new(or even removing existing) partition
+ * would invalidate the default partition constraints. Invalidate the
+ * default partition's relcache so that the constraints are built anew and
+ * any plans dependent on those constraints are invalidated as well.
+ */

Here, I'd write: The partition constraint for the default partition
depends on the partition bounds of every other partition, so we must
invalidate the relcache entry for that partition every time a
partition is added or removed.

+/*
+ * Default partition cannot be added if there already
+ * exists one.
+ */
+if (spec->is_default)
+{
+overlap = partition_bound_has_default(boundinfo);
+with = boundinfo->default_index;
+break;
+}

To support default partitioning for range, this is going to have to be
moved above the switch rather than done inside of it.  And there's
really no downside to putting it there.

+ * constraint, by *proving* that the existing constraints of the table
+ * *imply* the given constraints.  We include the table's check constraints and

Both the comma and the asterisks are unnecessary.

+ * Check whether all rows in the given table (scanRel) obey given partition

obey the given

I think the larger comment block could be tightened up a bit, like
this:  Check whether all rows in the given table obey the given
partition constraint; if so, it can be attached as a partition.  We do
this by scanning the table (or all of its leaf partitions) row by row,
except when the existing constraints are sufficient to prove that the
new partitioning constraint must already hold.

+/* Check if we can do away with having to scan the table being attached. */

If possible, skip the validation scan.

+ * Set up to have the table be scanned to validate the partition
+ * constraint If it's a partitioned table, we instead schedule its leaf
+ * partitions to be scanned.

I suggest: Prepare to scan the default partition (or, if it is itself
partitioned, all of its leaf partitions).

+int defaul

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

Yesterday lorikeet failed the select_parallel test in a new way:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2017-06-13%2020%3A28%3A33

2017-06-13 16:44:57.247 EDT [59404ec9.2e78:1] ERROR:  could not map dynamic 
shared memory segment
2017-06-13 16:44:57.248 EDT [59404dec.2d9c:5] LOG:  worker process: parallel 
worker for PID 10072 (PID 11896) exited with exit code 1
2017-06-13 16:44:57.273 EDT [59404ec6.2758:64] LOG:  statement: select 
stringu1::int2 from tenk1 where unique1 = 1;
TRAP: FailedAssertion("!(BackgroundWorkerData->parallel_register_count - 
BackgroundWorkerData->parallel_terminate_count <= 1024)", File: 
"/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c",
 Line: 974)
2017-06-13 16:45:02.652 EDT [59404dec.2d9c:6] LOG:  server process (PID 10072) 
was terminated by signal 6: Aborted
2017-06-13 16:45:02.652 EDT [59404dec.2d9c:7] DETAIL:  Failed process was 
running: select stringu1::int2 from tenk1 where unique1 = 1;
2017-06-13 16:45:02.652 EDT [59404dec.2d9c:8] LOG:  terminating any other 
active server processes

So the first problem here is the lack of supporting information for the
'could not map' failure.  AFAICS, this must mean either that dsm_attach()
returned without ever calling dsm_impl_op() at all, or that
dsm_impl_op()'s Windows codepath encountered ERROR_ALREADY_EXISTS or
ERROR_ACCESS_DENIED.  It's far from clear why those cases should be
treated as a silent fail.  It's even less clear why dsm_attach's early
exit cases don't produce any messages.  But since we're left not knowing
what happened, the messaging design here is clearly inadequate.

The subsequent assertion crash came from here:

/*
 * If this is a parallel worker, check whether there are already too many
 * parallel workers; if so, don't register another one.  Our view of
 * parallel_terminate_count may be slightly stale, but that doesn't really
 * matter: we would have gotten the same result if we'd arrived here
 * slightly earlier anyway.  There's no help for it, either, since the
 * postmaster must not take locks; a memory barrier wouldn't guarantee
 * anything useful.
 */
if (parallel && (BackgroundWorkerData->parallel_register_count -
 BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
{
Assert(BackgroundWorkerData->parallel_register_count -
   BackgroundWorkerData->parallel_terminate_count <=
   MAX_PARALLEL_WORKER_LIMIT);
LWLockRelease(BackgroundWorkerLock);
return false;
}

What I suspect happened is that parallel_register_count was less than
parallel_terminate_count; since those counters are declared as uint32,
the negative difference would have been treated as a large unsigned value,
triggering the assertion.  It's not very clear how that happened, but my
bet is that the postmaster incremented parallel_terminate_count more than
once while cleaning up after the crashed worker.  It looks to me like
there's nothing stopping BackgroundWorkerStateChange from incrementing it
and then the eventual ForgetBackgroundWorker call from incrementing it
again.  I haven't traced through things to identify why this might only
occur in a worker-failure scenario, but surely we want to make sure that
the counter increment happens once and only once per worker.

I'm also noting that ForgetBackgroundWorker is lacking a memory barrier
between incrementing parallel_terminate_count and marking the slot free.
Maybe it doesn't need one, but since there is one in
BackgroundWorkerStateChange, it looks weird to not have it here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication busy-waiting on a lock

Hi,

On 2017-06-13 00:50:20 -0700, Andres Freund wrote:
> Just to be clear: The patch, after the first point above (which I did),
> looks ok.  I'm just looking for comments.

And pushed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes

If I publish a pgbench workload and subscribe to it, the subscription
worker is signalling the wal writer thousands of times a second, once for
every async commit.  This has a noticeable performance cost.

I don't think it is ever necessary to signal the wal writer here, unless
wal writer is taking the long sleep for power saving purposes.  If up
to wal_writer_delay of "committed" transactions get lost on a crash, it
doesn't really matter because they will be replayed again once replication
resumes. However, might delaying the advance of the hot_standby_feedback by
up to the amount of wal_writer_delay be a problem?  I would think any
set-up which depends on the standby never being a few dozen milliseconds
behind is already doomed.

If I want to suppress signalling, what would be the right way to
communicate to XLogSetAsyncXactLSN that it is being called in a
subscription worker?

Another approach would be to make XLogSetAsyncXactLSN signalling less
aggressive for everyone, not just subscription workers.  There is no point
in signalling it more than once for a given WriteRqstPtr page boundary.  So
each backend could locally remember the last boundary for which it
signalled wal writer, and not signal again for the same boundary.  This
would be especially effective for a subscription worker, as it should be
pretty common for almost all the async commits to be coming from the same
process.  Or, the last boundary could be kept in shared memory (XLogCtl) so
all backends can share it, at the expense of additional work needed to be
done under a spin lock.

Other ideas?

Thanks,

Jeff

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

Piotr Stefaniak  writes:
> I would like to go a bit further than that. I see that GNU indent
> doesn't recognize -V, but prints its version if you use the option
> --version. I wish to implement the same option for FreeBSD indent, but
> that would force a change in how pgindent recognizes indent.

Not really a problem, considering we're making far larger changes
than that to the pgindent script anyway.

> As for the version bump, I briefly considered "9.7.0", but 2.0 seems
> more appropriate as the 2 would reflect the fundamental changes that
> I've done and the .0 would indicate that it's still rough around edges.

Yeah, +1 for 2.0.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

2017-06-14 Thread Piotr Stefaniak

On 2017-06-14 17:05, Bruce Momjian wrote:
> On Wed, Jun 14, 2017 at 10:38:40AM -0400, Tom Lane wrote:
>> btw, I was slightly amused to notice that this version still calls itself
>>
>> $ indent -V
>> pg_bsd_indent 1.3
>>
>> Don't know what you plan to do with that program name, but we certainly
>> need a version number bump so that pgindent can tell that it's got an
>> up-to-date copy.  1.4?  2.0?
> 
> For Piotr's reference, we will update src/tools/pgindent/pgindent to
> match whatever new version number you use.

I would like to go a bit further than that. I see that GNU indent
doesn't recognize -V, but prints its version if you use the option
--version. I wish to implement the same option for FreeBSD indent, but
that would force a change in how pgindent recognizes indent.

As for the version bump, I briefly considered "9.7.0", but 2.0 seems
more appropriate as the 2 would reflect the fundamental changes that
I've done and the .0 would indicate that it's still rough around edges.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

2017-06-14 Thread Piotr Stefaniak

On 2017-06-14 19:31, Tom Lane wrote:
> Does that test case pass for you?

No, I broke it recently. Sorry.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Disallowing multiple queries per PQexec()

On 2017-06-12 10:32:57 -0400, Tom Lane wrote:
> "Daniel Verite"  writes:
> > PGC_POSTMASTER implies that it's an instance-wide setting.
> > Is is intentional? I can understand that it's more secure for this not to
> > be changeable in an existing session, but it's also much less usable if you
> > can't set it per-database and per-user.
> > Maybe it should be PGC_SUSET ?
> 
> Bearing in mind that I'm not really for this at all...

FWIW, I agree that this isn't something we should do.  For one the GUC
would really have to be GUC_REPORT, which'll cost everyone, and will
break things like pgbouncer.   I also don't think it's a good solution to
the problem at hand - there *are* cases where application
*intentionally* use PQexec() with multiple statements, namely when
aggregate latency is an issue. Since it's an application writer's choice
whether to use it, it seems to make not that much sense to have a
serverside guc - it can't really be sensible set.   If you want to do
something here, you should probably work on convincing ORM etc. writers
to use PQexecParams().

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] v10beta pg_catalog diagrams

On 2017-06-14 06:05:24 +0200, Pavel Stehule wrote:
> 2017-06-14 5:53 GMT+02:00 Peter Eisentraut  >:
> 
> > On 6/13/17 17:08, Andres Freund wrote:
> > > I wondered before if we shouldn't introduce "information only"
> > > unenforced foreign key constraints for the catalogs.  We kind of
> > > manually do that via oidjoins, it'd be nicer if we'd a function
> > > rechecking fkeys, and the fkeys were in the catalog...
> >
> > I don't see why we couldn't just add a full complement of primary and
> > foreign key constraints (and unique constraints and perhaps some check
> > constraints).  The argument is that they wouldn't normally do anything,
> > but they would help with documentation and browsing tools, and they
> > wouldn't hurt anything.

Well, unique constraints are a bit more complicated because they rely on
an index, and we wouldn't e.g. maintain indexes with WHERE clauses or
other expressions correctly.  I'd be a bit wary of declaring such
indexes as actually being fully valid, because we have planner logic
that does planning based on various constraints now, it'd certainly be
annoying if some "re-check constraint" type queries would actually have
their joins optimized away or such...

> These constraints can slowdown creating/dropping database objects - mainly
> temp tables.

How so?

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RemoveSubscriptionRel uses simple_heap_delete

2017-06-14 Thread Masahiko Sawada

On Wed, Jun 14, 2017 at 10:44 PM, Tom Lane  wrote:
> Masahiko Sawada  writes:
>> Currently $subject but should we use CatalogTupleDelete() instead?
>> It's actually the same behavior though. Other functions use
>> CatalogTupleXXX(). Attached patch.
>
> Yeah, evidently that patch failed to track the effects of commits
> 2f5c9d9c9 et al, since it wasn't in-tree yet.  Poking around, at
> least statscmds.c has got the same disease.
>

Thank you for fixing it!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

Piotr Stefaniak  writes:
> On 2017-06-13 18:22, Tom Lane wrote:
>> Also, I am wondering about the test cases under tests/.  I do not
>> see anything in the Makefile or elsewhere suggesting how those are
>> to be used.  It would sure be nice to have some quick smoke-test
>> to check that a build on a new platform is working.

> They'd started out like David Holland's tests for his tradcpp(1), with a
> similar makefile (again, BSD make). But I was tenaciously asked to use
> Kyua (a testing framework that is the standard regression test mechanism
> for FreeBSD) instead, so now the makefile's existence and use is a great
> secret and the file is not under any source control. Adaption of the
> indent test suite to Kyua made the makefile more inelegant, but I'm
> attaching it to this email in hope that you can do something useful with
> it. I can only guess that you have the option to use Kyua instead, but I
> don't know the tool at all.

Ah, thanks.  I hacked up a gmake rule for this:

test: $(INDENT)
cp $(srcdir)/tests/*.list .
for testsrc in $(srcdir)/tests/*.0; do \
test=`basename "$$testsrc" .0`; \
./$(INDENT) $$testsrc $$test.out -P$(srcdir)/tests/$$test.pro || echo 
FAILED >>$$test.out; \
diff -u $$testsrc.stdout $$test.out || exit 1; \
done

and I'm getting one failure, which I don't understand:

--- ./tests/f_decls.0.stdout2017-05-21 19:40:38.507303623 -0400
+++ f_decls.out 2017-06-14 13:28:49.212871476 -0400
@@ -1,4 +1,4 @@
-char *
+char  *
 x(void)
 {
typeidentifier;
@@ -13,7 +13,7 @@
return NULL;
 }
 
-int *
+int   *
 y(void)
 {
 
Does that test case pass for you?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

On Wed, Jun 14, 2017 at 6:15 AM, Ashutosh Bapat
 wrote:
> PFA patch set addressing comments by Tom and Amit.

LGTM.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] outfuncs.c utility statement support

Peter Eisentraut  writes:
> So this seems to be a pretty basic bug.  Some node fields of type char
> may be zero, and so printing them as a zero byte just truncates the
> whole output string.  This could be fixed by printing chars like strings
> with the full escaping mechanism.  See attached patch.

+1 for fixing this, but you have not handled the read side correctly.
pg_strtok doesn't promise to return a null-terminated string, so without
changes, readfuncs.c would not successfully decode a zero-byte char field.
Also it would do the wrong thing for any character code that outToken had
decided to prefix with a backslash.  I think you could fix both problems
like this:

 /* Read a char field (ie, one ascii character) */
 #define READ_CHAR_FIELD(fldname) \
token = pg_strtok(&length); /* skip :fldname */ \
token = pg_strtok(&length); /* get field value */ \
-   local_node->fldname = token[0]
+   local_node->fldname = debackslash(token, length)[0]

although that's annoyingly expensive for the normal case where no
special processing is needed.  Maybe better

+   local_node->fldname = (length == 0) ? '\0' : (token[0] == '\\') ? 
token[1] : token[0]

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

On Wed, Jun 14, 2017 at 06:41:43PM +0300, Ants Aasma wrote:
> On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian  wrote:
> > Are you checking the CPU type or if AES instructions are enabled on the
> > CPU? I ask this because I just realized in researching my new TLS talk
> > that my BIOS defaults to AES instructions disabled, and I had to
> > manually enable it.
> 
> There is zero code for that now, but the plan was to check the CPUID
> instruction. My understanding is that it should report what is
> currently enabled on the CPU. Will double check when actually writing
> the code for the check.

Just for specifics, I have two Intel Xeon CPU E5620, but the AES
instructions were disabled for this CPU since 2012 when I bought it. 
:-(  The good news is that only recently have I forced https in some
pages so this is the first time I heavily need it.  I now have a boot
test, which returns 16:

grep -c '\' /proc/cpuinfo

> >> > I anticipate that one of the trickier problems here will be handling
> >> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> >> > a time.  In the current system, once you've written and flushed a
> >> > block, you can consider it durably committed, but if that block is
> >> > encrypted, this is no longer true.  A crash might tear the block,
> >> > making it impossible to decrypt.  Replay will therefore stop at the
> >> > end of the previous block, not at the last record actually flushed as
> >> > would happen today.
> >>
> >> My patch is currently doing a block at a time for WAL. The XTS mode
> >
> > Uh, how are you writing partial writes to the WAL.  I assume you are
> > doing a streaming cipher so you can write in increments, right?
> 
> We were doing 8kB page aligned writes to WAL anyway. I just encrypt
> the block before it gets written out.

Oh, we do.  The beauty of streaming ciphers built on block ciphers is
that you can pre-compute the cipher to be XOR'ed with the data because
the block cipher output doesn't depend on the user data.  This is used
for SSH, for example.

> >> I think we need to require wal_log_hints=on when encryption is
> >> enabled. Currently I have not considered tearing on CLOG bits. Other
> >> SLRUs probably have similar issues. I need to think a bit about how to
> >> solve that.
> >
> > I am not sure if clog even needs to be encrypted.
> 
> Me neither, but it currently is, and it looks like that's broken in a
> "silently corrupts your data" way in face of torn writes. Using OFB
> mode (xor plaintext with pseudorandom stream for cipher) looks like it
> might help here, if other approaches fail.

I would just document the limitation and move on.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017 weekly progress reports (week 2)

2017-06-14 Thread Shubham Barai

Hi,

I have made some changes in tests and pushed them to my branch.

Thanks for helping me out with testing.

Now, current head produces false positives but, with my patch, it doesn't.

Here is the link for updated tests:
https://github.com/shubhambaraiss/postgres/commit/2c02685a50a2b30654beb5c52542a57a46219c39


Regards,
Shubham



   Sent with Mailtrack


On 13 June 2017 at 23:32, Andrew Borodin  wrote:

> 2017-06-13 18:00 GMT+05:00 Shubham Barai :
> >
> > Project: Explicitly support predicate locks in index AMs besides b-tree
> >
> Hi, Shubham
> Good job!
>
> So, in current HEAD test predicate_gist_2.spec generate false
> positives, but with your patch, it does not?
> I'd suggest keeping spec tests with your code in the same branch, it's
> easier. Also it worth to clean up specs style and add some words to
> documentation.
>
> Kevin, all, how do you think, is it necessary to expand these tests
> not only on Index Only Scan, but also on Bitmap Index Scan? And may be
> KNN version of scan too?
> I couldn't find such tests for B-tree, do we have them?
>
> Best regards, Andrey Borodin, Octonica.
>

Re: [HACKERS] outfuncs.c utility statement support

On Tue, Jun 13, 2017 at 11:59 PM, Amit Langote
 wrote:
>> So this seems to be a pretty basic bug.  Some node fields of type char
>> may be zero, and so printing them as a zero byte just truncates the
>> whole output string.  This could be fixed by printing chars like strings
>> with the full escaping mechanism.  See attached patch.
>
> +1.  I've been meaning to report about
> zero-byte-truncating-the-whole-output-string thing for some time now.

I've been bitten by this, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

2017-06-14 Thread Ants Aasma

On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian  wrote:
> Are you checking the CPU type or if AES instructions are enabled on the
> CPU? I ask this because I just realized in researching my new TLS talk
> that my BIOS defaults to AES instructions disabled, and I had to
> manually enable it.

There is zero code for that now, but the plan was to check the CPUID
instruction. My understanding is that it should report what is
currently enabled on the CPU. Will double check when actually writing
the code for the check.

>> > I anticipate that one of the trickier problems here will be handling
>> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
>> > a time.  In the current system, once you've written and flushed a
>> > block, you can consider it durably committed, but if that block is
>> > encrypted, this is no longer true.  A crash might tear the block,
>> > making it impossible to decrypt.  Replay will therefore stop at the
>> > end of the previous block, not at the last record actually flushed as
>> > would happen today.
>>
>> My patch is currenly doing a block at a time for WAL. The XTS mode
>
> Uh, how are you writing partial writes to the WAL.  I assume you are
> doing a streaming cipher so you can write in increments, right?

We were doing 8kB page aligned writes to WAL anyway. I just encrypt
the block before it gets written out.

>> I think we need to require wal_log_hints=on when encryption is
>> enabled. Currently I have not considered tearing on CLOG bits. Other
>> SLRUs probably have similar issues. I need to think a bit about how to
>> solve that.
>
> I am not sure if clog even needs to be encrypted.

Me neither, but it currently is, and it looks like that's broken in a
"silently corrupts your data" way in face of torn writes. Using OFB
mode (xor plaintext with pseudorandom stream for cipher) looks like it
might help here, if other approaches fail.

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

On Wed, Jun 14, 2017 at 06:10:32PM +0300, Ants Aasma wrote:
> On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas  wrote:
> > Performance is likely to be poor on large databases,
> > because every time a page transits between shared_buffers and the
> > buffer cache we've got to en/decrypt, but as long as it's only poor
> > for the people who opt into the feature I don't see a big problem with
> > that.
> 
> It would make sense to tune the database with large shared buffers if
> encryption is enabled. That should make sure that most shared buffers
> traffic is going to disk anyway. As for performance, I have a
> prototype assembly implementation of AES that does 3GB/s/core on my
> laptop. If we add that behind a CPUID check the overhead should be
> quite reasonable.

Are you checking the CPU type or if AES instructions are enabled on the
CPU? I ask this because I just realized in researching my new TLS talk
that my BIOS defaults to AES instructions disabled, and I had to
manually enable it.

> > I anticipate that one of the trickier problems here will be handling
> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> > a time.  In the current system, once you've written and flushed a
> > block, you can consider it durably committed, but if that block is
> > encrypted, this is no longer true.  A crash might tear the block,
> > making it impossible to decrypt.  Replay will therefore stop at the
> > end of the previous block, not at the last record actually flushed as
> > would happen today.
> 
> My patch is currenly doing a block at a time for WAL. The XTS mode

Uh, how are you writing partial writes to the WAL.  I assume you are
doing a streaming cipher so you can write in increments, right?

> used to encrypt has the useful property that blocks that share
> identical prefix unencrypted also have identical prefix when
> encrypted. It requires that the tearing is 16B aligned, but I think
> that is true for pretty much all storage systems. That property of
> course has security downsides, but for table/index storage we have a
> nonce in the form of LSN in the page header eliminating the issue.
> 
> > So, your synchronous_commit suddenly isn't.  A
> > similar problem will occur any other page where we choose not to
> > protect against torn pages using full page writes.  For instance,
> > unless checksums are enabled or wal_log_hints=on, we'll write a data
> > page where a single bit has been flipped and assume that the bit will
> > either make it to disk or not; the page can't really be torn in any
> > way that hurts us.  But with encryption that's no longer true, because
> > the hint bit will turn into much more than a single bit flip, and
> > rereading that page with half old and half new contents will be the
> > end of the world (TM).  I don't know off-hand whether we're
> > protecting, say, CLOG page writes with FPWs.: because setting a couple
> > of bits is idempotent and doesn't depend on the existing page
> > contents, we might not need it currently, but with encryption, every
> > bit in the page depends on every other bit in the page, so we
> > certainly would.  I don't know how many places we've got assumptions
> > like this baked into the system, but I'm guessing there are a bunch.
> 
> I think we need to require wal_log_hints=on when encryption is
> enabled. Currently I have not considered tearing on CLOG bits. Other
> SLRUs probably have similar issues. I need to think a bit about how to
> solve that.

I am not sure if clog even needs to be encrypted.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Disallowing multiple queries per PQexec()

2017-06-14 Thread Fabien COELHO

Hello Surafel,

My 0.02€:

I attach a patch that incorporate the comments and uses similar routines
with the rest of the file rather than using command tag

I'm not fully convinced by this feature: using multiple queries is a
useful trick to reduce network-related latency by combining several
queries in one packet. Devs and even ORMs could use this trick.

Now I do also understand the point of trying to shield against some types
of SQL injection at the database level, because the other levels have
shown weaknesses in the past...

However the protection offered here is quite partial: it really helps if
say a SELECT operation is turned into something else which modifies the
database. Many SQL injection attacks focus on retrieving sensitive data,
in which case "' UNION SELECT ... --" would still work nicely. Should we
allow to forbid UNION? And/or disallow comments as well, which are
unlikely to be used from app code, and would make this harder? If pg is to
provide SQL injection guards by removing some features on some
connections, maybe it could be more complete about it.

About the documentation:

+When this parameter is on, the PostgreSQL server
+allow

allows

+multiple SQL commands without being a transaction block in the
+given string in simple query protocol.

This sentence is not very clear.

I'm unsure about this "transaction block" exception, because (1) the name
of the setting becomes misleading, it should really be:
"allow_multiple_queries_outside_transaction_block", and maybe it should
also say that it is only for the simple protocol (if it is the case), (2)
there may be SQL injections in a transaction block anyway and they are not
prevented nor preventable (3) if I'm combining queries for latency
optimization, it does not mean that I do want a single transaction block
anyway in the packet, so it does not help me all the way there.

+setting

Setting

+ this parameter off is use for providing an

One useless space between "for" & "providing".

Maybe be more direct "... off provides ...". Otherwise "is used for"

+ additional defense against SQL-injection attacks.

I would add something about the feature cost: ", at the price of rejecting
combined queries."

+ The server may be configure to disallow multiple sql

SQL ?

+ commands without being a transaction block to add an extra defense against SQL-injection
+ attacks

some type of SQL injections attacks?

+ so it is always a good practice to add
+ BEGIN/COMMIT

+commands for multiple sql commands

I do not really agree that it is an advisable "good practice" to do
that...

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Broken hint bits (freeze)

On Wed, Jun 14, 2017 at 07:45:17PM +0530, Amit Kapila wrote:
> > Now, it seems we later added a doc section early on that talks about
> > "Verify standby servers" so I have moved the wal_level section into that
> > block, which should be safe.  There is now no need to start/stop the new
> > server since pg_upgrade will do that safely already.
> >
> 
> ! 
> !  Also, if upgrading standby servers, change wal_level
> !  to replica in the postgresql.conf file on
> !  the new cluster.
> 
> I think it is better to indicate that this is required for the master
> cluster (probably it is clear for users) /"on the new cluster."/"on
> the new master cluster.". Do we need something different for v10 where
> default wal_level is 'replica'

You know, I thought about that and was afraid saying "new master
cluster" would be confusing because it isn't a master _yet_, but if you
feel it will help, and I considered it, let's add it.  The problem is
that in the old instructions, at the point we were mentioning this, it
was the new master, which is why I evaluated removing it in the first
place. (Yeah, I am amazed I considered all these cases.)

Updated patch attached.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
new file mode 100644
index bf58a0a..05fa053
*** a/doc/src/sgml/ref/pgupgrade.sgml
--- b/doc/src/sgml/ref/pgupgrade.sgml
*** NET STOP postgresql-9.0
*** 317,331 

! Verify standby servers

!  If you are upgrading Streaming Replication and Log-Shipping standby
!  servers, verify that the old standby servers are caught up by running
!  pg_controldata against the old primary and standby
!  clusters.  Verify that the Latest checkpoint location
!  values match in all clusters.  (There will be a mismatch if old
!  standby servers were shut down before the old primary.)

--- 317,338 

! Prepare for standby server upgrades

!  If you are upgrading standby servers (as outlined in section ), verify that the old standby
!  servers are caught up by running pg_controldata
!  against the old primary and standby clusters.  Verify that the
!  Latest checkpoint location values match in all clusters.
!  (There will be a mismatch if old standby servers were shut down
!  before the old primary.)
! 
! 
! 
!  Also, if upgrading standby servers, change wal_level
!  to replica in the postgresql.conf file on
!  the new master cluster.

*** pg_upgrade.exe
*** 410,416 

!
  Upgrade Streaming Replication and Log-Shipping standby servers

--- 417,423 

!
  Upgrade Streaming Replication and Log-Shipping standby servers

*** pg_upgrade.exe
*** 471,486 

-  
-   Start and stop the new master cluster
- 
-   
-In the new master cluster, change wal_level to
-replica in the postgresql.conf file
-and then start and stop the cluster.
-   
-  
- 

Run rsync

--- 478,483 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

2017-06-14 Thread Ants Aasma

On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas  wrote:
> Of course, what would be even more useful is fine-grained encryption -
> encrypt these tables (and the corresponding indexes, toast tables, and
> WAL records related to any of that) with this key, encrypt these other
> tables (and the same list of associated stuff) with this other key,
> and leave the rest unencrypted.  The problem with that is that you
> probably can't run recovery without all of the keys, and even on a
> clean startup there would be a good deal of engineering work involved
> in refusing access to tables whose key hadn't been provided yet.

That's pretty much the reason why I decided to skip anything more
complicated for now. Anything that would be able to run recovery
without knowing the encryption key looks like an order of magnitude
more complicated to implement.

> Performance is likely to be poor on large databases,
> because every time a page transits between shared_buffers and the
> buffer cache we've got to en/decrypt, but as long as it's only poor
> for the people who opt into the feature I don't see a big problem with
> that.

It would make sense to tune the database with large shared buffers if
encryption is enabled. That should make sure that most shared buffers
traffic is going to disk anyway. As for performance, I have a
prototype assembly implementation of AES that does 3GB/s/core on my
laptop. If we add that behind a CPUID check the overhead should be
quite reasonable.

> I anticipate that one of the trickier problems here will be handling
> encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> a time.  In the current system, once you've written and flushed a
> block, you can consider it durably committed, but if that block is
> encrypted, this is no longer true.  A crash might tear the block,
> making it impossible to decrypt.  Replay will therefore stop at the
> end of the previous block, not at the last record actually flushed as
> would happen today.

My patch is currenly doing a block at a time for WAL. The XTS mode
used to encrypt has the useful property that blocks that share
identical prefix unencrypted also have identical prefix when
encrypted. It requires that the tearing is 16B aligned, but I think
that is true for pretty much all storage systems. That property of
course has security downsides, but for table/index storage we have a
nonce in the form of LSN in the page header eliminating the issue.

> So, your synchronous_commit suddenly isn't.  A
> similar problem will occur any other page where we choose not to
> protect against torn pages using full page writes.  For instance,
> unless checksums are enabled or wal_log_hints=on, we'll write a data
> page where a single bit has been flipped and assume that the bit will
> either make it to disk or not; the page can't really be torn in any
> way that hurts us.  But with encryption that's no longer true, because
> the hint bit will turn into much more than a single bit flip, and
> rereading that page with half old and half new contents will be the
> end of the world (TM).  I don't know off-hand whether we're
> protecting, say, CLOG page writes with FPWs.: because setting a couple
> of bits is idempotent and doesn't depend on the existing page
> contents, we might not need it currently, but with encryption, every
> bit in the page depends on every other bit in the page, so we
> certainly would.  I don't know how many places we've got assumptions
> like this baked into the system, but I'm guessing there are a bunch.

I think we need to require wal_log_hints=on when encryption is
enabled. Currently I have not considered tearing on CLOG bits. Other
SLRUs probably have similar issues. I need to think a bit about how to
solve that.

Regards,
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

On Wed, Jun 14, 2017 at 10:38:40AM -0400, Tom Lane wrote:
> btw, I was slightly amused to notice that this version still calls itself
> 
> $ indent -V
> pg_bsd_indent 1.3
> 
> Don't know what you plan to do with that program name, but we certainly
> need a version number bump so that pgindent can tell that it's got an
> up-to-date copy.  1.4?  2.0?

For Piotr's reference, we will update src/tools/pgindent/pgindent to
match whatever new version number you use.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

On Tue, Jun 13, 2017 at 06:29:20PM -0400, Stephen Frost wrote:
> > Isn't the leakage controlled by OS permissions, so is it really leakage,
> > i.e., if you can see the leakage, you probably have bypassed the OS
> > permissions and see the key and data anyway.
> 
> The case I'm mainly considering is if you somehow lose control over the
> medium in which the encrypted database resides- that is, someone steals
> the hard drive, or perhaps the hard drive is sold without properly being
> wiped first, things like that.
> 
> In such a case, there's no OS permissions to bypass because the OS is
> now controlled by the attacker.  In that case, if the key wasn't stored
> on the hard drive, then the attacker would be able to see the contents
> of the filesystem and the associated metadata, but not the contents of
> the cluster.
> 
> In that case, the distinction between filesystem-level encryption and
> PG-level encryption is that with filesystem-level encryption the
> attacker wouldn't be able to even see what files exist or any metadata
> about them, whereas with PG-level encryption that information would be
> available to the attacker.

Yes, Peter Eisentraut pointed that out in an earlier email in this
thread.  Thanks.

> > > > The benefit is allowing configuration
> > > > in the database rather than the OS?
> > > 
> > > No, the benefit is that the database administrator can configure it and
> > > set it up and not have to get an OS-level administrator involved.  There
> > > may also be other reasons why filesystem-level encryption is difficult
> > > to set up or use in a certain environment, but this wouldn't depend on
> > > anything OS-related and therefore could be done.
> > 
> > OK, my only point here is that we are going down a slippery slope of
> > implementing OS things in the database.  There is nothing wrong with
> > that but it has often been something we have avoided, because of the
> > added complexity required in the db server.
> 
> I'm not sure that I actually agree that encryption is really solely an
> OS-level function, or even that encryption at rest is solely the OS's
> job.  As a counter-example, I encrypt my SSH keys and GPG keys
> routinely, even when I'm using OS-level filesystem encryption.  Perhaps
> that's excessive of me, but, well, I don't find it so. :)

Well, I think SSH and GPG keys are a case of selective encryption, which
is where I said database encryption could really be a win, because you
can't do that outside the database.  Just to go crazy, here is something
I think would be cool for a fully or partially encrypted data row:

row data encrypted with symmetric key k
k encrypted with public key of user 1
k encrypted with public key of user 2
hash of previous fields signed by insert user

The would allow the insert user complete control over who sees the row. 
The database administrator could corrupt the row or add/remove users,
but that would be detected by the hash signature being invalid.  This is
kind of like TLS bundled in the database.  I think this is actually all
possible now too.  :-)

> > As a counter-example, we only added an external collation library to
> > Postgres when we clearly saw a benefit, e.g. detecting collation
> > changes.
> 
> Right, and there's also the potential for adding more flexibility down
> the road, which I'm certainly all for, but I see value in having even
> this initial version of the feature too.

Understood.

> > > Also, tools like pg_basebackup could be used, with nearly zero changes,
> > > I think, to get an encrypted copy of the database for backup purposes,
> > > removing the need to work out a way to handle encrypting backups.
> > 
> > I do think we need much more documentation on how to encrypt things,
> > though that is a separate issue.  It might help to document how you
> > _should_ do things now to see the limitations we have.
> 
> Improving our documentation would certainly be good, but I'm not sure
> that we can really recommend specific ways of doing things like
> filesystem-level encryption, as that's really OS-dependent and there
> could be trade-offs in different ways a given OS might provide that
> capability.  I'm not sure that having our documentation generically say
> "you should use filesystem-level encryption" would really be very
> helpful.
> 
> Perhaps I misunderstood your suggestion here though..?

I was just throwing out the idea that sometimes writing things down
shows the gaps in our feature-set --- it might not apply here.

> > > > Is the problem that you have to encrypt before sending and decrypt on
> > > > arrival, if you don't trust the transmission link?  Is that used a lot? 
> > > > Is having the db encrypt every write a reasonable solution to that?
> > > 
> > > There's multiple use-cases here.  Making it easier to copy the database
> > > is just one of them and it isn't the biggest one.  The biggest benefit
> > > is that there's cases where filesystem-level encryption isn't re

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

btw, I was slightly amused to notice that this version still calls itself

$ indent -V
pg_bsd_indent 1.3

Don't know what you plan to do with that program name, but we certainly
need a version number bump so that pgindent can tell that it's got an
up-to-date copy.  1.4?  2.0?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Data at rest encryption

On Wed, Jun 14, 2017 at 04:13:57PM +0300, Aleksander Alekseev wrote:
> > > While I agree that configuring full disk encryption is not technically
> > > difficult, it requires much more privileged access to the system and
> > > basically requires the support of a system administrator. In addition,
> > > if a volume is not available for encryption, PostgreSQL support for
> > > encryption would still allow for its data to be encrypted and as others
> > > have mentioned can be enabled by the DBA alone.
> > 
> > Frankly I'm having difficulties imagining when it could be a real
> > problem. It doesn't seem to be such a burden to ask a colleague for
> > assistance in case you don't have sufficient permissions to do
> > something. And I got a strong feeling that solving bureaucracy issues of
> > specific organizations by changing PostgreSQL core in very invasive way
> > (keeping in mind testing, maintaining, etc) is misguided.
> 
> In the same time implementing a plugable storage API and then implementing
> encrypted / compressed / whatever storage in a standalone extension using
> this API seems to be a reasonable thing to do. 

Agreed, good point.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Broken hint bits (freeze)

2017-06-14 Thread Amit Kapila

On Wed, Jun 14, 2017 at 1:01 AM, Bruce Momjian  wrote:
> On Mon, Jun 12, 2017 at 06:31:11PM +0300, Vladimir Borodin wrote:
>> What about the following sequence?
>>
>> 1. Run pg_upgrade on master,
>> 2. Start it in single-user mode and stop (to get right wal_level in
>> pg_control),
>> 3. Copy pg_control somewhere,
>> 4. Start master, run analyze and stop.
>> 5. Put the control file from step 3 to replicas and rsync them according to 
>> the
>> documentation.
>>
>> And I think that step 10.f in the documentation [1] should be fixed to 
>> mention
>> starting in single-user mode or with disabled autovacuum.
>>
>> [1] https://www.postgresql.org/docs/devel/static/pgupgrade.html
>
> First, I want to apologize for not getting involved in this thread
> earlier, and I want to thank everyone for the huge amount of detective
> work in finding the cause of this bug.
>
> Let me see if I can replay how the standby server upgrade instructions
> evolved over time.
>
> Initially we knew that we had to set wal_level to replica+ so that when
> you reconnect to the standby servers, the WAL would have the right
> contents.  (We are basically simulating pg_start/stop backup with
> rsync.)
>
> There was a desire to have those instructions inside a documentation
> block dedicated to standby server upgrades, so the wal_level adjustment
> and new server start/stop was added to that block.  I assumed a
> start/stop could not modify the WAL, or at least nothing important would
> happen, but obviously I was wrong.  (pg_upgrade takes steps to ensure
> that nothing happens.)  Adding ANALYZE in there just made it worse, but
> the problem always existed.  I sure hope others haven't had a problem
> with this.
>
> Now, it seems we later added a doc section early on that talks about
> "Verify standby servers" so I have moved the wal_level section into that
> block, which should be safe.  There is now no need to start/stop the new
> server since pg_upgrade will do that safely already.
>

! 
!  Also, if upgrading standby servers, change wal_level
!  to replica in the postgresql.conf file on
!  the new cluster.

I think it is better to indicate that this is required for the master
cluster (probably it is clear for users) /"on the new cluster."/"on
the new master cluster.". Do we need something different for v10 where
default wal_level is 'replica'


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Disallowing multiple queries per PQexec()

2017-06-14 Thread Surafel Temesgen

On Mon, Jun 12, 2017 at 5:22 PM, Daniel Verite 
wrote:
>
>
> PGC_POSTMASTER implies that it's an instance-wide setting.
> Is is intentional? I can understand that it's more secure for this not to
> be changeable in an existing session, but it's also much less usable if you
> can't set it per-database and per-user.
> Maybe it should be PGC_SUSET ?
>
It’s my misunderstanding I thought PGC_POSTMASTER set only by
superuser and changed with a hard restart

I attach a patch that incorporate the comments and uses similar routines
with the rest of the file rather than using command tag

Regards,

Surafel


disallow-multiple-queries-3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RemoveSubscriptionRel uses simple_heap_delete