On 3/6/17 3:28 PM, Stephen Frost wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> David Steele <da...@pgmasters.net> writes:
>>> On 3/6/17 12:48 PM, Robert Haas wrote:
>>>> This issue also exists in 9.6, but we obviously can't do anything
>>>> about 9.6 clusters that already exist.  Possibly this could be
>>>> back-patched so that future 9.6 clusters would come out OK, or
>>>> possibly we should back-patch some other fix, but that would need more
>>>> discussion.
>>>
>>> I think it would be worth back-patching the catalog fix for future 9.6
>>> clusters as a start.
>>
>> Yes, I think it's rather silly not to do so.  We have made comparable
>> backpatched fixes multiple times in the past.  What is worth discussing is
>> whether there are *additional* things we ought to do in 9.6 to prevent
>> misbehavior in installations initdb'd pre-9.6.3.
>>
>> If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
>> adding a quick-n-dirty test and ereport(ERROR) to these functions in the
>> 9.6 branch, so that at least you get a clean error and not some weird
>> misbehavior.  Not sure if there's anything more we can do than that.
>
> That's more-or-less what I was thinking (and suggested to David over IM
> a little while ago, actually).  I don't know if there's an easy way to
> do such a check, but I don't think it would really need to be
> particularly cheap, just not overly complex.  These code paths are
> certainly not ones that need to be high-performance.

Way back when, I tried to get backups on 9.6 to fail by having pg_stop_backup() run in a parallel worker. I was not able to make it happen, so I gave up and moved on to other things.

However, we just got a report from the field that a user ran into this exact situation:

ERROR: [057]: raised from remote-0 protocol on 'XX.XX.XXX.XXX': unable to execute query 'select lsn::text as lsn,
pg_catalog.pg_xlogfile_name(lsn)::text as wal_segment_name,
labelfile::text as backuplabel_file,
spcmapfile::text as tablespacemap_file
from pg_catalog.pg_stop_backup(false)'
ERROR: non-exclusive backup is not in progress
HINT: Did you mean to use pg_stop_backup('t')?
CONTEXT: parallel worker

https://github.com/pgbackrest/pgbackrest/issues/1083

So apparently it is possible. To get them working as soon as possible I recommended that they run:

alter role postgres set max_parallel_workers_per_gather = 0;

That solved their problem. 9.6 is getting on in years, so I'm not sure how much time/effort we want to spend on this, but I figured it was worth mentioning.
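
For anyone else who hits this, here is a rough sketch of how the workaround could be applied and checked. It assumes the backup tool connects as the postgres role, as in the report above; adjust the role name otherwise. The catalog queries are only there for inspection and are not part of any proposed fix.

```sql
-- Assumption: the backup tool connects as role "postgres".
-- Disable parallel query for new sessions of that role only.
ALTER ROLE postgres SET max_parallel_workers_per_gather = 0;

-- Confirm the per-role setting was recorded.
SELECT rolname, rolconfig
  FROM pg_catalog.pg_roles
 WHERE rolname = 'postgres';

-- In a NEW session for that role, this should report 0.
-- (Sessions that were already open keep their old value.)
SHOW max_parallel_workers_per_gather;

-- Inspect how the backup functions are marked in this cluster's catalog
-- (proparallel: s = safe, r = restricted, u = unsafe).
SELECT proname, proargtypes::regtype[] AS argtypes, proparallel
  FROM pg_catalog.pg_proc
 WHERE proname IN ('pg_start_backup', 'pg_stop_backup');

-- To undo the workaround later:
-- ALTER ROLE postgres RESET max_parallel_workers_per_gather;
```

Setting this per role rather than in postgresql.conf keeps parallel query available to every other role on the cluster.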

I did another round of trying to reproduce the issue but came up short a second time.

I'm willing to put together a patch for 9.6 to update the catalog and/or add the error if there is interest.

Thoughts?

Regards,
--
-David
da...@pgmasters.net

