On 3/6/17 3:28 PM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> David Steele <da...@pgmasters.net> writes:
>>> On 3/6/17 12:48 PM, Robert Haas wrote:
>>>> This issue also exists in 9.6, but we obviously can't do anything
>>>> about 9.6 clusters that already exist. Possibly this could be
>>>> back-patched so that future 9.6 clusters would come out OK, or
>>>> possibly we should back-patch some other fix, but that would need more
>>>> discussion.
>>>
>>> I think it would be worth back-patching the catalog fix for future 9.6
>>> clusters as a start.
>>
>> Yes, I think it's rather silly not to do so. We have made comparable
>> backpatched fixes multiple times in the past. What is worth discussing is
>> whether there are *additional* things we ought to do in 9.6 to prevent
>> misbehavior in installations initdb'd pre-9.6.3.
>>
>> If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
>> adding a quick-n-dirty test and ereport(ERROR) to these functions in the
>> 9.6 branch, so that at least you get a clean error and not some weird
>> misbehavior. Not sure if there's anything more we can do than that.
>
> That's more-or-less what I was thinking (and suggested to David over IM
> a little while ago, actually). I don't know if there's an easy way to
> do such a check, but I don't think it would really need to be
> particularly cheap, just not overly complex. These code paths are
> certainly not ones that need to be high-performance.

Way back when, I tried to get backups on 9.6 to fail due to
pg_stop_backup() running in a parallel worker, but I was not able to make
it happen, so I gave up and moved on to other things.
However, we just got a report from the field that a user ran into this
exact situation:
ERROR: [057]: raised from remote-0 protocol on 'XX.XX.XXX.XXX': unable
to execute query 'select lsn::text as lsn,
pg_catalog.pg_xlogfile_name(lsn)::text as wal_segment_name,
labelfile::text as backuplabel_file,
spcmapfile::text as tablespacemap_file
from pg_catalog.pg_stop_backup(false)'
ERROR: non-exclusive backup is not in progress
HINT: Did you mean to use pg_stop_backup('t')?
CONTEXT: parallel worker
https://github.com/pgbackrest/pgbackrest/issues/1083
So apparently it is possible. To get them working as soon as possible I
recommended that they run:
alter role postgres set max_parallel_workers_per_gather = 0;
And that solved their problem. 9.6 is getting on in years so I'm not
sure how much time/effort we want to spend on this, but I figured it was
worth mentioning.
I did another round of trying to reproduce the issue but came up short a
second time.
I'm willing to put together a patch for 9.6 to update the catalog and/or
add the error if there is interest.
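
As a rough illustration of the catalog side, here is a sketch of how an
existing 9.6 cluster could be inspected and adjusted by hand. The
pg_proc query uses only standard catalog columns. The ALTER FUNCTION
statements are an assumption about what a manual fix might look like
(superuser required, not tested against the reporter's system) and are
not the contents of any committed patch.

```sql
-- Show the parallel-safety marking of the backup functions:
-- 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT p.oid::regprocedure AS function_signature,
       p.proparallel
  FROM pg_catalog.pg_proc p
 WHERE p.proname IN ('pg_start_backup', 'pg_stop_backup')
   AND p.pronamespace = 'pg_catalog'::regnamespace;

-- Assumed manual fix: keep these functions in the leader process.
-- The signatures below are the 9.6 ones; verify them against the
-- output of the query above before running anything.
ALTER FUNCTION pg_catalog.pg_start_backup(text, boolean, boolean)
    PARALLEL RESTRICTED;
ALTER FUNCTION pg_catalog.pg_stop_backup()
    PARALLEL RESTRICTED;
ALTER FUNCTION pg_catalog.pg_stop_backup(boolean)
    PARALLEL RESTRICTED;
```

Since pg_proc is a per-database catalog, a change like this would have
to be repeated in every database the backup tool connects to, which is
one reason the per-role max_parallel_workers_per_gather setting is the
simpler workaround.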
Thoughts?
Regards,
--
-David
da...@pgmasters.net