Re: [HACKERS] Rework the way multixact truncations work

2015-12-12 Thread Robert Haas
On Sat, Dec 12, 2015 at 12:02 PM, Andres Freund  wrote:
> Noah, Robert, All
>
> On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
>> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas  wrote:
>> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
>> >
>> > So let's do that, then.
>>
>> Who is going to take care of this?
>
> Here's two patches:
>
> 1) The fix to SetOffsetVacuumLimit().
>
>I've tested this by introducing a probabilistic "return false;" to
>find_multixact_start(), and torturing postgres by burning through
>billions of multixactids of various sizes.  Behaves about as
>bad^Wgood as without the induced failures; before the patch there
>were moments of spurious warnings/errors when ->offsetStopLimit was
>out of whack.

I find the commit message you wrote a little difficult to read, and
propose the following version instead, which reads better to me:

Previously, if find_multixact_start() failed, SetOffsetVacuumLimit()
would install 0 into MultiXactState->offsetStopLimit.  Luckily, there
are no known cases where find_multixact_start() will return an error
in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad:
GetNewMultiXactId() could continue allocating mxids even if close to a
wraparound, or it could erroneously stop allocating mxids, even if no
wraparound is looming.  The wrong value would be corrected the next
time SetOffsetVacuumLimit() is called, or by a restart.

(I have no comments on the substance of either patch and have reviewed
the first one to a negligible degree - it doesn't look obviously wrong
- and the second one not at all.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-12 Thread Andres Freund
Noah, Robert, All

On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas  wrote:
> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
> >
> > So let's do that, then.
> 
> Who is going to take care of this?

Here's two patches:

1) The fix to SetOffsetVacuumLimit().

   I've tested this by introducing a probabilistic "return false;" to
   find_multixact_start(), and torturing postgres by burning through
   billions of multixactids of various sizes.  Behaves about as
   bad^Wgood as without the induced failures; before the patch there
   were moments of spurious warnings/errors when ->offsetStopLimit was
   out of whack.

2) A subset of the comment changes from Noah's repository. Some of the
   comment changes didn't make sense without the removal
   SlruScanDirCbFindEarliest(), a small number of other changes I couldn't
   fully agree with.

   Noah, are you ok with pushing that subset of your changes? Is
   "Slightly edited subset of a larger patchset by Noah Misch" an OK
   attribution?


Noah, on a first glance I think e50cca0ae ("Remove the
SlruScanDirCbFindEarliest() test from TruncateMultiXact().") is a good
idea. So I do encourage you to submit that as a separate patch.

Regards,

Andres
>From d26d41a96a7fc8dbe9827d55837875790b21ca1b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 12 Dec 2015 17:57:21 +0100
Subject: [PATCH 1/2] Fix bug in SetOffsetVacuumLimit() triggered by
 find_multixact_start() failure.

In case find_multixact_start() failed SetOffsetVacuumLimit() installed 0
into MultiXactState->offsetStopLimit. Unlike oldestOffset the
to-be-installed value was not restored in the error branch.

Luckily there are no known cases where find_multixact_start() will
return an error in 9.5 and above.

But if the bug is triggered, e.g. due to filesystem permission issues,
it'd be somewhat bad: GetNewMultiXactId() could continue allocating
mxids even if close to a wraparound, or it could erroneously stop
allocating mxids, even if no wraparound is looming.  Luckily the wrong
value would be corrected the next time SetOffsetVacuumLimit() is called,
or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Backpatch: 9.5, where the bug was introduced as part of 4f627f
---
 src/backend/access/transam/multixact.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b66a2b6..d2619bd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2552,6 +2552,7 @@ SetOffsetVacuumLimit(void)
 	bool		oldestOffsetKnown = false;
 	bool		prevOldestOffsetKnown;
 	MultiXactOffset offsetStopLimit = 0;
+	MultiXactOffset prevOffsetStopLimit;
 
 	/*
 	 * NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2566,6 +2567,7 @@ SetOffsetVacuumLimit(void)
 	nextOffset = MultiXactState->nextOffset;
 	prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
 	prevOldestOffset = MultiXactState->oldestOffset;
+	prevOffsetStopLimit = MultiXactState->offsetStopLimit;
 	Assert(MultiXactState->finishedStartup);
 	LWLockRelease(MultiXactGenLock);
 
@@ -2633,11 +2635,13 @@ SetOffsetVacuumLimit(void)
 	{
 		/*
 		 * If we failed to get the oldest offset this time, but we have a
-		 * value from a previous pass through this function, use the old value
-		 * rather than automatically forcing it.
+		 * value from a previous pass through this function, use the old
+		 * values rather than automatically forcing an emergency autovacuum
+		 * cycle again.
 		 */
 		oldestOffset = prevOldestOffset;
 		oldestOffsetKnown = true;
+		offsetStopLimit = prevOffsetStopLimit;
 	}
 
 	/* Install the computed values */
-- 
2.6.0.rc3

>From addafc27e7b89187a3f55a2cbce136d58722c97b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 12 Dec 2015 17:53:41 +0100
Subject: [PATCH 2/2] Improve/Fix comments around multixact truncation.

My commits 4f627f8 ("Rework the way multixact truncations work.") and
aa29c1c ("Remove legacy multixact truncation support.") missed updating
a number of comments. Fix that. Additionally improve accuracy of a few
of the added comments.

Reported-By: Noah Misch
Author: Slightly edited subset of a larger patchset by Noah Misch
Backpatch: 9.5, which is the first branch to contain the above commits
---
 src/backend/access/heap/README.tuplock |  6 ++
 src/backend/access/transam/multixact.c | 36 +++---
 src/backend/access/transam/xlog.c  |  4 
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 10b8d78..f845958 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -102,10 +102,8 @@ the MultiXacts in them are no longer of interest.
 VACUUM is in charge of removing old MultiXacts at the time

Re: [HACKERS] Rework the way multixact truncations work

2015-12-11 Thread Noah Misch
On Thu, Dec 10, 2015 at 08:55:54AM -0500, Robert Haas wrote:
> I don't know have anything to add to what others have said in response
> to this point, except this: the whole point of using a source code
> management system is to tell you what changed and when.  What you are
> proposing to do makes it unusable for that purpose.

Based on your comments, I'm calling the patch series returned with feedback.
I built the series around the goal of making history maximally reviewable for
persons not insiders to commit 4f627f8.  Having spent 90% of my 2015
PostgreSQL contribution time finding or fixing committed defects, my judgment
of how best to achieve that is no shout from the peanut gallery.  (Neither is
your judgment.)  In particular, I had in view two works, RLS and pg_audit,
that used the post-commit repair strategy you've advocated.  But you gave me a
fair chance to make the case, and you stayed convinced that my repairs oppose
my goal.  I can now follow your development of that belief, which is enough.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-11 Thread Robert Haas
On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas  wrote:
> On Thu, Dec 10, 2015 at 9:04 AM, Andres Freund  wrote:
>> On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
>>> Maybe.  But I think we could use a little more vigorous discussion of
>>> that issue, since Andres doesn't seem to be very convinced by your
>>> analysis, and I don't currently understand what you've fixed because I
>>> can't, as mentioned several times, follow your patch stack.
>>
>> The issue at hand is that the following block
>> oldestOffsetKnown =
>> find_multixact_start(oldestMultiXactId, 
>> &oldestOffset);
>>
>> ...
>> else if (prevOldestOffsetKnown)
>> {
>> /*
>>  * If we failed to get the oldest offset this time, but we 
>> have a
>>  * value from a previous pass through this function, use the 
>> old value
>>  * rather than automatically forcing it.
>>  */
>> oldestOffset = prevOldestOffset;
>> oldestOffsetKnown = true;
>> }
>> in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
>> is set in shared memory:
>> /* Install the computed values */
>> LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> MultiXactState->oldestOffset = oldestOffset;
>> MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
>> MultiXactState->offsetStopLimit = offsetStopLimit;
>> LWLockRelease(MultiXactGenLock);
>>
>> so, if find_multixact_start() failed - a "should never happen" occurance
>> - we install a wrong stop limit. It does get 'repaired' upon the next
>> suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
>> though.
>>
>> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
>
> So let's do that, then.

Who is going to take care of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 9:04 AM, Andres Freund  wrote:
> On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
>> Maybe.  But I think we could use a little more vigorous discussion of
>> that issue, since Andres doesn't seem to be very convinced by your
>> analysis, and I don't currently understand what you've fixed because I
>> can't, as mentioned several times, follow your patch stack.
>
> The issue at hand is that the following block
> oldestOffsetKnown =
> find_multixact_start(oldestMultiXactId, 
> &oldestOffset);
>
> ...
> else if (prevOldestOffsetKnown)
> {
> /*
>  * If we failed to get the oldest offset this time, but we 
> have a
>  * value from a previous pass through this function, use the 
> old value
>  * rather than automatically forcing it.
>  */
> oldestOffset = prevOldestOffset;
> oldestOffsetKnown = true;
> }
> in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
> is set in shared memory:
> /* Install the computed values */
> LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
> MultiXactState->oldestOffset = oldestOffset;
> MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
> MultiXactState->offsetStopLimit = offsetStopLimit;
> LWLockRelease(MultiXactGenLock);
>
> so, if find_multixact_start() failed - a "should never happen" occurance
> - we install a wrong stop limit. It does get 'repaired' upon the next
> suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
> though.
>
> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.

So let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Andres Freund
On 2015-12-10 08:55:54 -0500, Robert Haas wrote:
> Maybe.  But I think we could use a little more vigorous discussion of
> that issue, since Andres doesn't seem to be very convinced by your
> analysis, and I don't currently understand what you've fixed because I
> can't, as mentioned several times, follow your patch stack.

The issue at hand is that the following block
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);

...
else if (prevOldestOffsetKnown)
{
/*
 * If we failed to get the oldest offset this time, but we have 
a
 * value from a previous pass through this function, use the 
old value
 * rather than automatically forcing it.
 */
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
}
in SetOffsetVacuumLimit() fails to restore offsetStopLimit, which then
is set in shared memory:
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);

so, if find_multixact_start() failed - a "should never happen" occurance
- we install a wrong stop limit. It does get 'repaired' upon the next
suceeding find_multixact_start() in SetOffsetVacuumLimit() or a restart
though.

Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Robert Haas
On Wed, Dec 9, 2015 at 8:23 PM, Noah Misch  wrote:
> On Wed, Dec 09, 2015 at 11:08:32AM -0500, Robert Haas wrote:
>> On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch  wrote:
>> > I hope those who have not already read commit 4f627f8 will not waste time
>> > reading it.  They should instead ignore multixact changes from commit 
>> > 4f627f8
>> > through its revert.  The 2015-09-26 commits have not appeared in a 
>> > supported
>> > release, and no other work has built upon them.  They have no tenure.  (I 
>> > am
>> > glad you talked the author out of back-patching; otherwise, 9.4.5 and 
>> > 9.3.10
>> > would have introduced a data loss bug.)  Nobody reported a single defect
>> > before my review overturned half the patch.  A revert will indeed impose on
>> > those who invested time to understand commit 4f627f8, but the silence about
>> > its defects suggests the people understanding it number either zero or one.
>> > Even as its author and reviewers, you would do better to set aside what you
>> > thought you knew about this code.
>>
>> I just don't find this a realistic model of how people use the git
>> log.  Maybe you use it this way; I don't.  I don't *want* git blame to
>> make it seem as if 4f627f8 is not part of the history.  For better or
>> worse, it is.
>
> I would like to understand how you use git, then.  What's one of your models
> of using "git log" and/or "git blame" in which you foresee the revert making
> history less clear, not more clear?

Well, suppose I wanted to know what bugs were fixed between 9.5beta
and 9.5.0, for example.  I mean, I'm going to run git log
src/backend/access/transam/multixact.c ... and the existing commits
are going to be there.

> By the way, it occurs to me that I should also make pg_upgrade blacklist the
> range of catversions that might have data loss.  No sense in putting ourselves
> in the position of asking whether data files of a 9.9.3 cluster spent time in
> a 9.5beta2 cluster.

Maybe.  But I think we could use a little more vigorous discussion of
that issue, since Andres doesn't seem to be very convinced by your
analysis, and I don't currently understand what you've fixed because I
can't, as mentioned several times, follow your patch stack.

>> Ripping it out and replacing it monolithically will not
>> change that; it will only make the detailed history harder to
>> reconstruct, and I *will* want to reconstruct it.
>
> What's something that might happen six months from now and lead you to inspect
> master or 9.5 multixact.c between 4f627f8 and its revert?

I don't know have anything to add to what others have said in response
to this point, except this: the whole point of using a source code
management system is to tell you what changed and when.  What you are
proposing to do makes it unusable for that purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Bert
+1

On Thu, Dec 10, 2015 at 9:58 AM, Peter Geoghegan  wrote:

> On Thu, Dec 10, 2015 at 12:34 AM, Andres Freund 
> wrote:
> >> > Ripping it out and replacing it monolithically will not
> >> > change that; it will only make the detailed history harder to
> >> > reconstruct, and I *will* want to reconstruct it.
> >>
> >> What's something that might happen six months from now and lead you to
> inspect
> >> master or 9.5 multixact.c between 4f627f8 and its revert?
> >
> > "Hey, what has happened to multixact.c lately? I'm investigating a bug,
> > and I wonder if it already has been fixed?", "Uh, what was the problem
> > with that earlier large commit?", "Hey, what has changed between beta2
> > and the final release?"...
>
> Quite.
>
> I can't believe we're still having this silly discussion. Can we please
> move on?
>
> --
> Peter Geoghegan
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Peter Geoghegan
On Thu, Dec 10, 2015 at 12:34 AM, Andres Freund  wrote:
>> > Ripping it out and replacing it monolithically will not
>> > change that; it will only make the detailed history harder to
>> > reconstruct, and I *will* want to reconstruct it.
>>
>> What's something that might happen six months from now and lead you to 
>> inspect
>> master or 9.5 multixact.c between 4f627f8 and its revert?
>
> "Hey, what has happened to multixact.c lately? I'm investigating a bug,
> and I wonder if it already has been fixed?", "Uh, what was the problem
> with that earlier large commit?", "Hey, what has changed between beta2
> and the final release?"...

Quite.

I can't believe we're still having this silly discussion. Can we please move on?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-10 Thread Andres Freund
On 2015-12-09 20:23:06 -0500, Noah Misch wrote:
> By the way, it occurs to me that I should also make pg_upgrade blacklist the
> range of catversions that might have data loss.  No sense in putting ourselves
> in the position of asking whether data files of a 9.9.3 cluster spent time in
> a 9.5beta2 cluster.

I can't see any benefit in that. We're talking about a bug that, afaics,
needs another unknown bug to trigger (so find_multixact_start() fails),
and then very likely needs significant amounts of new multixacts
consumed, without a restart and without find_multixact_start()
succeeding later.


What I think would actually help for questions like this, is to add, as
discussed in some other threads, the following:
1) 'creating version' to pg_control
2) 'creating version' to each pg_class entry
3) 'last relation rewrite in version' to each pg_class entry
4) 'last full vacuum in version' to each pg_class entry

I think for this purpose 'version' should be something akin to
$catversion||$numericversion (int64 probably?) - that way development
branches and release branches are handled somewhat usefully.

I think that'd be useful, both from an investigatory perspective, as
from a tooling perspective, because it'd allow reusing things like hint
bits.


> > Ripping it out and replacing it monolithically will not
> > change that; it will only make the detailed history harder to
> > reconstruct, and I *will* want to reconstruct it.
> 
> What's something that might happen six months from now and lead you to inspect
> master or 9.5 multixact.c between 4f627f8 and its revert?

"Hey, what has happened to multixact.c lately? I'm investigating a bug,
and I wonder if it already has been fixed?", "Uh, what was the problem
with that earlier large commit?", "Hey, what has changed between beta2
and the final release?"...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Noah Misch
On Wed, Dec 09, 2015 at 11:08:32AM -0500, Robert Haas wrote:
> On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch  wrote:
> > I hope those who have not already read commit 4f627f8 will not waste time
> > reading it.  They should instead ignore multixact changes from commit 
> > 4f627f8
> > through its revert.  The 2015-09-26 commits have not appeared in a supported
> > release, and no other work has built upon them.  They have no tenure.  (I am
> > glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
> > would have introduced a data loss bug.)  Nobody reported a single defect
> > before my review overturned half the patch.  A revert will indeed impose on
> > those who invested time to understand commit 4f627f8, but the silence about
> > its defects suggests the people understanding it number either zero or one.
> > Even as its author and reviewers, you would do better to set aside what you
> > thought you knew about this code.
> 
> I just don't find this a realistic model of how people use the git
> log.  Maybe you use it this way; I don't.  I don't *want* git blame to
> make it seem as if 4f627f8 is not part of the history.  For better or
> worse, it is.

I would like to understand how you use git, then.  What's one of your models
of using "git log" and/or "git blame" in which you foresee the revert making
history less clear, not more clear?

By the way, it occurs to me that I should also make pg_upgrade blacklist the
range of catversions that might have data loss.  No sense in putting ourselves
in the position of asking whether data files of a 9.9.3 cluster spent time in
a 9.5beta2 cluster.

> Ripping it out and replacing it monolithically will not
> change that; it will only make the detailed history harder to
> reconstruct, and I *will* want to reconstruct it.

What's something that might happen six months from now and lead you to inspect
master or 9.5 multixact.c between 4f627f8 and its revert?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Andres Freund
On 2015-12-09 11:18:39 -0500, Robert Haas wrote:
> If I correctly understand the scenario that you are describing, that
> does happen - not for the same MXID, but for different ones.  At least
> the last time I checked, and I'm not sure if we've fixed this, it
> could happen because the SLRU page that contains the multixact wasn't
> flushed out of the SLRU buffers yet.

That should be fixed, with the brute force solution of flushing buffers
before searching for files on disk.

> But apart from that, it could
> happen any time there's a gap in the sequence of files, and that sure
> doesn't seem like a can't-happen situation.  We know that, on 9.3,
> there's definitely a sequence of events that leads to a  file
> followed by a gap followed by the series of files that are still live.
> Given the number of other bugs we've fixed in this area, I would not
> like to bet on that being the only scenario where this crops up.  It
> *shouldn't* happen, and as far as we know, if you start and end on a
> version newer than 4f627f8 and aa29c1c, it won't.  Older branches,
> though, I wouldn't like to bet on.

Ok, fair enough.

andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Robert Haas
On Wed, Dec 9, 2015 at 10:41 AM, Andres Freund  wrote:
>> (I am glad you talked the author out of back-patching; otherwise,
>> 9.4.5 and 9.3.10 would have introduced a data loss bug.)
>
> Isn't that a bug in a, as far as we know, impossible scenario? Unless I
> miss something there's no known case where it's "expected" that
> find_multixact_start() fails after initially succeeding? Sure, it sucks
> that the bug survived review and that it was written in the first
> place. But it not showing up during testing isn't meaningful, given it's
> a should-never-happen scenario.

If I correctly understand the scenario that you are describing, that
does happen - not for the same MXID, but for different ones.  At least
the last time I checked, and I'm not sure if we've fixed this, it
could happen because the SLRU page that contains the multixact wasn't
flushed out of the SLRU buffers yet.  But apart from that, it could
happen any time there's a gap in the sequence of files, and that sure
doesn't seem like a can't-happen situation.  We know that, on 9.3,
there's definitely a sequence of events that leads to a  file
followed by a gap followed by the series of files that are still live.
Given the number of other bugs we've fixed in this area, I would not
like to bet on that being the only scenario where this crops up.  It
*shouldn't* happen, and as far as we know, if you start and end on a
version newer than 4f627f8 and aa29c1c, it won't.  Older branches,
though, I wouldn't like to bet on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Robert Haas
On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch  wrote:
> Andres writing the patch that became commit 4f627f8 and you reviewing it were
> gifts to Alvaro and to the community.  Aware of that, I have avoided[1] saying
> that I was shocked to see that commit's defects.  Despite a committer-author
> and _two_ committer reviewers, the patch was rife with wrong new comments,
> omitted updates to comments it caused to become wrong, and unsolicited
> whitespace churn.  (Anyone could have missed the data loss bug, but these
> collectively leap off the page.)  This in beleaguered code critical to data
> integrity.  You call this thread's latest code a patch submission, but I call
> it bandaging the tree after a recent commit that never should have reached the
> tree.  Hey, if you'd like me to post the traditional patch files, that's easy.
> It would have been easier for me.  I posted branches because it gives more
> metadata to guide review.  As for the choice to revert and redo ...

Yes, I'd like patch files, one per topic.

I wasn't very happy with the way that patch it was written; it seemed
to me that it touched too much code and move a lot of things around
unnecessarily, and I said so at the time.  I would have preferred
something more incremental, and I asked for it and didn't get it.
Well, I'm not giving up: I'm asking for the same thing here.  I didn't
think it was a good idea for Andres to rearrange that much code in a
single commit, because it was hard to review, and I don't think it's a
good idea for you to do it, either.  To the extent that you found
bugs, I think that proves the point that large commits are hard to
review and small commits that change things just a little bit at a
time are the way to go.

> I hope those who have not already read commit 4f627f8 will not waste time
> reading it.  They should instead ignore multixact changes from commit 4f627f8
> through its revert.  The 2015-09-26 commits have not appeared in a supported
> release, and no other work has built upon them.  They have no tenure.  (I am
> glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
> would have introduced a data loss bug.)  Nobody reported a single defect
> before my review overturned half the patch.  A revert will indeed impose on
> those who invested time to understand commit 4f627f8, but the silence about
> its defects suggests the people understanding it number either zero or one.
> Even as its author and reviewers, you would do better to set aside what you
> thought you knew about this code.

I just don't find this a realistic model of how people use the git
log.  Maybe you use it this way; I don't.  I don't *want* git blame to
make it seem as if 4f627f8 is not part of the history.  For better or
worse, it is.  Ripping it out and replacing it monolithically will not
change that; it will only make the detailed history harder to
reconstruct, and I *will* want to reconstruct it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Andres Freund
On 2015-12-09 09:43:19 -0500, Noah Misch wrote:
> Aware of that, I have avoided[1] saying that I was shocked to see that
> commit's defects.  Despite a committer-author and _two_ committer
> reviewers, the patch was rife with wrong new comments, omitted updates
> to comments it caused to become wrong,

It's not like that patch wasn't posted for review for months...


> and unsolicited whitespace churn.

Whitespace churn? The commit includes a pgindent run, because Alvaro
asked me to do that, but that just affected a handful of lines. If you
mean the variable ordering: given several variables were renamed anyway,
additionally putting them in a easier to understand order, seems rather
painless. If you mean 'pgindent immune' long lines - multixact.c is far
from the only one with those, and they're prett harmless.


> You call this thread's latest code a patch
> submission, but I call it bandaging the tree after a recent commit
> that never should have reached the tree.

Oh, for christs sake.


> Hey, if you'd like me to
> post the traditional patch files, that's easy.  It would have been
> easier for me.

You've been asked that, repeatedly. At least if you take 'traditional
patch files' to include traditional, iterative, patches ontop of the
current tree.


> I hope those who have not already read commit 4f627f8 will not waste time
> reading it.

We have to, who knows what's hiding in there. Your git log even shows
that you had conflicts in your approach (83cb04 Conflicts:
src/backend/access/transam/multixact.c).


> They should instead ignore multixact changes from commit 4f627f8
> through its revert.  The 2015-09-26 commits have not appeared in a supported
> release, and no other work has built upon them.

> They have no tenure.

Man.


> (I am glad you talked the author out of back-patching; otherwise,
> 9.4.5 and 9.3.10 would have introduced a data loss bug.)

Isn't that a bug in a, as far as we know, impossible scenario? Unless I
miss something there's no known case where it's "expected" that
find_multixact_start() fails after initially succeeding? Sure, it sucks
that the bug survived review and that it was written in the first
place. But it not showing up during testing isn't meaningful, given it's
a should-never-happen scenario.

I'm actually kinda inclined to rip out the whole "previous pass" logic
out alltogether, and replace it with a PANIC. It's a hard to test,
should never happen, scenario. If it happens, things have already
seriously gone sour.

> > That's not a one-time problem for Andres during the course of review;
> > that is a problem for every single person who looks at the commit
> > history from now until the end of time.
> 
> It's a service to future readers that no line of "git blame master <...>" will
> refer to a 2015-09-26 multixact commit.

And a disservice for everyone doing git log, or git blame for
intermediate states of the tree. The benefit for git blame, are almost
nonexistant, not seing a couple newlines changed, or not seing some
intermediate commits isn't really important.


> Blame reports will instead refer to
> replacement commits designed to be meaningful for study in isolation.  If I
> instead structured the repairs as you ask, the blame would find one of 4f627f8
> or its various repair commits, none of which would be a self-contained unit of
> development.

So what? That's how development in general works. And how it actually
happened in this specific case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-09 Thread Noah Misch
On Tue, Dec 08, 2015 at 01:05:03PM -0500, Robert Haas wrote:
> On Tue, Dec 8, 2015 at 6:43 AM, Andres Freund  wrote:
> > On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
> >> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> >> > Sorry, but I really just want to see these changes as iterative patches
> >> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> >> > if you push it anyway, but I think it's a rather bad idea.
> >>
> >> I hear you.
> >
> > Not just me.
> >
> >> I evaluated your request and judged that the benefits you cited
> >> did not make up for the losses I cited.  Should you wish to change my mind,
> >> your best bet is to find defects in the commits I proposed.  If I 
> >> introduced
> >> juicy defects, that discovery would lend much weight to your conjectures.
> >
> > I've absolutely no interest in "proving you wrong". And my desire to
> > review patches that are in a, in my opinion, barely reviewable format is
> > pretty low as well.
> 
> I agree.  Noah, it seems to me that you are offering a novel theory of
> how patches should be submitted, reviewed, and committed, but you've
> got three people, two of them committers, telling you that we don't
> like that approach.  I seriously doubt you're going to find anyone who
> does.

Andres writing the patch that became commit 4f627f8 and you reviewing it were
gifts to Alvaro and to the community.  Aware of that, I have avoided[1] saying
that I was shocked to see that commit's defects.  Despite a committer-author
and _two_ committer reviewers, the patch was rife with wrong new comments,
omitted updates to comments it caused to become wrong, and unsolicited
whitespace churn.  (Anyone could have missed the data loss bug, but these
collectively leap off the page.)  This in beleaguered code critical to data
integrity.  You call this thread's latest code a patch submission, but I call
it bandaging the tree after a recent commit that never should have reached the
tree.  Hey, if you'd like me to post the traditional patch files, that's easy.
It would have been easier for me.  I posted branches because it gives more
metadata to guide review.  As for the choice to revert and redo ...

> When stuff gets committed to the tree, people want to to be
> able to answer the question "what has just now changed?" and it is
> indisputable that what you want to do here will make that harder.

I hope those who have not already read commit 4f627f8 will not waste time
reading it.  They should instead ignore multixact changes from commit 4f627f8
through its revert.  The 2015-09-26 commits have not appeared in a supported
release, and no other work has built upon them.  They have no tenure.  (I am
glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10
would have introduced a data loss bug.)  Nobody reported a single defect
before my review overturned half the patch.  A revert will indeed impose on
those who invested time to understand commit 4f627f8, but the silence about
its defects suggests the people understanding it number either zero or one.
Even as its author and reviewers, you would do better to set aside what you
thought you knew about this code.

> That's not a one-time problem for Andres during the course of review;
> that is a problem for every single person who looks at the commit
> history from now until the end of time.

It's a service to future readers that no line of "git blame master <...>" will
refer to a 2015-09-26 multixact commit.  Blame reports will instead refer to
replacement commits designed to be meaningful for study in isolation.  If I
instead structured the repairs as you ask, the blame would find one of 4f627f8
or its various repair commits, none of which would be a self-contained unit of
development.  What's to enjoy about discovering that history?

> I don't think you have the
> right to force your proposed approach through in the face of concerted
> opposition.

That's correct; I do not have that right.  Your objection still worries me.

nm

[1] 
http://www.postgresql.org/message-id/20151029065903.gc770...@tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-08 Thread Robert Haas
On Tue, Dec 8, 2015 at 6:43 AM, Andres Freund  wrote:
> On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
>> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
>> > Sorry, but I really just want to see these changes as iterative patches
>> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
>> > if you push it anyway, but I think it's a rather bad idea.
>>
>> I hear you.
>
> Not just me.
>
>> I evaluated your request and judged that the benefits you cited
>> did not make up for the losses I cited.  Should you wish to change my mind,
>> your best bet is to find defects in the commits I proposed.  If I introduced
>> juicy defects, that discovery would lend much weight to your conjectures.
>
> I've absolutely no interest in "proving you wrong". And my desire to
> review patches that are in a, in my opinion, barely reviewable format is
> pretty low as well.

I agree.  Noah, it seems to me that you are offering a novel theory of
how patches should be submitted, reviewed, and committed, but you've
got three people, two of them committers, telling you that we don't
like that approach.  I seriously doubt you're going to find anyone who
does.  When stuff gets committed to the tree, people want to to be
able to answer the question "what has just now changed?" and it is
indisputable that what you want to do here will make that harder.
That's not a one-time problem for Andres during the course of review;
that is a problem for every single person who looks at the commit
history from now until the end of time.  I don't think you have the
right to force your proposed approach through in the face of concerted
opposition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-08 Thread Andres Freund
On 2015-12-04 21:55:29 -0500, Noah Misch wrote:
> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> > Sorry, but I really just want to see these changes as iterative patches
> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> > if you push it anyway, but I think it's a rather bad idea.
> 
> I hear you.

Not just me.

> I evaluated your request and judged that the benefits you cited
> did not make up for the losses I cited.  Should you wish to change my mind,
> your best bet is to find defects in the commits I proposed.  If I introduced
> juicy defects, that discovery would lend much weight to your conjectures.

I've absolutely no interest in "proving you wrong". And my desire to
review patches that are in a, in my opinion, barely reviewable format is
pretty low as well.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-04 Thread Noah Misch
On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> On 2015-12-03 04:38:45 -0500, Noah Misch wrote:
> > On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> > > Especially if reverting and redoing includes conflicts that mainly
> > > increases the chance of accidental bugs.
> > 
> > True.  (That doesn't apply to these patches.)
> 
> Uh, it does. You had conflicts in your process, and it's hard to verify
> that the re-applied patch is actually functionally identical given the
> volume of changes. It's much easier to see what actually changes by
> looking at iterative commits forward from the current state.

Ah, we were talking about different topics after all.  I was talking about
_merge_ conflicts in a reversion commit.

> Sorry, but I really just want to see these changes as iterative patches
> ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> if you push it anyway, but I think it's a rather bad idea.

I hear you.  I evaluated your request and judged that the benefits you cited
did not make up for the losses I cited.  Should you wish to change my mind,
your best bet is to find defects in the commits I proposed.  If I introduced
juicy defects, that discovery would lend much weight to your conjectures.

> My question was more about the comment being after the "early return"
> than about the content change, should have made that clearer. Can we
> just move your comment up?

Sure, I will.

> > > >  static bool
> > > >  SetOffsetVacuumLimit(void)
> > > >  {
> > > > -   MultiXactId oldestMultiXactId;
> > > > +   MultiXactId oldestMultiXactId;
> > > > MultiXactId nextMXact;
> > > > -   MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > > > -   MultiXactOffset prevOldestOffset;
> > > > +   MultiXactOffset oldestOffset = 0;   /* placate 
> > > > compiler */
> > > > MultiXactOffset nextOffset;
> > > > boololdestOffsetKnown = false;
> > > > +   MultiXactOffset prevOldestOffset;
> > > > boolprevOldestOffsetKnown;
> > > > -   MultiXactOffset offsetStopLimit = 0;
> > > 
> > > I don't see the benefit of the order changes here.
> > 
> > I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
> > reverted when I refinished that commit as branch mxt3-main.
> 
> But the other changes are there, and in the history anyway. As the new
> order isn't more meaningful than the current one...

Right.  A revert+redo patch series can and should purge formatting changes
that did not belong in its predecessor commits.  Alternate change delivery
strategies wouldn't do that.

> > > > -   if (oldestOffsetKnown)
> > > > -   ereport(DEBUG1,
> > > > -   (errmsg("oldest MultiXactId 
> > > > member is at offset %u",
> > > > -   oldestOffset)));
> > > 
> > > That's imo a rather useful debug message.
> > 
> > The branches emit that message at the same times 4f627f8^ and earlier emit 
> > it.
> 
> During testing I found it rather helpful if it was emitted regularly.

I wouldn't oppose a patch making it happen more often.

> > > > LWLockRelease(MultiXactTruncationLock);
> > > > 
> > > > /*
> > > > -* If we can, compute limits (and install them MultiXactState) 
> > > > to prevent
> > > > -* overrun of old data in the members SLRU area. We can only do 
> > > > so if the
> > > > -* oldest offset is known though.
> > > > +* There's no need to update anything if we don't know the 
> > > > oldest offset
> > > > +* or if it hasn't changed.
> > > >  */
> > > 
> > > Is that really a worthwhile optimization?
> > 
> > I would neither remove that longstanding optimization nor add it from 
> > scratch
> > today.  Branch commit 06c9979 restored it as part of a larger restoration to
> > the pre-4f627f8 structure of SetOffsetVacuumLimit().
> 
> There DetermineSafeOldestOffset() did it unconditionally.

That is true; one won't be consistent with both.  06c9979 materially shortened
the final patch and eliminated some user-visible message emission changes.
Moreover, this is clearly a case of SetOffsetVacuumLimit() absorbing
DetermineSafeOldestOffset(), not vice versa.

> > > > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, 
> > > > void *data)
> > > 
> > > That really seems like an independent change, deserving its own commit +
> > > explanation.
> > 
> > Indeed.  I explained that change at length in
> > http://www.postgresql.org/message-id/20151108085407.ga1097...@tornado.leadboat.com,
> > including that it's alone on a branch (mxt1-disk-independent), to become its
> > own PostgreSQL commit.
> 
> The comment there doesn't include the explanation...

If you visit that URL, everything from "If anything here requires careful
study, it's the small mxt1-disk-independent change, which .

Re: [HACKERS] Rework the way multixact truncations work

2015-12-03 Thread Andres Freund
On 2015-12-03 04:38:45 -0500, Noah Misch wrote:
> On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> > Especially if reverting and redoing includes conflicts that mainly
> > increases the chance of accidental bugs.
> 
> True.  (That doesn't apply to these patches.)

Uh, it does. You had conflicts in your process, and it's hard to verify
that the re-applied patch is actually functionally identical given the
volume of changes. It's much easier to see what actually changes by
looking at iterative commits forward from the current state.


Sorry, but I really just want to see these changes as iterative patches
ontop of 9.5/HEAD instead of this process. I won't revert the reversion
if you push it anyway, but I think it's a rather bad idea.


> > > git remote add community git://git.postgresql.org/git/postgresql.git
> > > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > > git fetch --multiple community nmisch_github
> > > git diff community/master...nmisch_github/mxt4-rm-legacy
> > 
> > That's a nearly useless diff, because it includes a lot of other changes
> > (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> > you published the changes.
> 
> Perhaps you used "git diff a..b", not "git diff a...b".

Ah yes. Neat, didn't know that one.

> > > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > > - /*
> > > -  * Computing the actual limits is only possible once the data directory 
> > > is
> > > -  * in a consistent state. There's no need to compute the limits while
> > > -  * still replaying WAL - no decisions about new multis are made even
> > > -  * though multixact creations might be replayed. So we'll only do 
> > > further
> > > -  * checks after TrimMultiXact() has been called.
> > > -  */
> > > + /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> > >   if (!MultiXactState->finishedStartup)
> > >   return;
> > > -
> > >   Assert(!InRecovery);
> > > 
> > > - /* Set limits for offset vacuum. */
> > > + /*
> > > +  * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > > +  * call, which is only possible once the data directory is in a 
> > > consistent
> > > +  * state. There's no need for an offset limit while still replaying WAL;
> > > +  * no decisions about new multis are made even though multixact 
> > > creations
> > > +  * might be replayed.
> > > +  */
> > >   needs_offset_vacuum = SetOffsetVacuumLimit();
> > 
> > I don't really see the benefit of this change. The previous location of
> > the comment is where we return early, so it seems appropriate to
> > document the reason there?
> 
> I made that low-importance change for two reasons.  First, returning at that
> point skips more than just the setting a limit; it also skips autovacuum
> signalling and wraparound warnings.  Second, the function has just computed
> mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
> defer an offset limit, not any and all limits.

My question was more about the comment being after the "early return"
than about the content change, should have made that clearer. Can we
just move your comment up?

> > >  static bool
> > >  SetOffsetVacuumLimit(void)
> > >  {
> > > - MultiXactId oldestMultiXactId;
> > > + MultiXactId oldestMultiXactId;
> > >   MultiXactId nextMXact;
> > > - MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > > - MultiXactOffset prevOldestOffset;
> > > + MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > >   MultiXactOffset nextOffset;
> > >   boololdestOffsetKnown = false;
> > > + MultiXactOffset prevOldestOffset;
> > >   boolprevOldestOffsetKnown;
> > > - MultiXactOffset offsetStopLimit = 0;
> > 
> > I don't see the benefit of the order changes here.
> 
> I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
> reverted when I refinished that commit as branch mxt3-main.

But the other changes are there, and in the history anyway. As the new
order isn't more meaningful than the current one...

> > > - if (oldestOffsetKnown)
> > > - ereport(DEBUG1,
> > > - (errmsg("oldest MultiXactId member is 
> > > at offset %u",
> > > - oldestOffset)));
> > 
> > That's imo a rather useful debug message.
> 
> The branches emit that message at the same times 4f627f8^ and earlier emit it.

During testing I found it rather helpful if it was emitted regularly.

> > >   LWLockRelease(MultiXactTruncationLock);
> > > 
> > >   /*
> > > -  * If we can, compute limits (and install them MultiXactState) to 
> > > prevent
> > > -  * overrun of old data in the members SLRU area. We can only do so if 
> > > the
> > > -  * oldest offset is known though.
> > > +  * There's no need to update anything if we don't know the oldest offset
> > > +  * or if it hasn't changed.
> > >*/
> > 
> > 

Re: [HACKERS] Rework the way multixact truncations work

2015-12-03 Thread Noah Misch
On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> > Hackers have been too reticent to revert and redo defect-laden
> > commits.  If doing that is weird today, let it be normal.
> 
> Why?

See my paragraph ending with the two sentences you quoted.

> Especially if reverting and redoing includes conflicts that mainly
> increases the chance of accidental bugs.

True.  (That doesn't apply to these patches.)

> > git remote add community git://git.postgresql.org/git/postgresql.git
> > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > git fetch --multiple community nmisch_github
> > git diff community/master...nmisch_github/mxt4-rm-legacy
> 
> That's a nearly useless diff, because it includes a lot of other changes
> (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> you published the changes.

Perhaps you used "git diff a..b", not "git diff a...b".  If not, please send
the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy"
and "git --version".

> > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > -   /*
> > -* Computing the actual limits is only possible once the data directory 
> > is
> > -* in a consistent state. There's no need to compute the limits while
> > -* still replaying WAL - no decisions about new multis are made even
> > -* though multixact creations might be replayed. So we'll only do 
> > further
> > -* checks after TrimMultiXact() has been called.
> > -*/
> > +   /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> > if (!MultiXactState->finishedStartup)
> > return;
> > -
> > Assert(!InRecovery);
> > 
> > -   /* Set limits for offset vacuum. */
> > +   /*
> > +* Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > +* call, which is only possible once the data directory is in a 
> > consistent
> > +* state. There's no need for an offset limit while still replaying WAL;
> > +* no decisions about new multis are made even though multixact 
> > creations
> > +* might be replayed.
> > +*/
> > needs_offset_vacuum = SetOffsetVacuumLimit();
> 
> I don't really see the benefit of this change. The previous location of
> the comment is where we return early, so it seems appropriate to
> document the reason there?

I made that low-importance change for two reasons.  First, returning at that
point skips more than just the setting a limit; it also skips autovacuum
signalling and wraparound warnings.  Second, the function has just computed
mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
defer an offset limit, not any and all limits.

> >  static bool
> >  SetOffsetVacuumLimit(void)
> >  {
> > -   MultiXactId oldestMultiXactId;
> > +   MultiXactId oldestMultiXactId;
> > MultiXactId nextMXact;
> > -   MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > -   MultiXactOffset prevOldestOffset;
> > +   MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > MultiXactOffset nextOffset;
> > boololdestOffsetKnown = false;
> > +   MultiXactOffset prevOldestOffset;
> > boolprevOldestOffsetKnown;
> > -   MultiXactOffset offsetStopLimit = 0;
> 
> I don't see the benefit of the order changes here.

I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
reverted when I refinished that commit as branch mxt3-main.

> > -   if (oldestOffsetKnown)
> > -   ereport(DEBUG1,
> > -   (errmsg("oldest MultiXactId member is 
> > at offset %u",
> > -   oldestOffset)));
> 
> That's imo a rather useful debug message.

The branches emit that message at the same times 4f627f8^ and earlier emit it.

> > -   else
> > +   if (!oldestOffsetKnown)
> > +   {
> > +   /* XXX This message is incorrect if 
> > prevOldestOffsetKnown. */
> > ereport(LOG,
> > (errmsg("MultiXact member wraparound 
> > protections are disabled because oldest checkpointed MultiXact %u does not 
> > exist on disk",
> > oldestMultiXactId)));
> > +   }
> > }
> 
> Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

I welcome a project to fix it.

> > LWLockRelease(MultiXactTruncationLock);
> > 
> > /*
> > -* If we can, compute limits (and install them MultiXactState) to 
> > prevent
> > -* overrun of old data in the members SLRU area. We can only do so if 
> > the
> > -* oldest offset is known though.
> > +* There's no need to update anything if we don't know the oldest offset
> > +* or if it hasn't changed.
> >  */
> 
> Is that really a worthwhile optimization?

I would ne

Re: [HACKERS] Rework the way multixact truncations work

2015-12-02 Thread Andres Freund
On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> > I think it's weird to back a commit out only to put a bunch of very similar
> > stuff back in.
>
> I agree with that.  If the original patches and their replacements shared 95%
> of diff lines in common, we wouldn't be having this conversation. These
> replacements redo closer to 50% of the lines, so the patches are not very
> similar by verbatim line comparison.

Which is a huge problem, because it makes it very hard to see what your
changes actually do. And a significant portion of the changes relative
to master aren't particularly critical. Which is easy to see if if a
commit only changes comments, but harder if you see one commit reverting
things, and another redoing most of the same things.

> Hackers have been too reticent to revert and redo defect-laden
> commits.  If doing that is weird today, let it be normal.

Why? Especially if reverting and redoing includes conflicts that mainly
increases the chance of accidental bugs.

> git remote add community git://git.postgresql.org/git/postgresql.git
> git remote add nmisch_github https://github.com/nmisch/postgresql.git
> git fetch --multiple community nmisch_github
> git diff community/master...nmisch_github/mxt4-rm-legacy

That's a nearly useless diff, because it includes a lot of other changes
(218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
you published the changes. What kinda works is
git diff $(git merge-base community/master 
nmisch_github/mxt4-rm-legacy)..nmisch_github/mxt4-rm-legacy
which shows the diff to the version of master you start off from.

Review of the above diff:
> @@ -2013,7 +2017,7 @@ TrimMultiXact(void)
>  {
>   MultiXactId nextMXact;
>   MultiXactOffset offset;
> - MultiXactId oldestMXact;
> + MultiXactId oldestMXact;

That's a bit weird, given that nextMXact isn't indented...


> @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> - /*
> -  * Computing the actual limits is only possible once the data directory 
> is
> -  * in a consistent state. There's no need to compute the limits while
> -  * still replaying WAL - no decisions about new multis are made even
> -  * though multixact creations might be replayed. So we'll only do 
> further
> -  * checks after TrimMultiXact() has been called.
> -  */
> + /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
>   if (!MultiXactState->finishedStartup)
>   return;
> -
>   Assert(!InRecovery);
> 
> - /* Set limits for offset vacuum. */
> + /*
> +  * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> +  * call, which is only possible once the data directory is in a 
> consistent
> +  * state. There's no need for an offset limit while still replaying WAL;
> +  * no decisions about new multis are made even though multixact 
> creations
> +  * might be replayed.
> +  */
>   needs_offset_vacuum = SetOffsetVacuumLimit();

I don't really see the benefit of this change. The previous location of
the comment is where we return early, so it seems appropriate to
document the reason there?

>   /*
> @@ -2354,6 +2356,12 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
>   debug_elog3(DEBUG2, "MultiXact: setting next multi to %u", 
> minMulti);
>   MultiXactState->nextMXact = minMulti;
>   }
> +
> + /*
> +  * MultiXactOffsetPrecedes() gives the wrong answer if nextOffset would
> +  * advance more than 2^31 between calls.  Since we get a call for each
> +  * XLOG_MULTIXACT_CREATE_ID, that should never happen.
> +  */

Independent comment improvement. Good idea though.


>  /*
> - * Update our oldestMultiXactId value, but only if it's more recent than what
> - * we had.
> - *
> - * This may only be called during WAL replay.
> + * Update our oldestMultiXactId value, but only if it's more recent than
> + * what we had.  This may only be called during WAL replay.
>   */

Whatever?

>  void
>  MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
> @@ -2544,14 +2550,13 @@ GetOldestMultiXactId(void)
>  static bool
>  SetOffsetVacuumLimit(void)
>  {
> - MultiXactId oldestMultiXactId;
> + MultiXactId oldestMultiXactId;
>   MultiXactId nextMXact;
> - MultiXactOffset oldestOffset = 0;   /* placate compiler */
> - MultiXactOffset prevOldestOffset;
> + MultiXactOffset oldestOffset = 0;   /* placate compiler */
>   MultiXactOffset nextOffset;
>   boololdestOffsetKnown = false;
> + MultiXactOffset prevOldestOffset;
>   boolprevOldestOffsetKnown;
> - MultiXactOffset offsetStopLimit = 0;

I don't see the benefit of the order changes here.

> @@ -2588,40 +2590,50 @@ SetOffsetVacuumLimit(void)
>   else
>   {
>   /*
> -  * Figure out where 

Re: [HACKERS] Rework the way multixact truncations work

2015-12-02 Thread Noah Misch
On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> I think it's weird to back a commit out only to put a bunch of very similar
> stuff back in.

I agree with that.  If the original patches and their replacements shared 95%
of diff lines in common, we wouldn't be having this conversation.  These
replacements redo closer to 50% of the lines, so the patches are not very
similar by verbatim line comparison.  Even so, let's stipulate that my
proposal is weird.  I'd rather be weird than lose one of the benefits I
articulated upthread, let alone all of them.

My post-commit review of RLS woke me up to the problems of gradually finishing
work in-tree.  By the time I started that review in 2015-06, the base RLS
feature already spanned twenty-two commits.  (That count has since more than
doubled.)  Reviewing each commit threatened to be wasteful, because I would
presumably find matters already fixed later.  I tried to cherry-pick the
twenty-two commits onto a branch, hoping to review the overall diff as "git
diff master...squash-rls", but that yielded complex merge conflicts.  Where
the conflicts were too much, I reviewed entire files instead.  (Granted, no
matter how this thread ends, I do not expect an outcome that opaque.)  Hackers
have been too reticent to revert and redo defect-laden commits.  If doing that
is weird today, let it be normal.

> Even figuring out what you've actually changed here seems
> rather hard.  I couldn't get github to give me a diff showing your
> changes vs. master.

If you, an expert in the 2015-09-26 commits, want to study the key changes I
made, I recommend perusing these outputs:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git log -p --reverse --no-merges 
nmisch_github/mxt2-cosmetic..nmisch_github/mxt3-main

For the overall diff vs. master that you sought:

git remote add community git://git.postgresql.org/git/postgresql.git
git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch --multiple community nmisch_github
git diff community/master...nmisch_github/mxt4-rm-legacy

If anyone not an author or reviewer of the 2015-09-26 commits wishes to review
the work, don't read the above diffs; the intermediate states are
uninteresting.  Read these four:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git diff nmisch_github/mxt1-disk-independent nmisch_github/mxt2-cosmetic
git diff nmisch_github/mxt2-cosmetic nmisch_github/mxt3-main
git diff nmisch_github/mxt3-main nmisch_github/mxt4-rm-legacy

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-01 Thread Peter Geoghegan
On Tue, Dec 1, 2015 at 2:07 PM, Robert Haas  wrote:
> Hmm.  I read Peter's message as agreeing with Andres rather than with
> you.  And I have to say I agree with Andres as well.  I think it's
> weird to back a commit out only to put a bunch of very similar stuff
> back in.

Your interpretation was correct. I think it's surprising to structure
things this way, especially since we haven't done things this way in
the past.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-01 Thread Robert Haas
On Fri, Nov 27, 2015 at 5:16 PM, Noah Misch  wrote:
> On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
>> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch  wrote:
>> >> I'm not following along right now - in order to make cleanups the plan is 
>> >> to revert a couple commits and then redo them prettyfied?
>> >
>> > Yes, essentially.  Given the volume of updates, this seemed neater than
>> > framing those updates as in-tree incremental development.
>>
>> I think that's an odd way of representing this work. I tend to
>> remember roughly when major things were committed even years later. An
>> outright revert should represent a total back out of the original
>> commit IMV. Otherwise, a git blame can be quite misleading.
>
> I think you're saying that "clearer git blame" is a more-important reason than
> "volume of updates" for preferring an outright revert over in-tree incremental
> development.  Fair preference.  If that's a correct reading of your message,
> then we do agree on the bottom line.

Hmm.  I read Peter's message as agreeing with Andres rather than with
you.  And I have to say I agree with Andres as well.  I think it's
weird to back a commit out only to put a bunch of very similar stuff
back in.  Even figuring out what you've actually changed here seems
rather hard.  I couldn't get github to give me a diff showing your
changes vs. master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-27 Thread Noah Misch
On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch  wrote:
> >> I'm not following along right now - in order to make cleanups the plan is 
> >> to revert a couple commits and then redo them prettyfied?
> >
> > Yes, essentially.  Given the volume of updates, this seemed neater than
> > framing those updates as in-tree incremental development.
> 
> I think that's an odd way of representing this work. I tend to
> remember roughly when major things were committed even years later. An
> outright revert should represent a total back out of the original
> commit IMV. Otherwise, a git blame can be quite misleading.

I think you're saying that "clearer git blame" is a more-important reason than
"volume of updates" for preferring an outright revert over in-tree incremental
development.  Fair preference.  If that's a correct reading of your message,
then we do agree on the bottom line.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-23 Thread Peter Geoghegan
On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch  wrote:
>> I'm not following along right now - in order to make cleanups the plan is to 
>> revert a couple commits and then redo them prettyfied?
>
> Yes, essentially.  Given the volume of updates, this seemed neater than
> framing those updates as in-tree incremental development.

I think that's an odd way of representing this work. I tend to
remember roughly when major things were committed even years later. An
outright revert should represent a total back out of the original
commit IMV. Otherwise, a git blame can be quite misleading. I can
imagine questioning my recollection, even when it is accurate, if only
because I don't expect this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-16 Thread Noah Misch
On Sun, Nov 08, 2015 at 11:59:33AM -0800, Andres Freund wrote:
> On November 8, 2015 11:52:05 AM PST, Noah Misch  wrote:
> >On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
> >> On November 8, 2015 12:54:07 AM PST, Noah Misch  wrote:
> >> 
> >> >I have pushed a stack of branches to
> >> >https://github.com/nmisch/postgresql.git:
> >> >
> >> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
> >> >mxt1-disk-independent - see below
> >> >mxt2-cosmetic - update already-wrong comments and formatting
> >> >mxt3-main - replaces commit 4f627f8
> >> >mxt4-rm-legacy - replaces commit aa29c1c
> >> >
> >> >The plan is to squash each branch into one PostgreSQL commit.  In
> >> >addition to
> >> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
> >> >reviewing
> >> >itemized changes and commit log entries in "git log -p --reverse
> >> >--no-merges
> >> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
> >> >something
> >> >you discussed upthread or was otherwise not obvious, I put a
> >statement
> >> >of
> >> >rationale in the commit log.
> >> 
> >> I'm not following along right now - in order to make cleanups the
> >plan is to revert a couple commits and then redo them prettyfied?
> >
> >Yes, essentially.  Given the volume of updates, this seemed neater than
> >framing those updates as in-tree incremental development.
> 
> I don't like that plan. I don't have a problem doing that in some development 
> branch somewhere, but I fail to see any benefit doing that in 9.5/master. 
> It'll just make the history more convoluted for no benefit.
> 
> I'll obviously still review the changes.

Cleanliness of history is precisely why I did it this way.  If I had framed
the changes as in-tree incremental development, no one "git diff" command
would show the truncation rework or a coherent subset.  To review the whole,
students of this code might resort to a cherry-pick of the repair commits onto
aa29c1c.  That, too, proves dissatisfying; the history would nowhere carry a
finished version of legacy truncation support.  A hacker opting to back-patch
in the future, as commit 4f627f8 contemplated, would need to dig through this
thread for the bits added in mxt3-main and removed in mxt4-rm-legacy.

The benefits may become clearer as you continue to review the branches.

> Even for review it's nor particularly convenient, because now the entirety of 
> the large changes essentially needs to be reviewed anew, given they're not 
> the same. 

Agreed; I optimized for future readers, and I don't doubt this is less
convenient for you and for others already familiar with commits 4f627f8 and
aa29c1c.  I published branches, not squashed patches, mostly because I think
the individual branch commits will facilitate your study of the changes.  I
admit the cost to you remains high.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-08 Thread Andres Freund
On November 8, 2015 11:52:05 AM PST, Noah Misch  wrote:
>On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
>> On November 8, 2015 12:54:07 AM PST, Noah Misch 
>wrote:
>> 
>> >I have pushed a stack of branches to
>> >https://github.com/nmisch/postgresql.git:
>> >
>> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
>> >mxt1-disk-independent - see below
>> >mxt2-cosmetic - update already-wrong comments and formatting
>> >mxt3-main - replaces commit 4f627f8
>> >mxt4-rm-legacy - replaces commit aa29c1c
>> >
>> >The plan is to squash each branch into one PostgreSQL commit.  In
>> >addition to
>> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
>> >reviewing
>> >itemized changes and commit log entries in "git log -p --reverse
>> >--no-merges
>> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
>> >something
>> >you discussed upthread or was otherwise not obvious, I put a
>statement
>> >of
>> >rationale in the commit log.
>> 
>> I'm not following along right now - in order to make cleanups the
>plan is to revert a couple commits and then redo them prettyfied?
>
>Yes, essentially.  Given the volume of updates, this seemed neater than
>framing those updates as in-tree incremental development.

I don't like that plan. I don't have a problem doing that in some development 
branch somewhere, but I fail to see any benefit doing that in 9.5/master. It'll 
just make the history more convoluted for no benefit.

I'll obviously still review the changes.

Even for review it's nor particularly convenient, because now the entirety of 
the large changes essentially needs to be reviewed anew, given they're not the 
same. 

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-08 Thread Noah Misch
On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
> On November 8, 2015 12:54:07 AM PST, Noah Misch  wrote:
> 
> >I have pushed a stack of branches to
> >https://github.com/nmisch/postgresql.git:
> >
> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
> >mxt1-disk-independent - see below
> >mxt2-cosmetic - update already-wrong comments and formatting
> >mxt3-main - replaces commit 4f627f8
> >mxt4-rm-legacy - replaces commit aa29c1c
> >
> >The plan is to squash each branch into one PostgreSQL commit.  In
> >addition to
> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
> >reviewing
> >itemized changes and commit log entries in "git log -p --reverse
> >--no-merges
> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
> >something
> >you discussed upthread or was otherwise not obvious, I put a statement
> >of
> >rationale in the commit log.
> 
> I'm not following along right now - in order to make cleanups the plan is to 
> revert a couple commits and then redo them prettyfied?

Yes, essentially.  Given the volume of updates, this seemed neater than
framing those updates as in-tree incremental development.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-08 Thread Andres Freund
On November 8, 2015 12:54:07 AM PST, Noah Misch  wrote:

>I have pushed a stack of branches to
>https://github.com/nmisch/postgresql.git:
>
>mxt0-revert - reverts commits 4f627f8 and aa29c1c
>mxt1-disk-independent - see below
>mxt2-cosmetic - update already-wrong comments and formatting
>mxt3-main - replaces commit 4f627f8
>mxt4-rm-legacy - replaces commit aa29c1c
>
>The plan is to squash each branch into one PostgreSQL commit.  In
>addition to
>examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
>reviewing
>itemized changes and commit log entries in "git log -p --reverse
>--no-merges
>mxt2-cosmetic..mxt3-main".  In particular, when a change involves
>something
>you discussed upthread or was otherwise not obvious, I put a statement
>of
>rationale in the commit log.

I'm not following along right now - in order to make cleanups the plan is to 
revert a couple commits and then redo them prettyfied?

Andres
Hi
--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-11-08 Thread Noah Misch
On Thu, Oct 29, 2015 at 08:46:52AM +0100, Andres Freund wrote:
> On October 29, 2015 7:59:03 AM GMT+01:00, Noah Misch  
> wrote:
> >That helps; thanks.  Your design seems good.  I've located only insipid
> >defects.
> 
> Great!
> 
> > I propose to save some time by writing a patch series
> >eliminating
> >them, which you could hopefully review.  Does that sound good?
> 
> Yes, it does.

I have pushed a stack of branches to https://github.com/nmisch/postgresql.git:

mxt0-revert - reverts commits 4f627f8 and aa29c1c
mxt1-disk-independent - see below
mxt2-cosmetic - update already-wrong comments and formatting
mxt3-main - replaces commit 4f627f8
mxt4-rm-legacy - replaces commit aa29c1c

The plan is to squash each branch into one PostgreSQL commit.  In addition to
examining overall "git diff mxt2-cosmetic mxt3-main", I recommend reviewing
itemized changes and commit log entries in "git log -p --reverse --no-merges
mxt2-cosmetic..mxt3-main".  In particular, when a change involves something
you discussed upthread or was otherwise not obvious, I put a statement of
rationale in the commit log.

> +  * Make sure to only attempt truncation if there's values to truncate
> +  * away. In normal processing values shouldn't go backwards, but there's
> +  * some corner cases (due to bugs) where that's possible.

Which bugs are those?  I would like to include more detail if available.


If anything here requires careful study, it's the small mxt1-disk-independent
change, which I have also attached in patch form.  That patch removes the
SlruScanDirCbFindEarliest() test from TruncateMultiXact(), which in turn makes
multixact control decisions independent of whether TruncateMultiXact() is
successful at unlinking segments.  Today, one undeletable segment file can
cause TruncateMultiXact() to give up on truncation completely for a span of
hundreds of millions of MultiXactId.  Patched multixact.c will, like CLOG,
make its decisions strictly based on the content of files it expects to exist.
It will no longer depend on the absence of files it hopes will not exist.

To aid in explaining the change's effects, I will define some terms.  A
"logical wrap" occurs when no range of 2^31 integers covers the set of
MultiXactId stored in xmax fields.  A "file-level wrap" occurs when there
exists a pair of pg_multixact/offsets segment files such that:

  | segno_a * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE - 
segno_b * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE | > 2^31

A logical wrap implies either a file-level wrap or missing visibility
metadata, but a file-level wrap does not imply other consequences.  The
SlruScanDirCbFindEarliest() test is usually redundant with
find_multixact_start(), because MultiXactIdPrecedes(oldestMXact, earliest)
almost implies that find_multixact_start() will fail.  The exception arises
when pg_multixact/offsets files compose a file-level wrap, which can happen
when TruncateMultiXact() fails to unlink segments as planned.  When it does
happen, the result of SlruScanDirCbFindEarliest(), and therefore the computed
"earliest" value, is undefined.  (This outcome is connected to our requirement
to use only half the pg_clog or pg_multixact/offsets address space at any one
time.  The PagePrecedes callbacks for these SLRUs cease to be transitive if
more than half their address space is in use.)

The SlruScanDirCbFindEarliest() test can be helpful when a file-level wrap
coexists with incorrect oldestMultiXactId (extant xmax values define "correct
oldestMultiXactId").  If we're lucky with readdir() order, the test will block
truncation so we don't delete still-needed segments.  I am content to lose
that, because (a) the code isn't reliable for (or even directed toward) that
purpose and (b) sites running on today's latest point releases no longer have
incorrect oldestMultiXactId.
diff --git a/src/backend/access/transam/multixact.c 
b/src/backend/access/transam/multixact.c
index cb12fc3..cb4a0cd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2945,29 +2945,6 @@ SlruScanDirCbRemoveMembers(SlruCtl ctl, char *filename, 
int segpage,
return false;   /* keep going */
 }
 
-typedef struct mxtruncinfo
-{
-   int earliestExistingPage;
-} mxtruncinfo;
-
-/*
- * SlruScanDirectory callback
- * This callback determines the earliest existing page number.
- */
-static bool
-SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
-{
-   mxtruncinfo *trunc = (mxtruncinfo *) data;
-
-   if (trunc->earliestExistingPage == -1 ||
-   ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
-   {
-   trunc->earliestExistingPage = segpage;
-   }
-
-   return false;   /* keep going */
-}
-
 /*
  * Remove all MultiXactOffset and MultiXactMember segments before the oldest
  * ones still of interest.
@@ -

Re: [HACKERS] Rework the way multixact truncations work

2015-10-29 Thread Andres Freund
Hi,

On October 29, 2015 7:59:03 AM GMT+01:00, Noah Misch  wrote:
>On Tue, Oct 27, 2015 at 03:44:10PM +0100, Andres Freund wrote:
>> On 2015-10-24 22:07:00 -0400, Noah Misch wrote:
>> > On Tue, Sep 22, 2015 at 07:57:27PM +0200, Andres Freund wrote:
>> > > On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
>> > > > - If SlruDeleteSegment fails in unlink(), shouldn't we at the
>very
>> > > > least log a message?  If that file is still there when we loop
>back
>> > > > around, it's going to cause a failure, I think.
>> > > 
>> > > The existing unlink() call doesn't, that's the only reason I
>didn't add
>> > > a message there. I'm fine with adding a (LOG or WARNING?)
>message.
>> 
>> Note that I didn't add the warning after all, as it'd be too noisy
>> during repeated replay, as all the files would already be gone. We
>could
>> only emit it when the error is not ENOFILE, if people prefer that.
>> 
>> 
>> > Unlinking old pg_clog files is strictly an optimization.  If you
>were to
>> > comment out every unlink() call in slru.c, the only ill effect on
>CLOG is the
>> > waste of disk space.  Is the same true of MultiXact?
>> 
>> Well, multixacts are a lot larger than the other SLRUs, I think that
>> makes some sort of difference.
>
>That helps; thanks.  Your design seems good.  I've located only insipid
>defects.

Great!

> I propose to save some time by writing a patch series
>eliminating
>them, which you could hopefully review.  Does that sound good?

Yes, it does.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-10-29 Thread Noah Misch
On Tue, Oct 27, 2015 at 03:44:10PM +0100, Andres Freund wrote:
> On 2015-10-24 22:07:00 -0400, Noah Misch wrote:
> > On Tue, Sep 22, 2015 at 07:57:27PM +0200, Andres Freund wrote:
> > > On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> > > > - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> > > > least log a message?  If that file is still there when we loop back
> > > > around, it's going to cause a failure, I think.
> > > 
> > > The existing unlink() call doesn't, that's the only reason I didn't add
> > > a message there. I'm fine with adding a (LOG or WARNING?) message.
> 
> Note that I didn't add the warning after all, as it'd be too noisy
> during repeated replay, as all the files would already be gone. We could
> only emit it when the error is not ENOFILE, if people prefer that.
> 
> 
> > Unlinking old pg_clog files is strictly an optimization.  If you were to
> > comment out every unlink() call in slru.c, the only ill effect on CLOG is 
> > the
> > waste of disk space.  Is the same true of MultiXact?
> 
> Well, multixacts are a lot larger than the other SLRUs, I think that
> makes some sort of difference.

That helps; thanks.  Your design seems good.  I've located only insipid
defects.  I propose to save some time by writing a patch series eliminating
them, which you could hopefully review.  Does that sound good?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-10-27 Thread Josh Berkus
On 10/27/2015 07:44 AM, Andres Freund wrote:
>> Unlinking old pg_clog files is strictly an optimization.  If you were to
>> > comment out every unlink() call in slru.c, the only ill effect on CLOG is 
>> > the
>> > waste of disk space.  Is the same true of MultiXact?
> Well, multixacts are a lot larger than the other SLRUs, I think that
> makes some sort of difference.

And by "a lot larger" we're talking like 50X to 100X.  I regularly see
pg_multixact directories larger than 1GB.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-10-27 Thread Andres Freund
Hi,

On 2015-10-24 22:07:00 -0400, Noah Misch wrote:
> I'm several days into a review of this change (commits 4f627f8 and
> aa29c1c).

Cool!

> There's one part of the design I want to understand before commenting on
> specific code.  What did you anticipate to be the consequences of failing to
> remove SLRU segment files that MultiXactState->oldestMultiXactId implies we
> should have removed?  I ask because, on the one hand, I see code making
> substantial efforts to ensure that it truncates exactly as planned:

> [portion of TruncateMultiXact]

The reason we can't have checkpoints there, is that checkpoints records
multixact values in the checkpoint record. If we crash-restart before
the truncation has finished we can end up in the situation that
->oldestMultiXactId doesn't exist. Which will trigger a round of
emergency vacuum at the next startup, not something that should happen
due to a concurrency problem.

We could instead update the in-memory values first, but that could lead
to other problems.


So the critical section/delaying of checkpoints is more about having the
on-disk agreeing with the status data in the checkpoint/control file.


> On Tue, Sep 22, 2015 at 07:57:27PM +0200, Andres Freund wrote:
> > On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> > > - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> > > least log a message?  If that file is still there when we loop back
> > > around, it's going to cause a failure, I think.
> > 
> > The existing unlink() call doesn't, that's the only reason I didn't add
> > a message there. I'm fine with adding a (LOG or WARNING?) message.

Note that I didn't add the warning after all, as it'd be too noisy
during repeated replay, as all the files would already be gone. We could
only emit it when the error is not ENOFILE, if people prefer that.


> Unlinking old pg_clog files is strictly an optimization.  If you were to
> comment out every unlink() call in slru.c, the only ill effect on CLOG is the
> waste of disk space.  Is the same true of MultiXact?

Well, multixacts are a lot larger than the other SLRUs, I think that
makes some sort of difference.


Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-10-24 Thread Noah Misch
I'm several days into a review of this change (commits 4f627f8 and aa29c1c).
There's one part of the design I want to understand before commenting on
specific code.  What did you anticipate to be the consequences of failing to
remove SLRU segment files that MultiXactState->oldestMultiXactId implies we
should have removed?  I ask because, on the one hand, I see code making
substantial efforts to ensure that it truncates exactly as planned:

/*
 * Do truncation, and the WAL logging of the truncation, in a critical
 * section. That way offsets/members cannot get out of sync anymore, 
i.e.
 * once consistent the newOldestMulti will always exist in members, even
 * if we crashed in the wrong moment.
 */
START_CRIT_SECTION();

/*
 * Prevent checkpoints from being scheduled concurrently. This is 
critical
 * because otherwise a truncation record might not be replayed after a
 * crash/basebackup, even though the state of the data directory would
 * require it.
 */
Assert(!MyPgXact->delayChkpt);
MyPgXact->delayChkpt = true;
...
/*
 * Update in-memory limits before performing the truncation, while 
inside
 * the critical section: Have to do it before truncation, to prevent
 * concurrent lookups of those values. Has to be inside the critical
 * section as otherwise a future call to this function would error out,
 * while looking up the oldest member in offsets, if our caller crashes
 * before updating the limits.
 */

On the other hand, TruncateMultiXact() callees ignore unlink() return values:

On Tue, Sep 22, 2015 at 07:57:27PM +0200, Andres Freund wrote:
> On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> > - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> > least log a message?  If that file is still there when we loop back
> > around, it's going to cause a failure, I think.
> 
> The existing unlink() call doesn't, that's the only reason I didn't add
> a message there. I'm fine with adding a (LOG or WARNING?) message.

Unlinking old pg_clog files is strictly an optimization.  If you were to
comment out every unlink() call in slru.c, the only ill effect on CLOG is the
waste of disk space.  Is the same true of MultiXact?

If there's anyplace where failure to unlink would cause a malfunction, I think
it would be around the use of SlruScanDirCbFindEarliest().  That function's
result becomes undefined if the range of pg_multixact/offsets segment files
present on disk spans more than about INT_MAX/2 MultiXactId.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-29 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Sep 28, 2015 at 10:48 PM, Jim Nasby  wrote:
> > Maybe I'm confused, but I thought the whole purpose of this was to get rid
> > of the risk associated with that calculation in favor of explicit truncation
> > boundaries in the WAL log.
> 
> Yes.  But if the master hasn't been updated yet, then we still need to
> do something based on a calculation.

Right.

> > Even if that's not the case, ISTM that being big and in your face about a
> > potential data corruption bug is a good thing, as long as the DBA has a way
> > to "hit the snooze button".
> 
> Panicking the standby because the master hasn't been updated does not
> seem like a good thing to me in any way.

If we had a way to force the master to upgrade, I think it would be good
because we have a mechanism to get rid of the legacy truncation code;
but as I said several messages ago this doesn't actually work which is
why I dropped the idea of panicking.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-29 Thread Joel Jacobson
On Tue, Sep 22, 2015 at 3:20 PM, Andres Freund  wrote:
> What I've tested is the following:
> * continous burning of multis, both triggered via members and offsets
> * a standby keeping up when the primary is old
> * a standby keeping up when the primary is new
> * basebackups made while a new primary is under load
> * verified that we properly PANIC when a truncation record is replayed
>   in an old standby.

Are these test scripts available somewhere?
I understand they might be undocumented and perhaps tricky to set it all up,
but I would be very interested in them anyway,
think you could push them somewhere?

Thanks a lot for working on this!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-29 Thread Robert Haas
On Mon, Sep 28, 2015 at 10:48 PM, Jim Nasby  wrote:
> Maybe I'm confused, but I thought the whole purpose of this was to get rid
> of the risk associated with that calculation in favor of explicit truncation
> boundaries in the WAL log.

Yes.  But if the master hasn't been updated yet, then we still need to
do something based on a calculation.

> Even if that's not the case, ISTM that being big and in your face about a
> potential data corruption bug is a good thing, as long as the DBA has a way
> to "hit the snooze button".

Panicking the standby because the master hasn't been updated does not
seem like a good thing to me in any way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-29 Thread Andres Freund
On 2015-09-28 21:48:00 -0500, Jim Nasby wrote:
> On 9/28/15 8:49 PM, Robert Haas wrote:
> >If at some point we back-patch this further, then it potentially
> >becomes a live issue, but I would like to respectfully inquire what
> >exactly you think making it a PANIC would accomplish?  There are a lot
> >of scary things about this patch, but the logic for deciding whether
> >to perform a legacy truncation is solid as far as I know.
> 
> Maybe I'm confused, but I thought the whole purpose of this was to get rid
> of the risk associated with that calculation in favor of explicit truncation
> boundaries in the WAL log.

> Even if that's not the case, ISTM that being big and in your face about a
> potential data corruption bug is a good thing, as long as the DBA has a way
> to "hit the snooze button".

So we'd end up with a guc that everyone has to set while they
upgrade. That seems like a pointless hassle.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-28 Thread Jim Nasby

On 9/28/15 8:49 PM, Robert Haas wrote:

If at some point we back-patch this further, then it potentially
becomes a live issue, but I would like to respectfully inquire what
exactly you think making it a PANIC would accomplish?  There are a lot
of scary things about this patch, but the logic for deciding whether
to perform a legacy truncation is solid as far as I know.


Maybe I'm confused, but I thought the whole purpose of this was to get 
rid of the risk associated with that calculation in favor of explicit 
truncation boundaries in the WAL log.


Even if that's not the case, ISTM that being big and in your face about 
a potential data corruption bug is a good thing, as long as the DBA has 
a way to "hit the snooze button".


Either way, I'm not going to make a fuss over it.

Just to make sure we're on the same page; Alvaro's original comment was:

Honestly, I wonder whether this message
ereport(LOG,
(errmsg("performing legacy multixact 
truncation"),
 errdetail("Legacy truncations are sometimes 
performed when replaying WAL from an older primary."),
 errhint("Upgrade the primary, it is 
susceptible to data corruption.")));
shouldn't rather be a PANIC.  (The main reason not to, I think, is that
once you see this, there is no way to put the standby in a working state
without recloning).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 5:47 PM, Jim Nasby  wrote:
> There was discussion about making this a PANIC instead of a LOG, which I
> think is a good idea... but then there'd need to be some way to not PANIC if
> you were doing an upgrade.

I think you're worrying about a non-problem.  This code has not been
back-patched prior to 9.5, and the legacy truncation code has been
removed in 9.5+.  So it's a complete non-issue right at the moment.
If at some point we back-patch this further, then it potentially
becomes a live issue, but I would like to respectfully inquire what
exactly you think making it a PANIC would accomplish?  There are a lot
of scary things about this patch, but the logic for deciding whether
to perform a legacy truncation is solid as far as I know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-28 Thread Jim Nasby

On 9/27/15 2:25 PM, Andres Freund wrote:

On 2015-09-27 14:21:08 -0500, Jim Nasby wrote:

IMHO doing just a log of something this serious; it should at least be a
WARNING.


In postgres LOG, somewhat confusingly, is more severe than WARNING.


Ahh, right. Which in this case stinks, because WARNING is a lot more 
attention grabbing than LOG. :/



I think the concern about upgrading a replica before the master is valid; is
there some way we could over-ride a PANIC when that's exactly what someone
is trying to do? Check for a special file maybe?


I don't understand this concern - that's just the situation we have in
all released branches today.


There was discussion about making this a PANIC instead of a LOG, which I 
think is a good idea... but then there'd need to be some way to not 
PANIC if you were doing an upgrade.



+   boolsawTruncationInCkptCycle;
What happens if someone downgrades the master, back to a version that no
longer logs truncation? (I don't think assuming that the replica will need
to restart if that happens is a safe bet...)


It'll just to do legacy truncation again - without a restart on the
standby required.


Oh, I thought once that was set it would stay set. NM.


-   if (MultiXactIdPrecedes(oldestMXact, earliest))
+   /* If there's nothing to remove, we can bail out early. */
+   if (MultiXactIdPrecedes(oldestMulti, earliest))
{
-   DetermineSafeOldestOffset(oldestMXact);
+   LWLockRelease(MultiXactTruncationLock);
If/when this is backpatched, would it be safer to just leave this alone?


What do you mean? This can't just isolated be left alone?


I thought removing DetermineSafeOldestOffset was just an optimization, 
but I guess I was confused.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-27 Thread Andres Freund
On 2015-09-27 14:21:08 -0500, Jim Nasby wrote:
> IMHO doing just a log of something this serious; it should at least be a
> WARNING.

In postgres LOG, somewhat confusingly, is more severe than WARNING.

> I think the concern about upgrading a replica before the master is valid; is
> there some way we could over-ride a PANIC when that's exactly what someone
> is trying to do? Check for a special file maybe?

I don't understand this concern - that's just the situation we have in
all released branches today.

> + boolsawTruncationInCkptCycle;
> What happens if someone downgrades the master, back to a version that no
> longer logs truncation? (I don't think assuming that the replica will need
> to restart if that happens is a safe bet...)

It'll just to do legacy truncation again - without a restart on the
standby required.

> - if (MultiXactIdPrecedes(oldestMXact, earliest))
> + /* If there's nothing to remove, we can bail out early. */
> + if (MultiXactIdPrecedes(oldestMulti, earliest))
>   {
> - DetermineSafeOldestOffset(oldestMXact);
> + LWLockRelease(MultiXactTruncationLock);
> If/when this is backpatched, would it be safer to just leave this alone?

What do you mean? This can't just isolated be left alone?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-27 Thread Jim Nasby

On 9/23/15 1:48 PM, Andres Freund wrote:

Honestly, I wonder whether this message
>ereport(LOG,
>(errmsg("performing legacy multixact 
truncation"),
> errdetail("Legacy truncations are sometimes 
performed when replaying WAL from an older primary."),
> errhint("Upgrade the primary, it is 
susceptible to data corruption.")));
>shouldn't rather be a PANIC.  (The main reason not to, I think, is that
>once you see this, there is no way to put the standby in a working state
>without recloning).

Huh? The behaviour in that case is still better than what we have in
9.3+ today (not delayed till the restartpoint). Don't see why that
should be a panic. That'd imo make it pretty much impossible to upgrade
a pair of primary/master where you normally upgrade the standby first?


IMHO doing just a log of something this serious; it should at least be a 
WARNING.


I think the concern about upgrading a replica before the master is 
valid; is there some way we could over-ride a PANIC when that's exactly 
what someone is trying to do? Check for a special file maybe?


+   boolsawTruncationInCkptCycle;
What happens if someone downgrades the master, back to a version that no 
longer logs truncation? (I don't think assuming that the replica will 
need to restart if that happens is a safe bet...)


-   if (MultiXactIdPrecedes(oldestMXact, earliest))
+   /* If there's nothing to remove, we can bail out early. */
+   if (MultiXactIdPrecedes(oldestMulti, earliest))
{
-   DetermineSafeOldestOffset(oldestMXact);
+   LWLockRelease(MultiXactTruncationLock);
If/when this is backpatched, would it be safer to just leave this alone?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-26 Thread Tom Lane
Andres Freund  writes:
> Should this get a release note entry given that we're not (at least
> immediately) backpatching this?

I'll probably put something in when I update the release notes for beta1
(next week sometime); no real need to deal with it individually.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-26 Thread Andres Freund
Hi,

I pushed this to 9.5 and master, committing the xlog page magic bump
separately. To avoid using a magic value from master in 9.5 I bumped the
numbers by two in both branches.

Should this get a release note entry given that we're not (at least
immediately) backpatching this?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Andres Freund
On 2015-09-23 15:57:02 -0300, Alvaro Herrera wrote:
> I think we need to make a decision here.  Is this a terribly serious
> bug/misdesign that needs addressing?

Imo yes. Not sure about terribly, but definitely serious. It's several
data loss bugs in one package.

> If so, we need to backpatch.  If not, then by all means lets leave it
> alone.  I don't think it is a good idea to leave it open if we think
> it's serious, which is what I think is happening.

Right, but I don't want to backpatch this over an objection, and it
doesn't seem like I have a chance to convince Robert that it'd be a good
idea. So it'll be 9.5+master for now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-09-23 15:03:05 -0300, Alvaro Herrera wrote:

> > Honestly, I wonder whether this message
> > ereport(LOG,
> > (errmsg("performing legacy multixact 
> > truncation"),
> >  errdetail("Legacy truncations are 
> > sometimes performed when replaying WAL from an older primary."),
> >  errhint("Upgrade the primary, it is 
> > susceptible to data corruption.")));
> > shouldn't rather be a PANIC.  (The main reason not to, I think, is that
> > once you see this, there is no way to put the standby in a working state
> > without recloning).
> 
> Huh? The behaviour in that case is still better than what we have in
> 9.3+ today (not delayed till the restartpoint). Don't see why that
> should be a panic. That'd imo make it pretty much impossible to upgrade
> a pair of primary/master where you normally upgrade the standby first?
> 
> This is all moot given Robert's objection to backpatching this to
> 9.3/4.

I think we need to make a decision here.  Is this a terribly serious
bug/misdesign that needs addressing?  If so, we need to backpatch.  If
not, then by all means lets leave it alone.  I don't think it is a good
idea to leave it open if we think it's serious, which is what I think is
happening.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Andres Freund
On 2015-09-23 15:03:05 -0300, Alvaro Herrera wrote:
> The comment on top of TrimMultiXact states that "no locks are needed
> here", but then goes on to grab a few locks.

Hm. Yea. Although that was the case before.

> It's a bit odd that SetMultiXactIdLimit has the "finishedStartup" test
> so low.  Why bother setting all those local variables only to bail
> out?

Hm. Doesn't seem to matter much to me, but I can change it.

> In MultiXactAdvanceOldest, the test for sawTruncationinCkptCycle seems
> reversed?
>   if (!MultiXactState->sawTruncationInCkptCycle)
> surely we should be doing truncation if it's set?

No, that's correct. If there was a checkpoint cycle where oldestMulti
advanced without seing a truncation record we need to perform a legacy
truncation.

> Honestly, I wonder whether this message
>   ereport(LOG,
>   (errmsg("performing legacy multixact 
> truncation"),
>errdetail("Legacy truncations are 
> sometimes performed when replaying WAL from an older primary."),
>errhint("Upgrade the primary, it is 
> susceptible to data corruption.")));
> shouldn't rather be a PANIC.  (The main reason not to, I think, is that
> once you see this, there is no way to put the standby in a working state
> without recloning).

Huh? The behaviour in that case is still better than what we have in
9.3+ today (not delayed till the restartpoint). Don't see why that
should be a panic. That'd imo make it pretty much impossible to upgrade
a pair of primary/master where you normally upgrade the standby first?

This is all moot given Robert's objection to backpatching this to
9.3/4.

> If the find_multixact_start(oldestMulti) call in TruncateMultiXact
> fails, what recourse does the user have?  I wonder if the elog() should
> be a FATAL instead of just LOG.  It's not like it would work on a
> subsequent run, is it?

It currently only LOGs, I don't want to change that. The cases where we
currently know it's possible to hit this, it should be fixed by the next
set of emergency autovacuums (which we trigger).

Thanks for the look,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Alvaro Herrera
The comment on top of TrimMultiXact states that "no locks are needed
here", but then goes on to grab a few locks.  I think we should remove
the comment, or rephrase it to state that we still grab them for
consistency or whatever; or perhaps even remove the lock acquisitions.
(I think the comment is still true: by the time TrimMultiXact runs,
we're out of recovery but not yet running, so it's not possible for
anyone to try to do anything multixact-related.)

I wonder if it would be cleaner to move the setting of finishedStartup
down to just before calling SetMultiXactIdLimit, instead of at the top
of the function.

It's a bit odd that SetMultiXactIdLimit has the "finishedStartup" test
so low.  Why bother setting all those local variables only to bail out?
I think it would make more sense to just do it at the top.  The only
thing you lose AFAICS is that elog(DEBUG1) message -- is that worth it?
Also, the fact that finishedStartup itself is read without a lock at
least merits a comment.

In MultiXactAdvanceOldest, the test for sawTruncationinCkptCycle seems
reversed?
if (!MultiXactState->sawTruncationInCkptCycle)
surely we should be doing truncation if it's set?

Honestly, I wonder whether this message
ereport(LOG,
(errmsg("performing legacy multixact 
truncation"),
 errdetail("Legacy truncations are 
sometimes performed when replaying WAL from an older primary."),
 errhint("Upgrade the primary, it is 
susceptible to data corruption.")));
shouldn't rather be a PANIC.  (The main reason not to, I think, is that
once you see this, there is no way to put the standby in a working state
without recloning).

I think the prevOldestOffsetKnown test in line 2667 ("if we failed to
get ...") is better expressed as an else-if of the previous "if" block.

I think the two "there are NO MultiXacts" cases in TruncateMultiXact
would benefit in readability from adding braces around the lone
statement (and moving the comment to the line prior).

If the find_multixact_start(oldestMulti) call in TruncateMultiXact
fails, what recourse does the user have?  I wonder if the elog() should
be a FATAL instead of just LOG.  It's not like it would work on a
subsequent run, is it?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Andres Freund
On 2015-09-23 10:29:09 -0300, Alvaro Herrera wrote:
> >  /*
> > + * Delete an individual SLRU segment, identified by the segment number.
> > + */
> > +void
> > +SlruDeleteSegment(SlruCtl ctl, int segno)
> 
> Is it okay to change the ABI of SlruDeleteSegment?

I think so. Any previous user of the API is going to be currently broken
anyway due to the missing flushing of buffers.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-23 Thread Alvaro Herrera
> @@ -1210,8 +1211,14 @@ restart:;
>   (void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
>  }
>  
> -void
> -SlruDeleteSegment(SlruCtl ctl, char *filename)
> +/*
> + * Delete an individual SLRU segment, identified by the filename.
> + *
> + * NB: This does not touch the SLRU buffers themselves, callers have to 
> ensure
> + * they either can't yet contain anything, or have already been cleaned out.
> + */
> +static void
> +SlruInternalDeleteSegment(SlruCtl ctl, char *filename)
>  {
>   charpath[MAXPGPATH];
>  
> @@ -1222,6 +1229,64 @@ SlruDeleteSegment(SlruCtl ctl, char *filename)
>  }
>  
>  /*
> + * Delete an individual SLRU segment, identified by the segment number.
> + */
> +void
> +SlruDeleteSegment(SlruCtl ctl, int segno)

Is it okay to change the ABI of SlruDeleteSegment?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Andres Freund
On 2015-09-22 20:14:11 -0400, Robert Haas wrote:
> I'm not disagreeing with any of that.  I'm just disagreeing with you
> on the likelihood that we're going to make things better vs. making
> them worse.  But, really, I've said everything I have to say about
> this.  You have a commit bit.

I'm not going to push backpatch this to 9.3/9.4 without you being on
board. For that I think you're unfortunately too often right, and this
is too critical. But I'm also not going to develop an alternative
stopgap for those versions, since I have no clue how that'd end up being
better.

The only alternative proposal I have right now is to push this to
9.5/9.6 (squashed with a followup patch removing legacy truncations) and
then push the patch including legacy stuff to 9.3/4 after the next set
of releases.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Robert Haas
On Tue, Sep 22, 2015 at 7:45 PM, Andres Freund  wrote:
> On 2015-09-23 01:24:31 +0200, Andres Freund wrote:
>> I think we put at least three layers on bandaid on this issue since
>> 9.3.2, and each layer made things more complicated.
>
> 2a9b01928f193f529b885ac577051c4fd00bd427 - Cope with possible failure of the 
> oldest MultiXact to exist.
> 5bbac7ec1b5754043e073a45454e4c257512ce30 - Advance the stop point for 
> multixact offset creation only at checkpoint.
> 9a28c3752c89ec01fb8b28bb5904c6d547507fda - Have multixact be truncated by 
> checkpoint, not vacuum
> 215ac4ad6589e0f6a31cc4cd867aedba3cd42924 - Truncate pg_multixact/'s contents 
> during crash recovery
>
> At least these are closely related to the fact that truncation isn't WAL
> logged. There are more that are tangentially related. We (primarily me,
> writing the timewise first one) should have gone for a new WAL record
> from the start. We've discussed that in at least three of the threads
> around the above commits...

I'm not disagreeing with any of that.  I'm just disagreeing with you
on the likelihood that we're going to make things better vs. making
them worse.  But, really, I've said everything I have to say about
this.  You have a commit bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Andres Freund
On 2015-09-23 01:24:31 +0200, Andres Freund wrote:
> I think we put at least three layers on bandaid on this issue since
> 9.3.2, and each layer made things more complicated.

2a9b01928f193f529b885ac577051c4fd00bd427 - Cope with possible failure of the 
oldest MultiXact to exist.
5bbac7ec1b5754043e073a45454e4c257512ce30 - Advance the stop point for multixact 
offset creation only at checkpoint.
9a28c3752c89ec01fb8b28bb5904c6d547507fda - Have multixact be truncated by 
checkpoint, not vacuum
215ac4ad6589e0f6a31cc4cd867aedba3cd42924 - Truncate pg_multixact/'s contents 
during crash recovery

At least these are closely related to the fact that truncation isn't WAL
logged. There are more that are tangentially related. We (primarily me,
writing the timewise first one) should have gone for a new WAL record
from the start. We've discussed that in at least three of the threads
around the above commits...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Andres Freund
On 2015-09-22 14:52:49 -0400, Robert Haas wrote:
> 1. It would be possible to write a patch that included ONLY the
> changes needed to make that happen, and did nothing else.  It would be
> largely a subset of this.  If we want to change 9.3 and 9.4, I
> recommend we do that first, and then come back to the rest of this.

I think that patch would be pretty much what I wrote.

To be correct you basically have to:
1) Never skip a truncation on the standby. Otherwise there might have
   already have been wraparound and you read the completely wrong
   offset.
2) Always perform truncations on the standby exactly the same moment (in
   the replay stream) as on the primary. Otherwise there also can be a
   wraparound.
3) Never read anything from an SLRU from the data directory while
   inconsistent. In an inconsistent state we can read completely wrong
   data. A standby can be inconsistent in many situations, including
   crashes, restarts and fresh base backups.

To me these three together leave only the option to never read an SLRUs
contents on a standby.  That only leaves minor changes in the patch that
could be removed afaics. I mean we could leave in
DetermineSafeOldestOffset() but it'd be doing pretty much the same as
SetOffsetVacuumLimit().


I think we put at least three layers on bandaid on this issue since
9.3.2, and each layer made things more complicated. We primarily did so
because of the compatibility and complexity concerns. I think that was a
bad mistake. We should have done it mostly right back then, and we'd be
better of now. If we continue with bandaids on the back branches while
having a fixed 9.5+ with significantly different behaviour we'll have a
hellish time fixing things in the back branches. And introduce more bugs
than this might introduce.


> 2. I agree that what we're doing right now is wrong.  And I agree that
> this fixes a real problem. But it seems to me to be quite possible,
> even likely, that it will create other problems.

Possible. But I think those bugs will be just bugs and not more
fundamental architectural problems.

To be very clear. I'm scared of the idea of backpatching this. I'm more
scared of doing that myself. But even more I am scared of the current
state.


> For example, suppose that there are files in the data directory that
> precede oldestMultiXact. In the current approach, we'll remove those
> because they're not in the range we expect to be used.

Hm. For offsets/ we continue to use SimpleLruTruncate() for truncation,
which scans the directory, so I don't see a problem. For members/ we
won't - but neither do we really today, see
SlruScanDirCbRemoveMembers(). So I don't think there'll be a significant
difference?


> a leftover old file that doesn't get removed the first time through -
> for whatever reason - becomes a time bomb that will explode on the
> next wraparound.  I don't know that that will happen.

We should be able to deal with that, otherwise recovery is pretty
borked. It can be a problem for the 'recovery from wrong oldest multi'
case, but that's the same today.


> I will bet you a beer that there are other possible hazards neither of
> us is foreseeing right now.

Right. I'm not dismissing that. I just think it's much more likely to be
handleable problems than the set we have today. It's incredibly hard to
get an accurate mental model of the combined behaviour & state of
primary and standby today. Even if we three have that today, I'm pretty
sure we won't in half a year. And sure as hell nearly nobody else will
have one.


> >> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> >> least log a message?  If that file is still there when we loop back
> >> around, it's going to cause a failure, I think.
> >
> > The existing unlink() call doesn't, that's the only reason I didn't add
> > a message there. I'm fine with adding a (LOG or WARNING?) message.
> 
> Cool.

Hm. When redoing a truncation during [crash] recovery that can cause a
host of spurious warnings if already done before. DEBUG1 to avoid
scaring users?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Robert Haas
On Tue, Sep 22, 2015 at 1:57 PM, Andres Freund  wrote:
> On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
>> Regarding 0003, I'm still very much not convinced that it's a good
>> idea to apply this to 9.3 and 9.4.  This patch changes the way we do
>> truncation in those older releases; instead of happening at a
>> restartpoint, it happens when oldestMultiXid advances.
>
> The primary reason for doing that is that doing it at restartpoints is
> simply *wrong*. Restartpoints aren't scheduled in sync with replay -
> which means that a restartpoint can (will actually) happen long long
> after the checkpoint from the primary has replayed.  Which means that by
> the time the restartpoint is performed it's actually not unlikely that
> we've already filled all slru segments. Which is bad if we then fail
> over/start up.

1. It would be possible to write a patch that included ONLY the
changes needed to make that happen, and did nothing else.  It would be
largely a subset of this.  If we want to change 9.3 and 9.4, I
recommend we do that first, and then come back to the rest of this.

2. I agree that what we're doing right now is wrong.  And I agree that
this fixes a real problem. But it seems to me to be quite possible,
even likely, that it will create other problems.

For example, suppose that there are files in the data directory that
precede oldestMultiXact. In the current approach, we'll remove those
because they're not in the range we expect to be used.  But in this
approach we no longer remove everything we think shouldn't be there.
We remove exactly the stuff we think should go away.  As a general
principle, that's clearly superior.  But in the back-branches, it
creates a risk: a leftover old file that doesn't get removed the first
time through - for whatever reason - becomes a time bomb that will
explode on the next wraparound.  I don't know that that will happen.
But I sure as heck don't know that won't happen with any combination
of the variously broken 9.3.X releases we've put out there.  Even if
you can prove that particular risk never materializes to your
satisfaction and mine, I will bet you a beer that there are other
possible hazards neither of us is foreseeing right now.

>> I realize that you disagree and will probably commit this to those
>> branches anyway. But I want it to be clear that I don't endorse that.
>
> I don't plan to commit/backpatch this over your objection.

I'm not in a position to demand that you take my advice, but I'm
telling you what I think as honestly as I know how.

To be clear, I am fully in favor of making these changes (without the
legacy truncation stuff) in 9.5 and master, bumping WAL page magic so
that we invalidate any 9.5 alpha standys.  I think it's a far more
solid approach than what we've got right now, and it clearly
eliminates a host of dangers.  In fact, I think it would be a pretty
stupid idea not to make these changes in those branches.  It would be
doubling down on a design we know can never be made robust.

But I do not have confidence that we can change 9.4 and especially 9.3
without knock-on consequences.  You may have that confidence.  I most
definitely do not.  My previous two rounds in the boxing ring with
this problem convinced me that (1) it's incredibly easy to break
things with well-intentioned changes in this area, (2) it's
practically impossible to foresee everything that might go wrong with
some screwy combination of versions, and (3) early 9.3.X releases are
in much worse shape than early 9.4.X releases, to the point where
guessing what any given variable is going to contain on 9.3.X is
essentially throwing darts at the wall.  That's an awfully challenging
environment in which to write a bullet-proof patch.

>> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
>> least log a message?  If that file is still there when we loop back
>> around, it's going to cause a failure, I think.
>
> The existing unlink() call doesn't, that's the only reason I didn't add
> a message there. I'm fine with adding a (LOG or WARNING?) message.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Andres Freund
On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> Regarding 0003, I'm still very much not convinced that it's a good
> idea to apply this to 9.3 and 9.4.  This patch changes the way we do
> truncation in those older releases; instead of happening at a
> restartpoint, it happens when oldestMultiXid advances.

The primary reason for doing that is that doing it at restartpoints is
simply *wrong*. Restartpoints aren't scheduled in sync with replay -
which means that a restartpoint can (will actually) happen long long
after the checkpoint from the primary has replayed.  Which means that by
the time the restartpoint is performed it's actually not unlikely that
we've already filled all slru segments. Which is bad if we then fail
over/start up.

Aside from the more fundamental issue that restartpoints have to be
"asynchronous" with respect to the checkpoint record for performance
reasons, there's a bunch of additional reasons making this even more
likely to occur: Differing checkpoint segments on the standby and
pending actions (which we got rid off in 9.5+, but ...)

> I realize that you disagree and will probably commit this to those
> branches anyway. But I want it to be clear that I don't endorse that.

I don't plan to commit/backpatch this over your objection.

I do think it'd be the better approach, and I personally think that
we're much more likely to introduce bugs if we backpatch this in a
year. Which I think we'll end up having to. The longer people run on
these branches, the more issues we'll see.

> I wish more people were paying attention to these patches.

+many

> Other issues:
> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> least log a message?  If that file is still there when we loop back
> around, it's going to cause a failure, I think.

The existing unlink() call doesn't, that's the only reason I didn't add
a message there. I'm fine with adding a (LOG or WARNING?) message.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Alvaro Herrera
Robert Haas wrote:

> Regarding 0003, I'm still very much not convinced that it's a good
> idea to apply this to 9.3 and 9.4.  This patch changes the way we do
> truncation in those older releases; instead of happening at a
> restartpoint, it happens when oldestMultiXid advances.  I admit that I
> don't see a specific way that that can go wrong, but there are so many
> different old versions with slightly different multixact truncation
> behaviors that it seems very hard to be sure that we're not going to
> make things worse rather than better by introducing yet another
> approach to the problem.  I realize that you disagree and will
> probably commit this to those branches anyway. But I want it to be
> clear that I don't endorse that.

Noted.  I am not sure about changing things so invasively either TBH.
The interactions of this stuff with other parts of the system are very
complicated and it's easy to make a mistake that goes unnoticed until
some weird scenario is run elsewhere.  (Who would have thought that
things would fail when a basebackup takes 12 hours to take and you have
a custom preemptive tuple freeze script in crontab).

> I wish more people were paying attention to these patches.  These are
> critical data-corrupting bugs, the code in question is very tricky,
> it's been majorly revised multiple times, and we're revising it again.
> And nobody except me and Andres is looking at this, and I'm definitely
> not smart enough to get this all right.

I'm also looking, and yes it's tricky.

> Other issues:

It would be good to pgindent the code before producing back-branch
patches.  I think some comments will get changed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Robert Haas
On Tue, Sep 22, 2015 at 9:20 AM, Andres Freund  wrote:
> On 2015-09-21 16:36:03 +0200, Andres Freund wrote:
>> Agreed. I'll update the patch.
>
> Here's updated patches against master. These include the "legacy"
> truncation support. There's no meaningful functional differences in this
> version except addressing the review comments that I agreed with, and a
> fair amount of additional polishing.

0002 looks fine.

Regarding 0003, I'm still very much not convinced that it's a good
idea to apply this to 9.3 and 9.4.  This patch changes the way we do
truncation in those older releases; instead of happening at a
restartpoint, it happens when oldestMultiXid advances.  I admit that I
don't see a specific way that that can go wrong, but there are so many
different old versions with slightly different multixact truncation
behaviors that it seems very hard to be sure that we're not going to
make things worse rather than better by introducing yet another
approach to the problem.  I realize that you disagree and will
probably commit this to those branches anyway. But I want it to be
clear that I don't endorse that.

I wish more people were paying attention to these patches.  These are
critical data-corrupting bugs, the code in question is very tricky,
it's been majorly revised multiple times, and we're revising it again.
And nobody except me and Andres is looking at this, and I'm definitely
not smart enough to get this all right.

Other issues:
- If SlruDeleteSegment fails in unlink(), shouldn't we at the very
least log a message?  If that file is still there when we loop back
around, it's going to cause a failure, I think.

Assorted minor nitpicking:
- "happend" is misspelled in the commit message for 0003
- "in contrast to before" should have a comma after it, also in that
commit message
- "how far the next members wraparound is away" -> "how far away the
next members wraparound is"
- "seing" -> "seeing"
- "Upgrade the primary," -> "Upgrade the primary;"
- "toMultiXact" -> "to MultiXact"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-22 Thread Andres Freund
On 2015-07-02 11:52:04 -0400, Robert Haas wrote:
> On Mon, Jun 29, 2015 at 3:48 PM, Andres Freund  wrote:
> > New version attached.
> 
> 0002 looks good, but the commit message should perhaps mention the
> comment fix.  Or commit that separately.

I'm inclined to backpatch the applicable parts to 9.0 - seems pointless
to have differing autovacuum_freeze_max_age values and the current value
sucks for testing and space consumption there as well.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-21 Thread Andres Freund
On 2015-09-21 10:30:59 -0700, Josh Berkus wrote:
> On 09/21/2015 07:36 AM, Andres Freund wrote:
> > On 2015-09-21 10:31:17 -0400, Robert Haas wrote:
> >> So, where are we with this patch?
> > 
> > Uh. I'd basically been waiting on further review and then forgot about
> > it.
> 
> Does the current plan to never expire XIDs in 9.6 affect multixact
> truncation at all?

I doubt that it'd in a meaningful manner. Truncations will still need to
happen to contain space usage.

Besides, I'm pretty sceptical of shaping the design of bug fixes to suit
some unwritten feature we only know the highest level design of as of
yet.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-21 Thread Josh Berkus
On 09/21/2015 07:36 AM, Andres Freund wrote:
> On 2015-09-21 10:31:17 -0400, Robert Haas wrote:
>> So, where are we with this patch?
> 
> Uh. I'd basically been waiting on further review and then forgot about
> it.

Does the current plan to never expire XIDs in 9.6 affect multixact
truncation at all?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-21 Thread Andres Freund
On 2015-09-21 10:31:17 -0400, Robert Haas wrote:
> On Sun, Jul 5, 2015 at 3:16 PM, Andres Freund  wrote:
> >>On the other hand, in the common case, by the time we perform a
> >>restartpoint, we're consistent: I think the main exception to that is
> >>if we do a base backup that spans multiple checkpoints.  I think that
> >>in the new location, the chances that the legacy truncation is trying
> >>to read inconsistent data is probably higher.
> >
> > The primary problem isn't that we truncate too early, it's that we delay 
> > truncation on the standby in comparison to the primary by a considerable 
> > amount. All the while continuing to replay multi creations.
> >
> > I don't see the difference wrt. consistency right now, but I don't have 
> > access to the code right now. I mean we *have* to do something while 
> > inconsistent. A start/stop backup can easily span a day or four.
> 
> So, where are we with this patch?

Uh. I'd basically been waiting on further review and then forgot about
it.


> In my opinion, we ought to do something about master and 9.5 before
> beta, so that we're doing *yet another* major release with unfixed
> multixact bugs.  Let's make the relevant truncation changes in master
> and 9.5 and bump the WAL page magic, so that a 9.5alpha standby can't
> be used with a 9.5beta master.  Then, we don't need any of this legacy
> truncation stuff at all, and 9.5 is hopefully in a much better state
> than 9.4 and 9.3.

Hm.

> Now, that still potentially leaves 9.4 and 9.3 users hanging out to
> dry.  But we don't have a tremendous number of those people clamoring
> about this, and if we get 9.5+ correct, then we can go and change the
> logic in 9.4 and 9.3 later when, and if, we are confident that's the
> right thing to do.  I am still not altogether convinced that it's a
> good idea, nor am I altogether convinced that this code is right.
> Perhaps it is, and if we consensus on it, fine.

To me the current logic is much worse than what's in the patch, so I
don't think that's the best way to go. But I'm not not absolutely gung
ho on that.

> But regardless of that, we should not send a third major release to
> beta with the current broken system unless there is really no viable
> alternative.

Agreed. I'll update the patch.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-21 Thread Robert Haas
On Sun, Jul 5, 2015 at 3:16 PM, Andres Freund  wrote:
>>On the other hand, in the common case, by the time we perform a
>>restartpoint, we're consistent: I think the main exception to that is
>>if we do a base backup that spans multiple checkpoints.  I think that
>>in the new location, the chances that the legacy truncation is trying
>>to read inconsistent data is probably higher.
>
> The primary problem isn't that we truncate too early, it's that we delay 
> truncation on the standby in comparison to the primary by a considerable 
> amount. All the while continuing to replay multi creations.
>
> I don't see the difference wrt. consistency right now, but I don't have 
> access to the code right now. I mean we *have* to do something while 
> inconsistent. A start/stop backup can easily span a day or four.

So, where are we with this patch?

In my opinion, we ought to do something about master and 9.5 before
beta, so that we're doing *yet another* major release with unfixed
multixact bugs.  Let's make the relevant truncation changes in master
and 9.5 and bump the WAL page magic, so that a 9.5alpha standby can't
be used with a 9.5beta master.  Then, we don't need any of this legacy
truncation stuff at all, and 9.5 is hopefully in a much better state
than 9.4 and 9.3.

Now, that still potentially leaves 9.4 and 9.3 users hanging out to
dry.  But we don't have a tremendous number of those people clamoring
about this, and if we get 9.5+ correct, then we can go and change the
logic in 9.4 and 9.3 later when, and if, we are confident that's the
right thing to do.  I am still not altogether convinced that it's a
good idea, nor am I altogether convinced that this code is right.
Perhaps it is, and if we consensus on it, fine.  But regardless of
that, we should not send a third major release to beta with the
current broken system unless there is really no viable alternative.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-07-05 Thread Andres Freund
On July 5, 2015 8:50:57 PM GMT+02:00, Robert Haas  wrote:
>On Sun, Jul 5, 2015 at 2:28 PM, Andres Freund 
>wrote:
>> (quick answer, off now)
>>
>> On 2015-07-05 14:20:11 -0400, Robert Haas wrote:
>>> On Thu, Jul 2, 2015 at 2:28 PM, Andres Freund 
>wrote:
>>> > On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
>>> >> I seriously, seriously doubt that it is a good idea to perform
>the
>>> >> legacy truncation from MultiXactAdvanceOldest() rather than
>>> >> TruncateMultiXact().
>>> >
>>> > But where should TruncateMultiXact() be called from? I mean, we
>could
>>> > move the logic from inside MultiXactAdvanceOldest() to some
>special case
>>> > in the replay routine, but what'd be the advantage?
>>>
>>> I think you should call it from where TruncateMultiXact() is being
>>> called from today.  Doing legacy truncations from a different place
>>> than we're currently doing them just gives us more ways to be wrong.
>>
>> The problem with that is that the current location is just plain
>> wrong. Restartpoints can be skipped (due different checkpoint
>segments
>> settings), may not happen at all (pending incomplete actions), and
>can
>> just be slowed down.
>>
>> That's a currently existing bug that's easy to reproduce.
>
>You might be right; I haven't tested that.
>
>On the other hand, in the common case, by the time we perform a
>restartpoint, we're consistent: I think the main exception to that is
>if we do a base backup that spans multiple checkpoints.  I think that
>in the new location, the chances that the legacy truncation is trying
>to read inconsistent data is probably higher.

The primary problem isn't that we truncate too early, it's that we delay 
truncation on the standby in comparison to the primary by a considerable 
amount. All the while continuing to replay multi creations. 

I don't see the difference wrt. consistency right now, but I don't have access 
to the code right now. I mean we *have* to do something while inconsistent. A 
start/stop backup can easily span a day or four. 

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-07-05 Thread Robert Haas
On Sun, Jul 5, 2015 at 2:28 PM, Andres Freund  wrote:
> (quick answer, off now)
>
> On 2015-07-05 14:20:11 -0400, Robert Haas wrote:
>> On Thu, Jul 2, 2015 at 2:28 PM, Andres Freund  wrote:
>> > On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
>> >> I seriously, seriously doubt that it is a good idea to perform the
>> >> legacy truncation from MultiXactAdvanceOldest() rather than
>> >> TruncateMultiXact().
>> >
>> > But where should TruncateMultiXact() be called from? I mean, we could
>> > move the logic from inside MultiXactAdvanceOldest() to some special case
>> > in the replay routine, but what'd be the advantage?
>>
>> I think you should call it from where TruncateMultiXact() is being
>> called from today.  Doing legacy truncations from a different place
>> than we're currently doing them just gives us more ways to be wrong.
>
> The problem with that is that the current location is just plain
> wrong. Restartpoints can be skipped (due different checkpoint segments
> settings), may not happen at all (pending incomplete actions), and can
> just be slowed down.
>
> That's a currently existing bug that's easy to reproduce.

You might be right; I haven't tested that.

On the other hand, in the common case, by the time we perform a
restartpoint, we're consistent: I think the main exception to that is
if we do a base backup that spans multiple checkpoints.  I think that
in the new location, the chances that the legacy truncation is trying
to read inconsistent data is probably higher.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-07-05 Thread Andres Freund
(quick answer, off now)

On 2015-07-05 14:20:11 -0400, Robert Haas wrote:
> On Thu, Jul 2, 2015 at 2:28 PM, Andres Freund  wrote:
> > On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
> >> I seriously, seriously doubt that it is a good idea to perform the
> >> legacy truncation from MultiXactAdvanceOldest() rather than
> >> TruncateMultiXact().
> >
> > But where should TruncateMultiXact() be called from? I mean, we could
> > move the logic from inside MultiXactAdvanceOldest() to some special case
> > in the replay routine, but what'd be the advantage?
> 
> I think you should call it from where TruncateMultiXact() is being
> called from today.  Doing legacy truncations from a different place
> than we're currently doing them just gives us more ways to be wrong.

The problem with that is that the current location is just plain
wrong. Restartpoints can be skipped (due different checkpoint segments
settings), may not happen at all (pending incomplete actions), and can
just be slowed down.

That's a currently existing bug that's easy to reproduce.

> >> The checkpoint hasn't really happened at that point yet; you might
> >> truncate away stuff, then crash before the checkpoint is complete, and
> >> then we you restart recovery, you've got trouble.
> >
> > We're only talking about restartpoints here, right? And I don't see the
> > problem - we don't read the slru anymore until the end of recovery, and
> > the end of recovery can't happen before reaching the minimum revovery
> > location?
> 
> You're still going to have to read the SLRU for as long as you are
> doing legacy truncations, at least.

I'm not following. Sure, we read the SLRUs as we do today. But, in
contrast to the current positioning in recovery, with the patch they're
done at pretty much the same point on the standby as on the primary
today?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-07-05 Thread Robert Haas
On Thu, Jul 2, 2015 at 2:28 PM, Andres Freund  wrote:
> On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
>> On Thu, Jul 2, 2015 at 11:52 AM, Robert Haas  wrote:
>> > Will look at 0003 next.
>>
>> +appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
>>
>> I don't think we typically use this style for notating intervals.
>
> I don't think we really have a very consistent style for xlog messages -
> this seems to describe the meaning accurately?

Although I realize this is supposed to be interval notation, I'm not
sure everyone will immediately figure that out.  I believe it has
created some confusion in the past.  I'm not going to spend a lot of
time arguing with you about it, but I'd do something else, like
offsets from %u stop before %u, members %u stop before %u.

>> [several good points]
>
>> +(errmsg("performing legacy multixact truncation,
>> upgrade master")));
>>
>> This message needs work.  I'm not sure exactly what it should say, but
>> I'm pretty sure that's not clear enough.
>>
>> I seriously, seriously doubt that it is a good idea to perform the
>> legacy truncation from MultiXactAdvanceOldest() rather than
>> TruncateMultiXact().
>
> But where should TruncateMultiXact() be called from? I mean, we could
> move the logic from inside MultiXactAdvanceOldest() to some special case
> in the replay routine, but what'd be the advantage?

I think you should call it from where TruncateMultiXact() is being
called from today.  Doing legacy truncations from a different place
than we're currently doing them just gives us more ways to be wrong.

>> The checkpoint hasn't really happened at that point yet; you might
>> truncate away stuff, then crash before the checkpoint is complete, and
>> then we you restart recovery, you've got trouble.
>
> We're only talking about restartpoints here, right? And I don't see the
> problem - we don't read the slru anymore until the end of recovery, and
> the end of recovery can't happen before reaching the minimum revovery
> location?

You're still going to have to read the SLRU for as long as you are
doing legacy truncations, at least.

>> If TruncateMultiXact() fails to acquire MultiXactTruncationLock right
>> away, does it need to wait, or could it ConditionalAcquire and bail
>> out if the lock isn't obtained?
>
> That seems like premature optimization to me. And one that's not that
> easy to do correctly - what if the current caller actually has a new,
> lower, minimum mxid?

Doesn't the next truncation just catch up?  But sure, I agree this is
inessential (and maybe better left alone for now).

>> I'm not convinced that it's a good idea to remove
>> lastCheckpointedOldest and replace it with nothing.  It seems like a
>> very good idea to have two separate pieces of state in shared memory:
>
>> - The oldest offset that we think anyone might need to access to make
>> a visibility check for a tuple.
>> - The oldest offset that we still have on disk.
>>
>> The latter would now need to be called something else, not
>> lastCheckpointedOldest, but I think it's still good to have it.
>
>> Otherwise, I don't see how you protect against the on-disk state
>> wrapping around before you finish truncating, and then maybe
>> truncation eats something that was busy getting reused.
>
> Unless I miss something the stop limits will prevent that from
> happening? SetMultiXactIdLimit() is called only *after* the truncation
> has finished?

Hmm, that might be, I'd have to reread the patch.  The reason we
originally had it this way was because VACUUM was updating the limit
and then checkpoint was truncating, but now I guess vacuum + truncate
happen so close together that you might only need one value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-07-02 Thread Andres Freund
On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
> On Thu, Jul 2, 2015 at 11:52 AM, Robert Haas  wrote:
> > Will look at 0003 next.
> 
> +appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
> 
> I don't think we typically use this style for notating intervals.

I don't think we really have a very consistent style for xlog messages -
this seems to describe the meaning accurately?

> [several good points]

> +(errmsg("performing legacy multixact truncation,
> upgrade master")));
> 
> This message needs work.  I'm not sure exactly what it should say, but
> I'm pretty sure that's not clear enough.
> 
> I seriously, seriously doubt that it is a good idea to perform the
> legacy truncation from MultiXactAdvanceOldest() rather than
> TruncateMultiXact().

But where should TruncateMultiXact() be called from? I mean, we could
move the logic from inside MultiXactAdvanceOldest() to some special case
in the replay routine, but what'd be the advantage?

> The checkpoint hasn't really happened at that point yet; you might
> truncate away stuff, then crash before the checkpoint is complete, and
> then we you restart recovery, you've got trouble.

We're only talking about restartpoints here, right? And I don't see the
problem - we don't read the slru anymore until the end of recovery, and
the end of recovery can't happen before reaching the minimum revovery
location?

> I think you should
> be very, very cautious about rejiggering the order of operations here.
> The current situation is not good, but casually rejiggering it can
> make things much worse.

The current placement - as part of the restartpoint - is utterly broken
and unpredictable. There'll frequently be no restartpoints performed at
all (due to different checkpoint segments, slow writeout, or pending
actions).  Because there's no careful timing of when this happens it's
much harder to understand the exact state in which the truncation
happens - I think moving it to a specific location during replay makes
things considerably easier.

> Generally, I think you've weakened the logic in SetOffsetVacuumLimit()
> appreciably here.  The existing code is careful never to set
> oldestOffsetKnown false when it was previously true.  Your rewrite
> removes that property.  Generally, I see no need for this function to
> be overhauled to the extent that you have, and would suggest reverting
> the changes that aren't absolutely required.

A lot of that has to do that the whole stuff about truncations happening
during checkpoints is gone and that thus the split with
DetermineSafeOldestOffset() doesn't make sense anymore.

That oldestOffsetKnown is unset is wrong - the if (prevOldestOffsetKnown
&& !oldestOffsetKnown) block should be a bit earlier.

> I don't particularly like the fact that find_multixact_start() calls
> SimpleLruFlush().  I think that's really a POLA violation: you don't
> expect that a function that looks like a simple inquiry is going to go
> do a bunch of unrelated I/O in the background.  If somebody called
> find_multixact_start() with any frequency, you'd be sad.  You're just
> doing it this way because of the context *in which you expect
> find_multixact_start* to be called, which does not seem very
> future-proof.  I prefer Thomas's approach.

I don't strongly care, but I do think it has some value to be sure about
the on-disk state for the current callers.  I think it'd be a pretty odd
thing to call find_multixact_start() frequently.

> If TruncateMultiXact() fails to acquire MultiXactTruncationLock right
> away, does it need to wait, or could it ConditionalAcquire and bail
> out if the lock isn't obtained?

That seems like premature optimization to me. And one that's not that
easy to do correctly - what if the current caller actually has a new,
lower, minimum mxid?

> + * XXX: It's also possible that the page that oldestMXact is on has
> + * already been truncated away, and we crashed before updating
> + * oldestMXact.
> 
> But why is that an XXX?  I think this is just a case of recovery
> needing tolerate redo of an action already redone.

Should rather have been NB.

> I'm not convinced that it's a good idea to remove
> lastCheckpointedOldest and replace it with nothing.  It seems like a
> very good idea to have two separate pieces of state in shared memory:

> - The oldest offset that we think anyone might need to access to make
> a visibility check for a tuple.
> - The oldest offset that we still have on disk.
> 
> The latter would now need to be called something else, not
> lastCheckpointedOldest, but I think it's still good to have it.

> Otherwise, I don't see how you protect against the on-disk state
> wrapping around before you finish truncating, and then maybe
> truncation eats something that was busy getting reused.

Unless I miss something the stop limits will prevent that from
happening? SetMultiXactIdLimit() is called only *after* the truncation
has finished?

> + * Update in-memory lim

Re: [HACKERS] Rework the way multixact truncations work

2015-07-02 Thread Robert Haas
On Thu, Jul 2, 2015 at 11:52 AM, Robert Haas  wrote:
> Will look at 0003 next.

+appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",

I don't think we typically use this style for notating intervals.

 case XLOG_MULTIXACT_CREATE_ID:
 id = "CREATE_ID";
 break;
+case XLOG_MULTIXACT_TRUNCATE_ID:
+id = "TRUNCATE";
+break;

If XLOG_MULTIXACT_CREATE_ID -> "CREATE_ID", then why not
XLOG_MULTIXACT_TRUNCATE_ID -> "TRUNCATE_ID"?

+ * too old to general truncation records.

s/general/generate/

+MultiXactId oldestMXactDB;

Data type should be OID.

+ * Recompute limits as we are now fully started, we now can correctly
+ * compute how far a members wraparound is away.

s/,/:/ or something.  This isn't particularly good English as written.

+ * Computing the actual limits is only possible once the data directory is
+ * in a consistent state. There's no need to compute the limits while
+ * still replaying WAL as no new multis can be created anyway. So we'll
+ * only do further checks after TrimMultiXact() has been called.

Multis can be and are created during replay.  What this should really
say is that we have no choice about whether to create them or not: we
just have to replay whatever's there.

+(errmsg("performing legacy multixact truncation,
upgrade master")));

This message needs work.  I'm not sure exactly what it should say, but
I'm pretty sure that's not clear enough.

I seriously, seriously doubt that it is a good idea to perform the
legacy truncation from MultiXactAdvanceOldest() rather than
TruncateMultiXact().  The checkpoint hasn't really happened at that
point yet; you might truncate away stuff, then crash before the
checkpoint is complete, and then we you restart recovery, you've got
trouble.  I think you should be very, very cautious about rejiggering
the order of operations here.  The current situation is not good, but
casually rejiggering it can make things much worse.

- * If no multixacts exist, then oldestMultiXactId will be the next
- * multixact that will be created, rather than an existing multixact.
+ * Determine the offset of the oldest multixact.  Normally, we can read
+ * the offset from the multixact itself, but there's an important special
+ * case: if there are no multixacts in existence at all, oldestMXact
+ * obviously can't point to one.  It will instead point to the multixact
+ * ID that will be assigned the next time one is needed.

There's no need to change this; it means the same thing either way.

Generally, I think you've weakened the logic in SetOffsetVacuumLimit()
appreciably here.  The existing code is careful never to set
oldestOffsetKnown false when it was previously true.  Your rewrite
removes that property.  Generally, I see no need for this function to
be overhauled to the extent that you have, and would suggest reverting
the changes that aren't absolutely required.

I don't particularly like the fact that find_multixact_start() calls
SimpleLruFlush().  I think that's really a POLA violation: you don't
expect that a function that looks like a simple inquiry is going to go
do a bunch of unrelated I/O in the background.  If somebody called
find_multixact_start() with any frequency, you'd be sad.  You're just
doing it this way because of the context *in which you expect
find_multixact_start* to be called, which does not seem very
future-proof.  I prefer Thomas's approach.

If TruncateMultiXact() fails to acquire MultiXactTruncationLock right
away, does it need to wait, or could it ConditionalAcquire and bail
out if the lock isn't obtained?

+ * Make sure to only attempt truncation if there's values to truncate
+ * away. In normal processing values shouldn't go backwards, but there's
+ * some corner cases (due to bugs) where that's possible.

I think this comment should be more detailed.  Is that talking about
the same thing as this comment:

- * Due to bugs in early releases of PostgreSQL 9.3.X and 9.4.X,
- * oldestMXact might point to a multixact that does not exist.
- * Autovacuum will eventually advance it to a value that does exist,
- * and we want to set a proper offsetStopLimit when that happens,
- * so call DetermineSafeOldestOffset here even if we're not actually
- * truncating.

This comment seems to be saying there's a race condition:

+ * XXX: It's also possible that the page that oldestMXact is on has
+ * already been truncated away, and we crashed before updating
+ * oldestMXact.

But why is that an XXX?  I think this is just a case of recovery
needing tolerate redo of an action already redone.

I'm not convinced that it's a good idea to remove
lastCheckpointedOldest and replace it with nothing.  It seems like a
very good idea to have two separate pieces of state in shared memory:

- The oldest offset that we think anyone might need to access to make
a visi

Re: [HACKERS] Rework the way multixact truncations work

2015-07-02 Thread Robert Haas
On Mon, Jun 29, 2015 at 3:48 PM, Andres Freund  wrote:
> New version attached.

0002 looks good, but the commit message should perhaps mention the
comment fix.  Or commit that separately.

Will look at 0003 next.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-06-29 Thread Andres Freund
On 2015-06-29 23:54:40 +1200, Thomas Munro wrote:
> On Mon, Jun 22, 2015 at 7:24 AM, Andres Freund  wrote:
> > It'd be very welcome to see some wider testing and review on this.
> 
> I started looking at this and doing some testing.  Here is some
> initial feedback:
> 
> Perhaps vac_truncate_clog needs a new name now that it does more,
> maybe vac_truncate_transaction_logs?

It has done more before, so I don't really see a connection to this
patch...

> MultiXactState->sawTruncationCkptCyle: is 'Cyle' supposed to be 'Cycle'?

Oops.

> In the struct xl_multixact_truncate, and also the function
> WriteMTruncateXlogRec and other places, I think you have confused the
> typedefs MultiXactOffset and MultiXactId.  If I'm not mistaken,
> startMemb and endMemb have the correct type, but startOff and endOff
> should be of type MultiXactId despite their names (the *values* stored
> inside pg_multixact/offsets are indeed offsets (into
> pg_multixact/members), but their *location* is what a multixact ID
> represents).

IIRC I did it that way to make clear this is just 'byte' type offsets,
without other meaning. Wasn't a good idea.

> I was trying to understand if there could be any problem caused by
> setting latest_page_number to the pageno that holds (or will hold)
> xlrec.endOff in multixact_redo.  The only thing that jumps out at me
> is that the next call to SlruSelectLRUPage will no longer be prevented
> from evicting the *real* latest page (the most recently created page).

That hasn't changed unless I miss something?

> In SlruDeleteSegment, is it OK to call unlink() while holding the SLRU
> control lock?

I think it's safer than not doing it, but don't particularly care.

> In find_multixact_start you call SimpleLruFlush before calling
> SimpleLruDoesPhysicalPageExist.  Should we do something like this
> instead?  https://gist.github.com/macdice/8e5b2f0fe3827fdf3d5a

I'm currently slightly inclined to do it "my way". They way these
functions are used it doesn't seem like a bad property to ensure things
are on disk.

> I think saw some extra autovacuum activity that I didn't expect in a
> simple scenario, but I'm not sure and will continue looking tomorrow.

Cool, thanks!

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-06-29 Thread Thomas Munro
On Mon, Jun 22, 2015 at 7:24 AM, Andres Freund  wrote:
> It'd be very welcome to see some wider testing and review on this.

I started looking at this and doing some testing.  Here is some
initial feedback:

Perhaps vac_truncate_clog needs a new name now that it does more,
maybe vac_truncate_transaction_logs?

MultiXactState->sawTruncationCkptCyle: is 'Cyle' supposed to be 'Cycle'?

In the struct xl_multixact_truncate, and also the function
WriteMTruncateXlogRec and other places, I think you have confused the
typedefs MultiXactOffset and MultiXactId.  If I'm not mistaken,
startMemb and endMemb have the correct type, but startOff and endOff
should be of type MultiXactId despite their names (the *values* stored
inside pg_multixact/offsets are indeed offsets (into
pg_multixact/members), but their *location* is what a multixact ID
represents).

I was trying to understand if there could be any problem caused by
setting latest_page_number to the pageno that holds (or will hold)
xlrec.endOff in multixact_redo.  The only thing that jumps out at me
is that the next call to SlruSelectLRUPage will no longer be prevented
from evicting the *real* latest page (the most recently created page).

In SlruDeleteSegment, is it OK to call unlink() while holding the SLRU
control lock?

In find_multixact_start you call SimpleLruFlush before calling
SimpleLruDoesPhysicalPageExist.  Should we do something like this
instead?  https://gist.github.com/macdice/8e5b2f0fe3827fdf3d5a

I think saw some extra autovacuum activity that I didn't expect in a
simple scenario, but I'm not sure and will continue looking tomorrow.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-06-27 Thread Andres Freund
On 2015-06-26 14:48:35 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > Rework the way multixact truncations work.
> 
> I spent some time this morning reviewing this patch and had some
> comments that I relayed over IM to Andres.

Thanks for that!

> 2. We set PGXACT->delayChkpt while the truncation is executed.  This
> seems reasonable, and there's a good reason for it, but all the other
> users of this facility only do small operations with this thing grabbed,
> while the multixact truncation could take a long time because a large
> number of files might be deleted.  Maybe it's not a problem to have
> checkpoints be delayed by several seconds, or who knows maybe even a
> minute in a busy system.  (We will have checkpointer sleeping in 10ms
> intervals until the truncation is complete).

I don't think this is a problem. Consider that we're doing all this in
the checkpointer today, blocking much more than just the actual xlog
insertion. That's a bigger problem, as we'll not do the paced writing
during that and such. The worst thatthis can cause is a bunch of sleeps,
that seems fairly harmless.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-06-26 Thread Alvaro Herrera
Andres Freund wrote:

> Rework the way multixact truncations work.

I spent some time this morning reviewing this patch and had some
comments that I relayed over IM to Andres.  The vast majority is
cosmetic, but there are two larger things:

1. I think this part of PerformMembersTruncation() is very confusing:

/* verify whether the current segment is to be deleted */
if (segment != startsegment && segment != endsegment)
SlruDeleteSegment(MultiXactMemberCtl, segment);

I think this works correctly in that it preserves both endpoint files,
but the files in between are removed ... which is a confusing interface,
IMO.  I think this merits a longer explanation.


2. We set PGXACT->delayChkpt while the truncation is executed.  This
seems reasonable, and there's a good reason for it, but all the other
users of this facility only do small operations with this thing grabbed,
while the multixact truncation could take a long time because a large
number of files might be deleted.  Maybe it's not a problem to have
checkpoints be delayed by several seconds, or who knows maybe even a
minute in a busy system.  (We will have checkpointer sleeping in 10ms
intervals until the truncation is complete).

Maybe this is fine, not sure.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers