Re: [HACKERS] foreign key locks, 2nd attempt
On 25 March 2012 09:17, Simon Riggs wrote:
> The main thing we're waiting on are the performance tests to confirm
> the lack of regression.

I have extensively benchmarked the latest revision of the patch (tpc-b.sql), which I pulled from Alvaro's github account. The benchmark was of the feature branch's then-current HEAD, "Don't follow update chains unless caller requests it". I've had to split these numbers out into two separate reports. Incidentally, at some future point I hope that pgbench-tools can handle testing across feature branches, initdb'ing and suchlike automatically and as needed. That's something that's likely to happen sooner rather than later.

The server used was kindly supplied by the University of Oregon open source lab.

Server (it's virtualised, but apparently this is purely for sandboxing purposes and the virtualisation technology is rather good):

IBM,8231-E2B POWER7 processor (8 cores)
Fedora 16
8GB RAM
Dedicated RAID1 disks. Exact configuration unknown.

postgresql.conf (this information is available when you drill down into each test too, fwiw):

max_connections = 200
shared_buffers = 2GB
checkpoint_segments = 30
checkpoint_completion_target = 0.8
effective_cache_size = 6GB

Reports:

http://results_fklocks.staticloud.com/
http://results_master_for_fks.staticloud.com/

Executive summary: There is a clear regression of less than 10%. There also appears to be a new source of contention at higher client counts.

I realise that the likely upshot of this, and of other concerns generally held at this late stage, is that this patch will not make it into 9.2. For what it's worth, that comes as a big disappointment to me. I would like to thank both Alvaro and Noah for their hard work here.
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Sat, Mar 17, 2012 at 10:45 PM, Alvaro Herrera wrote:
> Here is v11. This version is mainly updated to add pg_upgrade support,
> as discussed. It also contains the README file that was posted earlier
> (plus wording fixes per Bruce), a couple of bug fixes, and some comment
> updates.

The main thing we're waiting on are the performance tests to confirm the lack of regression. You are working on that, right?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of mar mar 06 17:28:12 -0300 2012:
> On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera wrote:
>
> > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
> > super-exclusive locking (used to delete tuples and more generally to update
> > tuples modifying the values of the columns that make up the key of the
> > tuple); SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT
> > FOR SHARE implements shared locks; and finally SELECT FOR KEY SHARE is a
> > super-weak mode that does not conflict with exclusive mode, but conflicts
> > with SELECT FOR KEY UPDATE. This last mode implements a mode just strong
> > enough to implement RI checks, i.e. it ensures that tuples do not go away
> > from under a check, without blocking some other transaction that wants to
> > update the tuple without changing its key.
>
> So there are 4 lock types, but we only have room for 3 on the tuple
> header, so we store the least common/deprecated of the 4 types as a
> multixactid. Some rewording would help there.

Hmm, I rewrote that paragraph two times. I'll try to adjust it a bit more.

> My understanding is that all of these workloads will change:
>
> * Users of explicit SHARE lockers will be slightly worse in the case
> of the 1st locker, but then after that they'll be the same as before.

Right. (We're assuming that there *are* users of SHARE locks, which I'm not sure is a given.)

> * Updates against an RI locked table will be dramatically faster
> because of reduced lock waits

Correct.

> ...and that these previous workloads are effectively unchanged:
>
> * Stream of RI checks causes mxacts

Yes.

> * Multi row deadlocks still possible

Yes.

> * Queues of writers still wait in the same way

Yes.

> * Deletes don't cause mxacts unless by same transaction

Yeah ... there's no way for anyone to not conflict with a FOR KEY UPDATE lock (the strength grabbed by a delete) unless you're the same transaction.
> > The possibility of having an update within a MultiXact means that they must
> > persist across crashes and restarts: a future reader of the tuple needs to
> > figure out whether the update committed or aborted. So we have a
> > requirement that pg_multixact needs to retain pages of its data until we're
> > certain that the MultiXacts in them are no longer of interest.
>
> I think the "no longer of interest" aspect needs to be tracked more
> closely because it will necessarily lead to more I/O.

Not sure what you mean here.

> If we store the LSN on each mxact page, as I think we need to, we can
> get rid of pages more quickly if we know they don't have an LSN set.
> So its possible we can optimise that more.

Hmm, I had originally thought that this was rather pointless, because it seemed unlikely that a segment would ever contain only multis with no updates. But then, maybe Robert is right and there are users out there that run a lot of RI checks and never update the masters ... Hm. I'm not sure that LSN tracking is the right tool for that optimization, however -- a single multi containing an update is enough to prevent a whole segment from being considered useless.

> > VACUUM is in charge of removing old MultiXacts at the time of tuple
> > freezing.
>
> You mean mxact segments?

Well, both. When a tuple is frozen, we remove both its Xmin/Xmax and any multi that it might have in Xmax. That's what I really meant above. But also, vacuum will remove pg_multixact segments just as it removes pg_clog segments. (It is possible, and probably desirable, to remove a multi much earlier than freezing the tuple. The patch does not (yet) do that, however.)

> Surely we set hint bits on tuples same as now? Hope so.

We set hint bits, but if a multi contains an update, we don't set HEAP_XMAX_COMMITTED even when the update is known committed. I think we could do this in some cases.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of mar mar 06 18:33:13 -0300 2012:
> The lock modes are correct, appropriate and IMHO have meaningful
> names. No redesign required here.
>
> Not sure about the naming of some of the flag bits however.

Feel free to suggest improvements ... I've probably seen them for too long to find them anything but what I intended them to mean.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:
>
> On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote:
> >> Regarding performance, the good thing about this patch is that if you
> >> have an operation that used to block, it might now not block. So maybe
> >> multixact-related operation is a bit slower than before, but if it
> >> allows you to continue operating rather than sit waiting until some
> >> other transaction releases you, it's much better.
> >
> > That's probably true, although there is some deferred cost that is
> > hard to account for. You might not block immediately, but then later
> > somebody might block either because the mxact SLRU now needs fsyncs or
> > because they've got to decode an mxid long after the relevant segment
> > has been evicted from the SLRU buffers. In general, it's hard to
> > bound that latter cost, because you only avoid blocking once (when the
> > initial update happens) but you might pay the extra cost of decoding
> > the mxid as many times as the row is read, which could be arbitrarily
> > many. How much of a problem that is in practice, I'm not completely
> > sure, but it has worried me before and it still does. In the worst
> > case scenario, a handful of frequently-accessed rows with MXIDs all of
> > whose members are dead except for the UPDATE they contain could result
> > in continual SLRU cache-thrashing.
>
> Cases I regularly see involve wait times of many seconds.
>
> When this patch helps, it will help performance by algorithmic gains,
> so perhaps x10-100.
>
> That can and should be demonstrated though, I agree.

BTW, the isolation tester suite has a few cases that die with deadlocks in the unpatched code but continue without dying with the patched code. Others block in unpatched master and continue without blocking when patched. This should be enough proof that there are "algorithmic gains" here.
There's also a test case that demonstrates a fix for the problem (pointed out in the docs) that if you acquire a row lock, a subxact then upgrades it (say by deleting the row), and the subxact aborts, the original row lock is lost. With the patched code, the original lock is no longer lost.

I completely agree with the idea that we need some mitigation against repeated lookups of mxids that contain committed updates.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of vie mar 16 15:22:05 -0300 2012:
>
> On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
> > On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> > > On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> > >> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > >> > I agree with you that some worst case performance tests should be
> > >> > done. Could you please say what you think the worst cases would be, so
> > >> > those can be tested? That would avoid wasting time or getting anything
> > >> > backwards.
> > >>
> > >> I've thought about this some and here's what I've come up with so far:
> > >
> > > I question whether we are in a position to do the testing necessary to
> > > commit this for 9.2.
> >
> > Is anyone even working on testing it?
>
> No one I know of. I am just trying to set expectations that this still
> has a long way to go.

A Command Prompt colleague is on it.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
> On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> > On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> >> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> >> > I agree with you that some worst case performance tests should be
> >> > done. Could you please say what you think the worst cases would be, so
> >> > those can be tested? That would avoid wasting time or getting anything
> >> > backwards.
> >>
> >> I've thought about this some and here's what I've come up with so far:
> >
> > I question whether we are in a position to do the testing necessary to
> > commit this for 9.2.
>
> Is anyone even working on testing it?

No one I know of. I am just trying to set expectations that this still has a long way to go.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Fri, Mar 16, 2012 at 10:40:01AM -0300, Alvaro Herrera wrote:
>
> Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:
> > > Now I am confused. Where do you see the word "hint" used by
> > > HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> > > bits, not hints, meaning they are not optional or there just for
> > > performance.
> >
> > Okay, I think this is just a case of confusing terminology. I have
> > always assumed (because I have not seen any evidence to the contrary)
> > that anything in t_infomask and t_infomask2 is a "hint bit" --
> > regardless of it being actually a hint or something with a stronger
> > significance.
>
> Maybe this is just my mistake. I see in
> http://wiki.postgresql.org/wiki/Hint_Bits that we only call the
> COMMITTED/INVALID infomask bits "hints".
>
> I think it's easy enough to correct the README to call them "infomask
> bits" rather than hints .. I'll go do that.

OK, thanks. I only brought it up so people would not be confused by thinking these were optional pieces of information, and that the real information is stored somewhere else.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
>> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
>> > I agree with you that some worst case performance tests should be
>> > done. Could you please say what you think the worst cases would be, so
>> > those can be tested? That would avoid wasting time or getting anything
>> > backwards.
>>
>> I've thought about this some and here's what I've come up with so far:
>
> I question whether we are in a position to do the testing necessary to
> commit this for 9.2.

Is anyone even working on testing it?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:
> > Now I am confused. Where do you see the word "hint" used by
> > HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> > bits, not hints, meaning they are not optional or there just for
> > performance.
>
> Okay, I think this is just a case of confusing terminology. I have
> always assumed (because I have not seen any evidence to the contrary)
> that anything in t_infomask and t_infomask2 is a "hint bit" --
> regardless of it being actually a hint or something with a stronger
> significance.

Maybe this is just my mistake. I see in http://wiki.postgresql.org/wiki/Hint_Bits that we only call the COMMITTED/INVALID infomask bits "hints".

I think it's easy enough to correct the README to call them "infomask bits" rather than hints .. I'll go do that.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of vie mar 16 00:04:06 -0300 2012:
>
> On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
> >
> > Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> > >
> > > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > >
> > > > When there is a single locker in a tuple, we can just store the locking
> > > > info in the tuple itself. We do this by storing the locker's Xid in
> > > > XMAX, and setting hint bits specifying the locking strength. There is
> > > > one exception here: since hint bit space is limited, we do not provide
> > > > a separate hint bit for SELECT FOR SHARE, so we have to use the
> > > > extended info in a MultiXact in that case. (The other cases, SELECT
> > > > FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used
> > > > due to being the standards-mandated locking mechanism, or heavily used
> > > > by the RI code, so we want to provide fast paths for those.)
> > >
> > > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > > powerful than a "hint".
> >
> > I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> > UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> > HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> > "hints", but it's not the job of this patch to fix that problem.
>
> Now I am confused. Where do you see the word "hint" used by
> HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> bits, not hints, meaning they are not optional or there just for
> performance.

Okay, I think this is just a case of confusing terminology. I have always assumed (because I have not seen any evidence to the contrary) that anything in t_infomask and t_infomask2 is a "hint bit" -- regardless of it being actually a hint or something with a stronger significance.

HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK are certainly not "optional" in the sense that if they are missing, the meaning of the Xmax field is completely different. So in all correctness they are not "hints", though we call them that. Now, if we want to differentiate infomask bits that are just hints from those that are something else, we can do that, but I'm not sure it's useful -- AFAICS only XMAX_COMMITTED and XMIN_COMMITTED are proper hints.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > I agree with you that some worst case performance tests should be
> > done. Could you please say what you think the worst cases would be, so
> > those can be tested? That would avoid wasting time or getting anything
> > backwards.
>
> I've thought about this some and here's what I've come up with so far:

I question whether we are in a position to do the testing necessary to commit this for 9.2.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 11:04:06PM -0400, Bruce Momjian wrote:
> On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
> >
> > Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> > >
> > > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > >
> > > > When there is a single locker in a tuple, we can just store the locking
> > > > info in the tuple itself. We do this by storing the locker's Xid in
> > > > XMAX, and setting hint bits specifying the locking strength. There is
> > > > one exception here: since hint bit space is limited, we do not provide
> > > > a separate hint bit for SELECT FOR SHARE, so we have to use the
> > > > extended info in a MultiXact in that case. (The other cases, SELECT
> > > > FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used
> > > > due to being the standards-mandated locking mechanism, or heavily used
> > > > by the RI code, so we want to provide fast paths for those.)
> > >
> > > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > > powerful than a "hint".
> >
> > I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> > UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> > HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> > "hints", but it's not the job of this patch to fix that problem.
>
> Now I am confused. Where do you see the word "hint" used by
> HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> bits, not hints, meaning they are not optional or there just for
> performance.

Are you saying that the bit is only a guide and is there only for performance? If so, I understand why it is called "hint".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
>
> Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> >
> > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> >
> > > When there is a single locker in a tuple, we can just store the locking
> > > info in the tuple itself. We do this by storing the locker's Xid in
> > > XMAX, and setting hint bits specifying the locking strength. There is
> > > one exception here: since hint bit space is limited, we do not provide a
> > > separate hint bit for SELECT FOR SHARE, so we have to use the extended
> > > info in a MultiXact in that case. (The other cases, SELECT FOR UPDATE
> > > and SELECT FOR KEY SHARE, are presumably more commonly used due to being
> > > the standards-mandated locking mechanism, or heavily used by the RI
> > > code, so we want to provide fast paths for those.)
> >
> > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > powerful than a "hint".
>
> I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> "hints", but it's not the job of this patch to fix that problem.

Now I am confused. Where do you see the word "hint" used by HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask bits, not hints, meaning they are not optional or there just for performance.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 08:37:36PM -0400, Robert Haas wrote:
> On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> > On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
> >>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> >>> mxid lookups, so I think something more sophisticated is needed to
> >>> exercise that cost. Not sure what.
> >>
> >> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> >> all-visible.
> >
> > So because committed does not equal all visible there will be
> > additional lookups on mxids? That's complete rubbish.
>
> Noah seemed to be implying that once the updating transaction
> committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
> But I think that's not true, because anyone who looks at the tuple
> afterward will still need to know the exact xmax, to test it against
> their snapshot.

Yeah, my comment above was wrong. I agree that we'll need to retrieve the mxid members during every MVCC scan until we either mark the page all-visible or have occasion to simplify the mxid xmax to the updater xid.
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of jue mar 15 21:37:36 -0300 2012:
>
> On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> > On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
> >>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> >>> mxid lookups, so I think something more sophisticated is needed to
> >>> exercise that cost. Not sure what.
> >>
> >> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> >> all-visible.
> >
> > So because committed does not equal all visible there will be
> > additional lookups on mxids? That's complete rubbish.
>
> Noah seemed to be implying that once the updating transaction
> committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
> But I think that's not true, because anyone who looks at the tuple
> afterward will still need to know the exact xmax, to test it against
> their snapshot.

Yeah, we don't set HEAP_XMAX_COMMITTED on multis, even when there's an update in it and it committed. I think we could handle it, or at least some of the cases, but that'd require careful re-examination of all the tqual.c code, which is not something I want to do right now.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
>>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
>>> mxid lookups, so I think something more sophisticated is needed to
>>> exercise that cost. Not sure what.
>>
>> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
>> all-visible.
>
> So because committed does not equal all visible there will be
> additional lookups on mxids? That's complete rubbish.

Noah seemed to be implying that once the updating transaction committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup. But I think that's not true, because anyone who looks at the tuple afterward will still need to know the exact xmax, to test it against their snapshot.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 10:13 PM, Alvaro Herrera wrote:
>
> Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:
>>
>> On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
>> >
>> > Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
>> >> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
>> >
>> >> > But that would only make sense if
>> >> > we thought that getting rid of the fsyncs would be more valuable than
>> >> > avoiding the blocking here, and I don't.
>> >>
>> >> You're right that the existing code could use some optimisation.
>> >>
>> >> I'm a little tired, but I can't see a reason to fsync this except at
>> >> checkpoint.
>> >
>> > Hang on. What fsyncs are we talking about? I don't see that the
>> > multixact code calls any fsync except at checkpoint and shutdown.
>>
>> If a dirty page is evicted it will fsync.
>
> Ah, right.
>
>> >> Also seeing that we issue 2 WAL records for each RI check. We issue
>> >> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
>> >> afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
>> >> that each thinks it is doing it for the same reason and is the only
>> >> place it's being done. Alvaro, any ideas why that is?
>> >
>> > AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
>> > being locked by a multixact -- it doesn't record the contents (member
>> > xids) of said multixact, which is what the other log entry records.
>>
>> Agreed. But issuing two records when we could issue just one seems a
>> little strange, especially when the two record types follow one
>> another so closely - so we end up queuing for the lock twice while
>> holding the lock on the data block.
>
> Hmm, that seems like an optimization that could be done separately.

Oh yes, definitely not something for you to add to the main patch. Just some additional tuning to alleviate Robert's concerns.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:
>
> On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
> >
> > Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
> >> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> >
> >> > But that would only make sense if
> >> > we thought that getting rid of the fsyncs would be more valuable than
> >> > avoiding the blocking here, and I don't.
> >>
> >> You're right that the existing code could use some optimisation.
> >>
> >> I'm a little tired, but I can't see a reason to fsync this except at
> >> checkpoint.
> >
> > Hang on. What fsyncs are we talking about? I don't see that the
> > multixact code calls any fsync except at checkpoint and shutdown.
>
> If a dirty page is evicted it will fsync.

Ah, right.

> >> Also seeing that we issue 2 WAL records for each RI check. We issue
> >> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
> >> afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
> >> that each thinks it is doing it for the same reason and is the only
> >> place it's being done. Alvaro, any ideas why that is?
> >
> > AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
> > being locked by a multixact -- it doesn't record the contents (member
> > xids) of said multixact, which is what the other log entry records.
>
> Agreed. But issuing two records when we could issue just one seems a
> little strange, especially when the two record types follow one
> another so closely - so we end up queuing for the lock twice while
> holding the lock on the data block.

Hmm, that seems like an optimization that could be done separately.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
> Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
>> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
>> > But that would only make sense if we thought that getting rid of the
>> > fsyncs would be more valuable than avoiding the blocking here, and I don't.
>>
>> You're right that the existing code could use some optimisation.
>>
>> I'm a little tired, but I can't see a reason to fsync this except at
>> checkpoint.
>
> Hang on. What fsyncs are we talking about? I don't see that the
> multixact code calls any fsync except at checkpoint and shutdown.

If a dirty page is evicted it will fsync.

>> Also seeing that we issue 2 WAL records for each RI check. We issue
>> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
>> afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
>> that each thinks it is doing it for the same reason and is the only
>> place it's being done. Alvaro, any ideas why that is?
>
> AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
> being locked by a multixact -- it doesn't record the contents (member
> xids) of said multixact, which is what the other log entry records.

Agreed. But issuing two records when we could issue just one seems a
little strange, especially when the two record types follow one
another so closely - so we end up queuing for the lock twice while
holding the lock on the data block.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
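Simon's point (the multixact SLRU fsyncs not only at checkpoint, but whenever a dirty page must be evicted to make room) can be illustrated with a toy LRU buffer model. This is my sketch for illustration, not slru.c's actual replacement logic:

```python
# Toy model of an SLRU-style buffer pool (not PostgreSQL's slru.c):
# a small set of in-memory page slots over a large logical file.
# Writing to a page dirties it; evicting a dirty page to make room
# forces a write plus fsync, so a working set larger than the slot
# count turns nearly every page replacement into synchronous I/O.
from collections import OrderedDict

class ToySlru:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = OrderedDict()  # pageno -> dirty flag, in LRU order
        self.fsyncs = 0

    def write(self, pageno):
        if pageno in self.slots:
            self.slots.move_to_end(pageno)       # already resident
        elif len(self.slots) >= self.num_slots:
            _, dirty = self.slots.popitem(last=False)  # evict LRU page
            if dirty:
                self.fsyncs += 1                 # dirty eviction => fsync
        self.slots[pageno] = True                # mark page dirty

# 8 slots, sequential writes across 100 pages: every replacement after
# the first 8 pages evicts a dirty page.
slru = ToySlru(8)
for page in range(100):
    slru.write(page)
print(slru.fsyncs)  # 92
```

When the working set fits in the slots, no eviction fsyncs occur at all, which is why the cost only shows up under sustained mxid allocation.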
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of Thu Mar 15 18:46:44 -0300 2012:
> On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera wrote:
> > As things stand today
>
> Can I confirm where we are now? Is there another version of the patch
> coming out soon?

Yes, another version is coming soon.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> > But that would only make sense if we thought that getting rid of the
> > fsyncs would be more valuable than avoiding the blocking here, and I don't.
>
> You're right that the existing code could use some optimisation.
>
> I'm a little tired, but I can't see a reason to fsync this except at
> checkpoint.

Hang on. What fsyncs are we talking about? I don't see that the
multixact code calls any fsync except at checkpoint and shutdown.

> Also seeing that we issue 2 WAL records for each RI check. We issue
> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
> afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
> that each thinks it is doing it for the same reason and is the only
> place it's being done. Alvaro, any ideas why that is?

AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
being locked by a multixact -- it doesn't record the contents (member
xids) of said multixact, which is what the other log entry records.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera wrote:
> As things stand today

Can I confirm where we are now? Is there another version of the patch
coming out soon?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera wrote:
>>> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
>>> > today?
>>>
>>> Good question. So far, I can't think of a reason. "nextMulti" is critical,
>>> but we already fsync it with pg_control. We could delete the other
>>> multixact state data at every startup and set OldestVisibleMXactId
>>> accordingly.
>>
>> Hmm, yeah.
>
> In a way, the fact that we don't do that is kind of fortuitous in this
> situation. I had just assumed that we were not fsyncing it because
> there seems to be no reason to do so. But since we are, we already
> know that the fsyncs resulting from frequent mxid allocation aren't a
> huge pain point. If they were, somebody would have presumably
> complained about it and fixed it before now. So that means that what
> we're really worrying about here is the overhead of fsyncing a little
> more often, which is a lot less scary than starting to do it when we
> weren't previously.

Good.

> Now, we could look at this as an opportunity to optimize the existing
> implementation by removing the fsyncs, rather than adding the new
> infrastructure Alvaro is proposing.

This is not an exercise in tuning mxact code. There is a serious
algorithmic problem that is causing real world problems. Removing the
fsync will *not* provide a solution to the problem, so there is no
"opportunity" here.

> But that would only make sense if we thought that getting rid of the
> fsyncs would be more valuable than avoiding the blocking here, and I don't.

You're right that the existing code could use some optimisation.

I'm a little tired, but I can't see a reason to fsync this except at
checkpoint.

Also seeing that we issue 2 WAL records for each RI check. We issue
one during MultiXactIdCreate/MultiXactIdExpand and then immediately
afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
that each thinks it is doing it for the same reason and is the only
place it's being done. Alvaro, any ideas why that is?

> I still think that someone needs to do some benchmarking here, because
> this is a big complicated performance patch, and we can't predict the
> impact of it on real-world scenarios without testing. There is
> clearly some additional overhead, and it makes sense to measure it and
> hopefully discover that it isn't excessive. Still, I'm a bit
> relieved.

Very much agreed.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 2:15 AM, Robert Haas wrote:
> On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch wrote:
>>> Well, post-release, the cat is out of the bag: we'll be stuck with
>>> this whether the performance characteristics are acceptable or not.
>>> That's why we'd better be as sure as possible before committing to
>>> this implementation that there's nothing we can't live with. It's not
>>> like there's any reasonable way to turn this off if you don't like it.
>>
>> I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
>> syntax additions. We could even avoid doing that by not documenting them. A
>> later major release could implement them using a completely different
>> mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
>> UPDATE. To be sure, let's still do a good job the first time.
>
> What I mean is really that, once the release is out, we don't get to
> take it back. Sure, the next release can fix things, but any
> regressions will become obstacles to upgrading and pain points for new
> users.

This comment is completely superfluous. It's a complete waste of time
to turn up on a thread and remind people that if they commit something
and it doesn't actually work that it would be a bad thing. Why, we
might ask, do you think that thought needs to be expressed here?
Please, don't answer; let's spend the time on actually reviewing the
patch.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
>> lookups, so I think something more sophisticated is needed to exercise that
>> cost. Not sure what.
>
> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> all-visible.

So because committed does not equal all visible there will be
additional lookups on mxids? That's complete rubbish.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera wrote:
>> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
>> > today?
>>
>> Good question. So far, I can't think of a reason. "nextMulti" is critical,
>> but we already fsync it with pg_control. We could delete the other multixact
>> state data at every startup and set OldestVisibleMXactId accordingly.
>
> Hmm, yeah.

In a way, the fact that we don't do that is kind of fortuitous in this
situation. I had just assumed that we were not fsyncing it because
there seems to be no reason to do so. But since we are, we already
know that the fsyncs resulting from frequent mxid allocation aren't a
huge pain point. If they were, somebody would have presumably
complained about it and fixed it before now. So that means that what
we're really worrying about here is the overhead of fsyncing a little
more often, which is a lot less scary than starting to do it when we
weren't previously.

Now, we could look at this as an opportunity to optimize the existing
implementation by removing the fsyncs, rather than adding the new
infrastructure Alvaro is proposing. But that would only make sense if
we thought that getting rid of the fsyncs would be more valuable than
avoiding the blocking here, and I don't.

I still think that someone needs to do some benchmarking here, because
this is a big complicated performance patch, and we can't predict the
impact of it on real-world scenarios without testing. There is
clearly some additional overhead, and it makes sense to measure it and
hopefully discover that it isn't excessive. Still, I'm a bit relieved.

> I have noticed that this code is not correct, because we don't know that
> we're holding an appropriate lock on the page, so we can't simply change
> the Xmax and reset those hint bits. As things stand today, mxids
> persist longer. (We could do some cleanup at HOT-style page prune, for
> example, though the lock we need is even weaker than that.) Overall
> this means that coming up with a test case demonstrating this pressure
> probably isn't that hard.

What would such a test case look like?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch wrote:
>> Well, post-release, the cat is out of the bag: we'll be stuck with
>> this whether the performance characteristics are acceptable or not.
>> That's why we'd better be as sure as possible before committing to
>> this implementation that there's nothing we can't live with. It's not
>> like there's any reasonable way to turn this off if you don't like it.
>
> I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
> syntax additions. We could even avoid doing that by not documenting them. A
> later major release could implement them using a completely different
> mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
> UPDATE. To be sure, let's still do a good job the first time.

What I mean is really that, once the release is out, we don't get to
take it back. Sure, the next release can fix things, but any
regressions will become obstacles to upgrading and pain points for new
users.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Noah Misch's message of Wed Mar 14 19:10:00 -0300 2012:
> On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
> > On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> > > More often than that; each 2-member mxid takes 4 bytes in an offsets file
> > > and 10 bytes in a members file. So, more like one fsync per ~580 mxids.
> > > Note that we already fsync the multixact SLRUs today, so any increase
> > > will arise from the widening of member entries from 4 bytes to 5. The
> > > realism of this test is attractive. Nearly-static parent tables are
> > > plenty common, and this test will illustrate the impact on those designs.
> >
> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
> > today?
>
> Good question. So far, I can't think of a reason. "nextMulti" is critical,
> but we already fsync it with pg_control. We could delete the other multixact
> state data at every startup and set OldestVisibleMXactId accordingly.

Hmm, yeah.

> > > You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> > > mxid lookups, so I think something more sophisticated is needed to
> > > exercise that cost. Not sure what.
> >
> > I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> > all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.
>
> True. The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
> the updater xid when all transactions of the multixact have ended.

I have noticed that this code is not correct, because we don't know that
we're holding an appropriate lock on the page, so we can't simply change
the Xmax and reset those hint bits. As things stand today, mxids
persist longer. (We could do some cleanup at HOT-style page prune, for
example, though the lock we need is even weaker than that.) Overall
this means that coming up with a test case demonstrating this pressure
probably isn't that hard.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
> On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> > More often than that; each 2-member mxid takes 4 bytes in an offsets file
> > and 10 bytes in a members file. So, more like one fsync per ~580 mxids.
> > Note that we already fsync the multixact SLRUs today, so any increase will
> > arise from the widening of member entries from 4 bytes to 5. The realism
> > of this test is attractive. Nearly-static parent tables are plenty common,
> > and this test will illustrate the impact on those designs.
>
> Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
> today?

Good question. So far, I can't think of a reason. "nextMulti" is critical,
but we already fsync it with pg_control. We could delete the other multixact
state data at every startup and set OldestVisibleMXactId accordingly.

> > You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
> > lookups, so I think something more sophisticated is needed to exercise that
> > cost. Not sure what.
>
> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.

True. The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
the updater xid when all transactions of the multixact have ended. You would
need a test workload with long-running multixacts that delay such replacement.
However, the workloads that come to mind are the very workloads for which this
patch eliminates lock waits; they wouldn't illustrate a worst-case.

> >> This isn't exactly a test case, but from Noah's previous comments I
> >> gather that there is a theoretical risk of mxid consumption running
> >> ahead of xid consumption. We should try to think about whether there
> >> are any realistic workloads where that might actually happen. I'm
> >> willing to believe that there aren't, but not just because somebody
> >> asserts it. The reason I'm concerned about this is because, if it
> >> should happen, the result will be more frequent anti-wraparound
> >> vacuums on every table in the cluster. Those are already quite
> >> painful for some users.
> >
> > Yes. Pre-release, what can we really do here other than have more people
> > thinking about ways it might happen in practice? Post-release, we could
> > suggest monitoring methods or perhaps have VACUUM emit a WARNING when a
> > table is using more mxid space than xid space.
>
> Well, post-release, the cat is out of the bag: we'll be stuck with
> this whether the performance characteristics are acceptable or not.
> That's why we'd better be as sure as possible before committing to
> this implementation that there's nothing we can't live with. It's not
> like there's any reasonable way to turn this off if you don't like it.

I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
syntax additions. We could even avoid doing that by not documenting them. A
later major release could implement them using a completely different
mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
UPDATE. To be sure, let's still do a good job the first time.
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> More often than that; each 2-member mxid takes 4 bytes in an offsets file and
> 10 bytes in a members file. So, more like one fsync per ~580 mxids. Note
> that we already fsync the multixact SLRUs today, so any increase will arise
> from the widening of member entries from 4 bytes to 5. The realism of this
> test is attractive. Nearly-static parent tables are plenty common, and this
> test will illustrate the impact on those designs.

Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
today?

> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
> lookups, so I think something more sophisticated is needed to exercise that
> cost. Not sure what.

I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.

>> This isn't exactly a test case, but from Noah's previous comments I
>> gather that there is a theoretical risk of mxid consumption running
>> ahead of xid consumption. We should try to think about whether there
>> are any realistic workloads where that might actually happen. I'm
>> willing to believe that there aren't, but not just because somebody
>> asserts it. The reason I'm concerned about this is because, if it
>> should happen, the result will be more frequent anti-wraparound
>> vacuums on every table in the cluster. Those are already quite
>> painful for some users.
>
> Yes. Pre-release, what can we really do here other than have more people
> thinking about ways it might happen in practice? Post-release, we could
> suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
> is using more mxid space than xid space.

Well, post-release, the cat is out of the bag: we'll be stuck with
this whether the performance characteristics are acceptable or not.
That's why we'd better be as sure as possible before committing to
this implementation that there's nothing we can't live with. It's not
like there's any reasonable way to turn this off if you don't like it.

> Also consider a benchmark that does plenty of non-key updates on a parent
> table with no activity on the child table. We'll pay the overhead of
> determining that the key column(s) have not changed, but it will never pay
> off by preventing a lock wait.

Good idea.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > I agree with you that some worst case performance tests should be
> > done. Could you please say what you think the worst cases would be, so
> > those can be tested? That would avoid wasting time or getting anything
> > backwards.
>
> I've thought about this some and here's what I've come up with so far:
>
> 1. SELECT FOR SHARE on a large table on a system with no write cache.

Easy enough that we may as well check it. Share-locking an entire large
table is impractical in a real application, so I would not worry if this
shows a substantial regression.

> 2. A small parent table (say 30 rows or so) and a larger child table
> with a many-to-one FK relationship to the parent (say 100 child rows
> per parent row), with heavy update activity on the child table, on a
> system where fsyncs are very slow. This should generate lots of mxid
> consumption, and every 1600 or so mxids (I think) we've got to fsync;
> does that generate a noticeable performance hit?

More often than that; each 2-member mxid takes 4 bytes in an offsets file and
10 bytes in a members file. So, more like one fsync per ~580 mxids. Note
that we already fsync the multixact SLRUs today, so any increase will arise
from the widening of member entries from 4 bytes to 5. The realism of this
test is attractive. Nearly-static parent tables are plenty common, and this
test will illustrate the impact on those designs.

> 3. It would be nice to test the impact of increased mxid lookups in
> the parent, but I've realized that the visibility map will probably
> mask a good chunk of that effect, which is a good thing. Still, maybe
> something like this: a fairly large parent table, say a million rows,
> but narrow rows, so that many of them fit on a page, with frequent
> reads and occasional updates (if there are only reads, autovacuum
> might end with all the visibility map bits set); plus a child table
> with one or a few rows per parent which is heavily updated. In theory
> this ought to be good for the patch, since the more fine-grained
> locking will avoid blocking, but in this case the parent table is
> large enough that you shouldn't get much blocking anyway, yet you'll
> still pay the cost of mxid lookups because the occasional updates on
> the parent will clear VM bits. This might not be the exactly right
> workload to measure this effect, but if it's not maybe someone can
> devote a little time to thinking about what would be.

You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
lookups, so I think something more sophisticated is needed to exercise that
cost. Not sure what.

> 4. A plain old pgbench run or two, to see whether there's any
> regression when none of this matters at all...

Might as well.

> This isn't exactly a test case, but from Noah's previous comments I
> gather that there is a theoretical risk of mxid consumption running
> ahead of xid consumption. We should try to think about whether there
> are any realistic workloads where that might actually happen. I'm
> willing to believe that there aren't, but not just because somebody
> asserts it. The reason I'm concerned about this is because, if it
> should happen, the result will be more frequent anti-wraparound
> vacuums on every table in the cluster. Those are already quite
> painful for some users.

Yes. Pre-release, what can we really do here other than have more people
thinking about ways it might happen in practice? Post-release, we could
suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
is using more mxid space than xid space.

Also consider a benchmark that does plenty of non-key updates on a parent
table with no activity on the child table. We'll pay the overhead of
determining that the key column(s) have not changed, but it will never pay
off by preventing a lock wait. Granted, this is barely representative of
application behavior. Perhaps, too, we already have a good sense of this
cost from the HOT benchmarking efforts and have no cause to revisit it.

Thanks,
nm
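Noah's "one fsync per ~580 mxids" figure is straightforward page-fill arithmetic, and it can be sanity-checked quickly. The sketch below assumes the standard 8 kB block size and the per-entry sizes stated above:

```python
# Back-of-the-envelope check of the "one fsync per ~580 mxids" figure:
# each 2-member multixact consumes 4 bytes of an offsets-SLRU page plus
# 10 bytes (2 members x 5 bytes, with the patch's widened entries) of a
# members-SLRU page, i.e. 14/8192 of a page per mxid. One page fills --
# and eventually needs an fsync -- per 8192/14 mxids.
BLCKSZ = 8192             # standard PostgreSQL block size
offsets_bytes = 4         # per multixact, offsets file
members_bytes = 2 * 5     # 2 members x 5 bytes each (patched width)

mxids_per_page = BLCKSZ / (offsets_bytes + members_bytes)
print(int(mxids_per_page))  # 585, i.e. roughly one fsync per ~580 mxids

# For comparison, with the pre-patch 4-byte member entries:
print(int(BLCKSZ / (4 + 2 * 4)))  # 682
```

So the patch's widening from 4- to 5-byte member entries moves the fill rate from one page per ~680 mxids to one per ~585, which is the "fsyncing a little more often" being discussed.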
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> I agree with you that some worst case performance tests should be
> done. Could you please say what you think the worst cases would be, so
> those can be tested? That would avoid wasting time or getting anything
> backwards.

I've thought about this some and here's what I've come up with so far:

1. SELECT FOR SHARE on a large table on a system with no write cache.

2. A small parent table (say 30 rows or so) and a larger child table
with a many-to-one FK relationship to the parent (say 100 child rows
per parent row), with heavy update activity on the child table, on a
system where fsyncs are very slow. This should generate lots of mxid
consumption, and every 1600 or so mxids (I think) we've got to fsync;
does that generate a noticeable performance hit?

3. It would be nice to test the impact of increased mxid lookups in
the parent, but I've realized that the visibility map will probably
mask a good chunk of that effect, which is a good thing. Still, maybe
something like this: a fairly large parent table, say a million rows,
but narrow rows, so that many of them fit on a page, with frequent
reads and occasional updates (if there are only reads, autovacuum
might end with all the visibility map bits set); plus a child table
with one or a few rows per parent which is heavily updated. In theory
this ought to be good for the patch, since the more fine-grained
locking will avoid blocking, but in this case the parent table is
large enough that you shouldn't get much blocking anyway, yet you'll
still pay the cost of mxid lookups because the occasional updates on
the parent will clear VM bits. This might not be the exactly right
workload to measure this effect, but if it's not maybe someone can
devote a little time to thinking about what would be.

4. A plain old pgbench run or two, to see whether there's any
regression when none of this matters at all...

This isn't exactly a test case, but from Noah's previous comments I
gather that there is a theoretical risk of mxid consumption running
ahead of xid consumption. We should try to think about whether there
are any realistic workloads where that might actually happen. I'm
willing to believe that there aren't, but not just because somebody
asserts it. The reason I'm concerned about this is because, if it
should happen, the result will be more frequent anti-wraparound
vacuums on every table in the cluster. Those are already quite
painful for some users.

It would be nice if Noah or someone else who has reviewed this patch
in detail could comment further. I am shooting from the hip here, a
bit.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
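Scenario #2 above can be sketched concretely as a pgbench custom run. The generator below is my illustration only (table, column, and file names are invented, and `\setrandom` is the pgbench syntax of that era); the key property is that concurrent inserts into the child make the RI check share-lock one of only 30 parent rows, so overlapping transactions force multixact creation:

```python
# Hypothetical generator for benchmark scenario #2: a 30-row parent
# table, ~100 child rows per parent, and heavy concurrent child-table
# activity. Each child INSERT makes the FK check take a share lock on
# one of only 30 parent rows, so overlapping transactions pile multiple
# lockers onto the same rows and allocate multixacts.
PARENTS, CHILDREN_PER_PARENT = 30, 100

init_sql = f"""
CREATE TABLE parent (id int PRIMARY KEY, info text);
CREATE TABLE child (
    id serial PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id),
    payload int DEFAULT 0
);
INSERT INTO parent SELECT g, 'p' || g FROM generate_series(1, {PARENTS}) g;
INSERT INTO child (parent_id)
    SELECT (g % {PARENTS}) + 1
    FROM generate_series(1, {PARENTS * CHILDREN_PER_PARENT}) g;
"""

# pgbench custom transaction: insert a child row for a random parent.
txn_sql = rf"""\setrandom pid 1 {PARENTS}
INSERT INTO child (parent_id) VALUES (:pid);
"""

# Many clients, long enough to see steady-state mxid/fsync behaviour,
# on a box with slow fsyncs (no write cache):
print("pgbench -n -c 16 -j 4 -T 300 -f child_txn.sql  # after running init_sql")
```

Running the same script against patched and unpatched builds, ideally alongside a plain pgbench baseline (scenario #4), would separate the multixact fsync overhead from general regression.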
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of Tue Mar 13 14:00:52 -0300 2012:
> On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > When there is a single locker in a tuple, we can just store the locking info
> > in the tuple itself. We do this by storing the locker's Xid in XMAX, and
> > setting hint bits specifying the locking strength. There is one exception
> > here: since hint bit space is limited, we do not provide a separate hint bit
> > for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
> > that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE,
> > are presumably more commonly used due to being the standards-mandated
> > locking mechanism, or heavily used by the RI code, so we want to provide
> > fast paths for those.)
>
> Are those tuple bits actually "hint" bits? They seem quite a bit more
> powerful than a "hint".

I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
"hints", but it's not the job of this patch to fix that problem.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 9:24 PM, Noah Misch wrote:
> When we lock an update-in-progress row, we walk the t_ctid chain and lock all
> descendant tuples. They may all have uncommitted xmins. This is essential to
> ensure that the final outcome of the updating transaction does not affect
> whether the locking transaction has its KEY SHARE lock. Similarly, when we
> update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
> to the new version. That new tuple is both uncommitted and has locks, and we
> cannot easily sacrifice either property. Do you see a way to extend your
> scheme to cover these needs?

No, I think that sinks it. Good analysis.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> Here's a first attempt at a README illustrating this. I intend this to
> be placed in src/backend/access/heap/README.tuplock; the first three
> paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
> those from there, directing people to this new file instead. Is there
> something that you think should be covered more extensively (or at all)
> here?
...
> When there is a single locker in a tuple, we can just store the locking info
> in the tuple itself. We do this by storing the locker's Xid in XMAX, and
> setting hint bits specifying the locking strength. There is one exception
> here: since hint bit space is limited, we do not provide a separate hint bit
> for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
> that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
> presumably more commonly used due to being the standards-mandated locking
> mechanism, or heavily used by the RI code, so we want to provide fast paths
> for those.)

Are those tuple bits actually "hint" bits? They seem quite a bit more
powerful than a "hint".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
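The locking strengths the README excerpt describes are easiest to grasp as a conflict table. The sketch below uses the four row-lock modes as they ultimately shipped in PostgreSQL 9.3 (the thread's "FOR KEY UPDATE" corresponds to the strongest mode, what plain FOR UPDATE and DELETE ended up taking; "FOR NO KEY UPDATE" was named later), so treat it as an after-the-fact summary rather than the patch revision under discussion:

```python
# Conflict table for the row-lock modes introduced by this work, per
# the semantics that shipped in 9.3. Listed weakest to strongest; two
# modes conflict iff the pair appears below.
MODES = ["KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"]

CONFLICTS = {
    ("KEY SHARE", "UPDATE"),
    ("SHARE", "NO KEY UPDATE"), ("SHARE", "UPDATE"),
    ("NO KEY UPDATE", "NO KEY UPDATE"), ("NO KEY UPDATE", "UPDATE"),
    ("UPDATE", "UPDATE"),
}

def conflicts(a, b):
    return (a, b) in CONFLICTS or (b, a) in CONFLICTS

# The payoff for foreign keys: an RI check (KEY SHARE on the referenced
# row) no longer blocks a non-key update of that row...
print(conflicts("KEY SHARE", "NO KEY UPDATE"))  # False
# ...while still blocking DELETE and key-changing updates:
print(conflicts("KEY SHARE", "UPDATE"))  # True
# Share-style locks never conflict with themselves:
print(conflicts("SHARE", "SHARE"))  # False
```

The single-locker fast path Bruce asks about is just the common case where one of these modes is recorded directly in the tuple's xmax plus infomask bits, instead of via a MultiXact.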
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 01:28:11PM -0400, Robert Haas wrote: > I spent some time thinking about this over the weekend, and I have an > observation, and an idea. Here's the observation: I believe that > locking a tuple whose xmin is uncommitted is always a noop, because if > it's ever possible for a transaction to wait for an XID that is part > of its own transaction (exact same XID, or sub-XIDs of the same top > XID), then a transaction could deadlock against itself. I believe > that this is not possible: if a transaction were to wait for an XID > assigned to that same backend, then the lock manager would observe > that an ExclusiveLock on the xid is already held, so the request for a > ShareLock would be granted immediately. I also don't believe there's > any situation in which the existence of an uncommitted tuple fails to > block another backend, but a lock on that same uncommitted tuple would > have caused another backend to block. If any of that sounds wrong, > you can stop reading here (but please tell me why it sounds wrong). When we lock an update-in-progress row, we walk the t_ctid chain and lock all descendant tuples. They may all have uncommitted xmins. This is essential to ensure that the final outcome of the updating transaction does not affect whether the locking transaction has its KEY SHARE lock. Similarly, when we update a previously-locked tuple, we copy any locks (always KEY SHARE locks) to the new version. That new tuple is both uncommitted and has locks, and we cannot easily sacrifice either property. Do you see a way to extend your scheme to cover these needs?
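Noah's two requirements above can be modelled in miniature. The following is an illustrative sketch only, not PostgreSQL code, and every name in it (HeapTuple, lock_key_share, update_tuple) is invented: locking walks the t_ctid chain so every descendant version carries the KEY SHARE lock, and an update of a locked tuple copies existing locks to the new version.

```python
from dataclasses import dataclass, field

@dataclass
class HeapTuple:
    xmin: int                              # creating transaction id
    ctid_next: "HeapTuple | None" = None   # newer version, if updated
    key_share_lockers: set = field(default_factory=set)

def lock_key_share(tup, locker_xid):
    """Lock a row: walk the t_ctid chain so every descendant version is locked."""
    while tup is not None:
        tup.key_share_lockers.add(locker_xid)
        tup = tup.ctid_next

def update_tuple(old, new_xmin):
    """Update a previously-locked row: the new version inherits KEY SHARE locks."""
    new = HeapTuple(xmin=new_xmin,
                    key_share_lockers=set(old.key_share_lockers))
    old.ctid_next = new
    return new

v1 = HeapTuple(xmin=100)
lock_key_share(v1, locker_xid=200)   # an RI check locks the visible version
v2 = update_tuple(v1, new_xmin=101)  # a concurrent update copies the lock forward
assert 200 in v2.key_share_lockers   # the lock holds whatever xid 101's outcome is
```

The point of the model is Noah's invariant: whether xid 101 commits or aborts, the locker (xid 200) holds its KEY SHARE lock on whichever version survives.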
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 6:14 PM, Robert Haas wrote: > Considering that nobody's done any work to resolve the uncertainty > about whether the worst-case performance characteristics of this patch > are acceptable, and considering further that it was undergoing massive > code churn for more than a month after the final CommitFest, I think > it's not that unreasonable to think it might not be ready for prime > time at this point. Thank you for cutting to the chase. The "uncertainty" of which you speak is a theoretical point you raised. It has been explained, but nobody has yet shown performance numbers to illustrate the point, only because the conclusions seemed so clear. I would point out that you haven't demonstrated the existence of a problem either, so redesigning something without any proof of a problem seems strange. Let me explain again what this patch does and why it has such a major performance benefit. This feature gives us a step change in lock reductions from FKs. A real-world "best case" might be to examine the benefit this patch has on a large batch load that inserts many new orders for existing customers. In my example case the orders table has a FK to the customer table. At the same time as the data load, we attempt to update a customer's additional details, address or current balance etc. The large load takes locks on the customer table and keeps them for the whole transaction, so the customer updates are locked out for multiple seconds, minutes or maybe hours, depending upon how far you want to stretch the example. With this patch the customer updates don't cause lock conflicts, though they require mxact lookups in *some* cases, so they might take 1-10ms extra, rather than 1-10 minutes more. That is >1000x faster. The only case that causes the additional lookups is the case that otherwise would have been locked, so producing "best case" results is trivial and the benefit can be as enormous as you like. I agree with you that some worst case performance tests should be done. 
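To make the orders/customer scenario concrete, here is a toy model of the conflict rule involved. The function and its names are invented for illustration; the behaviour encoded is my reading of the thread, not patch source: the RI trigger's row lock blocks a concurrent update of the parent row only when that update changes the key.

```python
def ri_check_blocks(parent_update_changes_key, patched):
    """Does the RI trigger's row lock conflict with a concurrent parent update?"""
    ri_lock = "KEY SHARE" if patched else "SHARE"
    update_lock = "KEY UPDATE" if parent_update_changes_key else "UPDATE"
    conflicts = {
        ("SHARE", "UPDATE"): True,          # old behaviour: any update blocks
        ("SHARE", "KEY UPDATE"): True,
        ("KEY SHARE", "UPDATE"): False,     # patched: non-key update proceeds
        ("KEY SHARE", "KEY UPDATE"): True,  # key change / delete still blocks
    }
    return conflicts[(ri_lock, update_lock)]

# Updating a customer's address/balance while the batch load holds RI locks:
assert ri_check_blocks(parent_update_changes_key=False, patched=False)
assert not ri_check_blocks(parent_update_changes_key=False, patched=True)
# Changing the customer's key still blocks, which preserves integrity:
assert ri_check_blocks(parent_update_changes_key=True, patched=True)
```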
Could you please say what you think the worst cases would be, so those can be tested? That would avoid wasting time or getting anything backwards. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 1:50 PM, Simon Riggs wrote: > On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas wrote: >> In other words, we'd entirely avoid needing to make mxacts crash-safe, >> and we'd avoid most of the extra SLRU lookups that the current >> implementation requires; they'd only be needed when (and for as long >> as) the locked tuple was not yet all-visible. > > The current implementation only requires additional lookups in the > update/check case, which is the case that doesn't do anything other > than block right now. Since we're replacing lock contention with > physical access contention even the worst case situation is still > better than what we have now. Please feel free to point out worst case > situations and show that isn't true. I think I already have: http://archives.postgresql.org/pgsql-hackers/2012-02/msg01258.php The case I'm worried about is where we are allocating mxids quickly, and we end up having to wait for fsyncs on mxact segments. That might be very slow, but you could argue that it could *possibly* be still worthwhile if it avoids blocking. That doesn't strike me as a slam-dunk, though, because we've already seen and fixed cases where too many fsyncs causes the performance of the entire system to go down the tubes (cf. commit 7f242d880b5b5d9642675517466d31373961cf98). But it's really bad if there are no updates on the parent table - then, whatever extra overhead there is will be all for naught, since the more fine-grained locking doesn't help anyway. > I've also pointed out how to avoid overhead of making mxacts crash > safe when the new facilities are not in use, so I don't see problems > with the proposed mechanism. Given that I am still myself reviewing > the actual code. The closest thing I can find to a proposal from you in that regard is this comment: # I was really thinking we could skip the fsync of a page if we've not # persisted anything important on that page, since that was one of # Robert's performance points. 
It might be possible to do something with that idea, but at the moment I'm not seeing how to make it work. > So those things are not something we need to avoid. > > My feeling is that overwriting xmin is a clever idea, but arrives too > late to require sensible analysis in this stage of the CF. It's not > solving a problem, its just an alternate mechanism and at best an > optimisation of the mechanism. Were we to explore it now, it seems > certain that another person would observe that design were taking > place and so the patch should be rejected, which would be unnecessary > and wasteful. Considering that nobody's done any work to resolve the uncertainty about whether the worst-case performance characteristics of this patch are acceptable, and considering further that it was undergoing massive code churn for more than a month after the final CommitFest, I think it's not that unreasonable to think it might not be ready for prime time at this point. In any event, your argument is exactly backwards: we need to first decide whether the patch needs a redesign and then, if it does, postpone it. Deciding that we don't want to postpone it first, and therefore we're not going to redesign it even if that is what's really needed makes no sense. > I also think it would alter our ability to diagnose > problems, not least the normal test that xmax matches xmin across an > update. There's nothing stopping the new tuple from being frozen before the old one, even today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas wrote: > In other words, we'd entirely avoid needing to make mxacts crash-safe, > and we'd avoid most of the extra SLRU lookups that the current > implementation requires; they'd only be needed when (and for as long > as) the locked tuple was not yet all-visible. The current implementation only requires additional lookups in the update/check case, which is the case that doesn't do anything other than block right now. Since we're replacing lock contention with physical access contention, even the worst-case situation is still better than what we have now. Please feel free to point out worst-case situations and show that isn't true. I've also pointed out how to avoid the overhead of making mxacts crash-safe when the new facilities are not in use, so I don't see problems with the proposed mechanism, though I am still reviewing the actual code myself. So those things are not something we need to avoid. My feeling is that overwriting xmin is a clever idea, but it arrives too late to receive sensible analysis at this stage of the CF. It's not solving a problem; it's just an alternate mechanism, and at best an optimisation of the mechanism. Were we to explore it now, it seems certain that another person would observe that design was taking place and so the patch should be rejected, which would be unnecessary and wasteful. I also think it would alter our ability to diagnose problems, not least the normal test that xmax matches xmin across an update. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas wrote: >> Regarding performance, the good thing about this patch is that if you >> have an operation that used to block, it might now not block. So maybe >> multixact-related operation is a bit slower than before, but if it >> allows you to continue operating rather than sit waiting until some >> other transaction releases you, it's much better. > > That's probably true, although there is some deferred cost that is > hard to account for. You might not block immediately, but then later > somebody might block either because the mxact SLRU now needs fsyncs or > because they've got to decode an mxid long after the relevant segment > has been evicted from the SLRU buffers. In general, it's hard to > bound that latter cost, because you only avoid blocking once (when the > initial update happens) but you might pay the extra cost of decoding > the mxid as many times as the row is read, which could be arbitrarily > many. How much of a problem that is in practice, I'm not completely > sure, but it has worried me before and it still does. In the worst > case scenario, a handful of frequently-accessed rows with MXIDs all of > whose members are dead except for the UPDATE they contain could result > in continual SLRU cache-thrashing. > > From a performance standpoint, we really need to think not only about > the cases where the patch wins, but also, and maybe more importantly, > the cases where it loses. There are some cases where the current > mechanism, use SHARE locks for foreign keys, is adequate. In > particular, it's adequate whenever the parent table is not updated at > all, or only very lightly. 
I believe that those people will pay > somewhat more with this patch, and especially in any case where > backends end up waiting for fsyncs in order to create new mxids, but > also just because I think this patch will have the effect of > increasing the space consumed by each individual mxid, which imposes a > distributed cost of its own. I spent some time thinking about this over the weekend, and I have an observation, and an idea. Here's the observation: I believe that locking a tuple whose xmin is uncommitted is always a noop, because if it's ever possible for a transaction to wait for an XID that is part of its own transaction (exact same XID, or sub-XIDs of the same top XID), then a transaction could deadlock against itself. I believe that this is not possible: if a transaction were to wait for an XID assigned to that same backend, then the lock manager would observe that an ExclusiveLock on the xid is already held, so the request for a ShareLock would be granted immediately. I also don't believe there's any situation in which the existence of an uncommitted tuple fails to block another backend, but a lock on that same uncommitted tuple would have caused another backend to block. If any of that sounds wrong, you can stop reading here (but please tell me why it sounds wrong). If it's right, then here's the idea: what if we stored mxids using xmin rather than xmax? This would mean that, instead of making mxids contain the tuple's original xmax, they'd need to instead contain the tuple's original xmin. This might seem like rearranging the deck chairs on the titanic, but I think it actually works out substantially better, because if we can assume that the xmin is committed, then we only need to know its exact value until it becomes older than RecentGlobalXmin. 
This means that a tuple can be both updated and locked at the same time without the MultiXact SLRU needing to be crash-safe, because if we crash and restart, any mxids that are still around from before the crash are known to contain only xmins that are now all-visible. We therefore don't need their exact values, so it doesn't matter if that data actually made it to disk. Furthermore, in the case where a previously-locked tuple is read repeatedly, we only need to keep doing SLRU lookups until the xmin becomes all-visible; after that, we can set a hint bit indicating that the tuple's xmin is all-visible, and any future readers (or writers) can use that to skip the SLRU lookup. In the degenerate (and probably common) case where a tuple is already all-visible at the time it's locked, we don't really need to record the original xmin at all; we can still do so if convenient, but we can set the xmin-all-visible hint right away, so nobody needs to probe the SLRU just to get xmin. In other words, we'd entirely avoid needing to make mxacts crash-safe, and we'd avoid most of the extra SLRU lookups that the current implementation requires; they'd only be needed when (and for as long as) the locked tuple was not yet all-visible. This also seems like it would make the anti-wraparound issues simpler to handle - once an mxid is old enough that any xmin it contains must be all-visible, we can simply overwrite the tuple's xmin with FrozenXID, which is pretty much what we're already doing anyway. It's already the c
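The read path Robert sketches above can be illustrated in miniature. This is a toy model with invented names and structures, not the actual proposal's code; FROZEN_XID stands in for PostgreSQL's FrozenTransactionId. The idea: probe the SLRU only until the stored xmin is known all-visible, then set a hint bit so every later reader skips the lookup.

```python
RECENT_GLOBAL_XMIN = 1000   # oldest xid any live snapshot could still care about
FROZEN_XID = 2              # stand-in for PostgreSQL's FrozenTransactionId

slru = {7: 950}             # mxid -> original xmin; losing this at crash is fine
lookups = 0                 # count the SLRU probes we perform

def get_xmin(tup):
    """Fetch a locked tuple's xmin, probing the SLRU only until all-visible."""
    global lookups
    if tup["xmin_all_visible_hint"]:
        return FROZEN_XID                     # exact value no longer matters
    lookups += 1
    xmin = slru[tup["mxid"]]                  # the SLRU probe we want to avoid
    if xmin < RECENT_GLOBAL_XMIN:
        tup["xmin_all_visible_hint"] = True   # future readers skip the SLRU
    return xmin

tup = {"mxid": 7, "xmin_all_visible_hint": False}
assert get_xmin(tup) == 950          # first read probes the SLRU, sets the hint
assert get_xmin(tup) == FROZEN_XID   # later reads never touch the SLRU again
assert lookups == 1
```

This is why crash-safety falls out: once the hint is settable, the SLRU entry's exact contents are disposable, so it never needs to reach disk.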
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 11:37 AM, Gokulakannan Somasundaram wrote: >> >> Please explain in detail your idea of how it will work. > So we will take some kind of lock, which will stop such a happening. ... > May be someone can come up with better ideas than this. With respect, I don't call this a detailed explanation of an idea. For consideration here, come up with a very detailed design of how your suggestion will work. Think about it carefully, spend hours and days thinking it through, and when you are personally sure it is better than what we have now, please raise it on list at an appropriate time. Bear in mind that most people throw away 90% of their ideas before even mentioning them here. I hope that helps you to contribute. At the moment we're trying to review patches for specific code to include or exclude, not discuss huge redesigns of internal mechanisms using broad-brush descriptions. It is possible you may find an improvement, and if you do, people will be interested, but that seems an unlikely thing to happen here and now. If you have specific comments or tests on this patch those are very welcome. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
> > > Please explain in detail your idea of how it will work. > > OK, I will try to explain the abstract idea I have. a) Referential integrity gets violated when there are referencing key values not present in the referenced key values. We maintain the integrity by taking a SELECT FOR SHARE lock during the foreign key checks, so that the referred value is not updated/deleted during the operation. b) We can do the same in the reverse way. When there is an update/delete of the referred value, we don't want any new inserts of that value into the referring table, or any update that would set a value to the referred value being updated/deleted. So we would take some kind of lock which prevents that from happening. This could be achieved through i) the predicate locking infrastructure already present, or ii) a temporary B-Tree index (no WAL protection) that gets created only for updates of referred values and holds the values being updated/deleted (if we are scared of predicate locking). So whenever we do foreign key checks, we just need to make sure there is no such referential-integrity lock in our own PGPROC structure (if implemented with predicate locking), or check the temporary B-Tree index for any entry matching the one we are going to insert/update to (the empty tree can be tracked with a flag to optimize). Maybe someone can come up with better ideas than this. Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 10:18 AM, Gokulakannan Somasundaram wrote: >> >> Insert, Update and Delete don't take locks they simply mark the tuples >> they change with an xid. Anybody else wanting to "wait on the lock" >> just waits on the xid. We do insert a lock row for each xid, but not >> one per row changed. > > I mean the foreign key checks here. They take a Select for Share Lock right. > That's what we are trying to optimize here. Or am i missing something? So by > following the suggested methodology, the foreign key checks won't take any > locks. Please explain in detail your idea of how it will work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
> > > Insert, Update and Delete don't take locks they simply mark the tuples > they change with an xid. Anybody else wanting to "wait on the lock" > just waits on the xid. We do insert a lock row for each xid, but not > one per row changed. > I mean the foreign key checks here. They take a Select for Share Lock, right? That's what we are trying to optimize here. Or am I missing something? So by following the suggested methodology, the foreign key checks won't take any locks. > It's worked that way for 5 years, so its too late to modify it now and > this patch won't change that. > > The way we do RI locking is designed to prevent holding that in memory > and then having the lock table overflow, which then either requires us > to revert to the current design or upgrade to table level locks to > save space in the lock table - which is a total disaster, if you've > ever worked with DB2. > > What you're suggesting is that we store the locks in memory only as a > way of avoiding updating the row. > > But that memory would be consumed only when someone updates the referenced column (which will usually be the primary key of the referenced table). Any normal database programmer knows that updating a primary key is not good for performance. So we go by the same logic. > No, updates of referenced columns are exactly the same as now when no > RI checks happening. > > If the update occurs when an RI check takes place there is more work > to do, but previously it would have just blocked and done nothing. So > that path is relatively heavyweight but much better than nothing. > > As I have already said, that path is definitely heavyweight (like how Robert has made the DDL path heavyweight). If we assume that DDLs are going to be a rare phenomenon, then we can also assume that updates of primary keys are a rare phenomenon in a normal database. > > The most useful way to help with this patch right now is to run > performance investigations and see if there are non-optimal cases. 
We > can then see how the patch handles those. Theory is good, but it needs > to drive experimentation, as I myself re-discover continually. > > I understand. I just wanted to know, whether the developer considered that line of thought. Thanks, Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 9:24 AM, Gokulakannan Somasundaram wrote: > I feel sad, that i followed this topic very late. But i still want to put > forward my views. > Have we thought on the lines of how Robert has implemented relation level > locks. In short it should go like this > > a) The locks for enforcing Referential integrity should be taken only when > the rarest of the events( that would cause the integrity failure) occur. > That would be the update of the referenced column. Other cases of update, > delete and insert should not be required to take locks. In this way, we can > reduce a lot of lock traffic. Insert, Update and Delete don't take locks; they simply mark the tuples they change with an xid. Anybody else wanting to "wait on the lock" just waits on the xid. We do insert a lock row for each xid, but not one per row changed. > So if we have a table like employee( empid, empname, ... depid references > dept(deptid)) and table dept(depid depname). > > Currently we are taking shared locks on referenced rows in dept table, > whenever we are updating something in the employee table. This should not > happen. Instead any insert / update of referenced column / delete should > check for some lock in its PGPROC structure, which will only get created > when the depid gets updated / deleted( rare event ) It's worked that way for 5 years, so it's too late to modify it now and this patch won't change that. The way we do RI locking is designed to prevent holding that in memory and then having the lock table overflow, which then either requires us to revert to the current design or upgrade to table-level locks to save space in the lock table - which is a total disaster, if you've ever worked with DB2. What you're suggesting is that we store the locks in memory only as a way of avoiding updating the row. My understanding is we have two optimisation choices. A single set of xids can be used in many places, since the same set of transactions may do roughly the same thing. 
1. We could assign a new mxactid every time we touch a new row. That way there is no correspondence between sets of xids, and we may hold the same set many times. OTOH since each set is unique we can expand it easily and we don't need to touch each row once for each lock. That saves on row touches but it also greatly increases the mxactid creation rate, which causes cache scrolling. 2. We assign a new mxactid each time we create a new unique set of xids. We have a separate cache for local sets. This way reduces the mxactid creation rate but causes row updates each time we lock the row, which then needs WAL. (2) is how we currently handle the very difficult decision of how to optimise this for the general case. I'm not sure that is right in all cases, but it is at least scalable and it is the devil we know. > b) But the operation of update of the referenced column will be made more > costly. May be it can create something like a predicate lock(used for > enforcing serializable) and keep it in all the PG_PROC structures. No, updates of referenced columns are exactly the same as now when no RI checks are happening. If the update occurs when an RI check takes place there is more work to do, but previously it would have just blocked and done nothing. So that path is relatively heavyweight but much better than nothing. > I know this is a abstract idea, but just wanted to know, whether we have > thought on those lines. Thanks for your thoughts. The most useful way to help with this patch right now is to run performance investigations and see if there are non-optimal cases. We can then see how the patch handles those. Theory is good, but it needs to drive experimentation, as I myself re-discover continually. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
I feel sad that I followed this topic so late, but I still want to put forward my views. Have we thought along the lines of how Robert has implemented relation-level locks? In short it should go like this: a) The locks for enforcing referential integrity should be taken only when the rarest of the events (those that would cause an integrity failure) occur. That would be the update of the referenced column. Other cases of update, delete and insert should not be required to take locks. In this way, we can reduce a lot of lock traffic. So if we have a table like employee( empid, empname, ... depid references dept(deptid)) and table dept(depid depname). Currently we are taking shared locks on referenced rows in the dept table whenever we are updating something in the employee table. This should not happen. Instead any insert / update of the referenced column / delete should check for some lock in its PGPROC structure, which will only get created when the depid gets updated/deleted (a rare event). b) But the operation of updating the referenced column will be made more costly. Maybe it can create something like a predicate lock (used for enforcing serializable) and keep it in all the PGPROC structures. I know this is an abstract idea, but I just wanted to know whether we have thought along those lines. Thanks, Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 4:27 PM, Alvaro Herrera wrote: > Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012: >> >> Preliminary comment: >> >> This README is very helpful. > > Thanks. I feel silly that I didn't write it earlier. > >> On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera >> wrote: >> > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is >> > super-exclusive locking (used to delete tuples and more generally to update >> > tuples modifying the values of the columns that make up the key of the >> > tuple); >> > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE >> > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak >> > mode >> > that does not conflict with exclusive mode, but conflicts with SELECT FOR >> > KEY >> > UPDATE. This last mode implements a mode just strong enough to implement >> > RI >> > checks, i.e. it ensures that tuples do not go away from under a check, >> > without >> > blocking when some other transaction that want to update the tuple without >> > changing its key. >> >> I feel like there is a naming problem here. The semantics that have >> always been associated with SELECT FOR UPDATE are now attached to >> SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. >> I think users will be surprised to find that SELECT FOR UPDATE >> doesn't block all concurrent updates. > > I'm not sure why you say that. Certainly SELECT FOR UPDATE continues to > block all updates. It continues to block SELECT FOR SHARE as well. > The things that it doesn't block are the new SELECT FOR KEY SHARE locks; > since those didn't exist before, it doesn't seem correct to consider > that SELECT FOR UPDATE changed in any way. > > The main difference in the UPDATE behavior is that an UPDATE is regarded > as though it might acquire two different lock modes -- it either > acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR > UPDATE if not. 
Since SELECT FOR KEY UPDATE didn't exist before, we can > consider that previous to this patch, what UPDATE did was always acquire > a lock of strength SELECT FOR UPDATE. So UPDATE also hasn't been > weakened. Ah, I see. My mistake. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 9:10 PM, Robert Haas wrote: > Preliminary comment: > > This README is very helpful. > > On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera > wrote: >> We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is >> super-exclusive locking (used to delete tuples and more generally to update >> tuples modifying the values of the columns that make up the key of the >> tuple); >> SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE >> implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak >> mode >> that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY >> UPDATE. This last mode implements a mode just strong enough to implement RI >> checks, i.e. it ensures that tuples do not go away from under a check, >> without >> blocking when some other transaction that want to update the tuple without >> changing its key. > > I feel like there is a naming problem here. The semantics that have > always been associated with SELECT FOR UPDATE are now attached to > SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. > I think users will be surprised to find that SELECT FOR UPDATE > doesn't block all concurrent updates. > > It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR > UPDATE, and what you're calling SELECT FOR UPDATE should be called > something else - essentially NONKEY UPDATE, though I don't much like > that name. No, because that would stop it from doing what it is designed to do. The lock modes are correct, appropriate and IMHO have meaningful names. No redesign required here. Not sure about the naming of some of the flag bits however. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012: > > Preliminary comment: > > This README is very helpful. Thanks. I feel silly that I didn't write it earlier. > On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera > wrote: > > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > > super-exclusive locking (used to delete tuples and more generally to update > > tuples modifying the values of the columns that make up the key of the > > tuple); > > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak > > mode > > that does not conflict with exclusive mode, but conflicts with SELECT FOR > > KEY > > UPDATE. This last mode implements a mode just strong enough to implement RI > > checks, i.e. it ensures that tuples do not go away from under a check, > > without > > blocking when some other transaction that want to update the tuple without > > changing its key. > > I feel like there is a naming problem here. The semantics that have > always been associated with SELECT FOR UPDATE are now attached to > SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. > I think users will be surprised to find that SELECT FOR UPDATE > doesn't block all concurrent updates. I'm not sure why you say that. Certainly SELECT FOR UPDATE continues to block all updates. It continues to block SELECT FOR SHARE as well. The things that it doesn't block are the new SELECT FOR KEY SHARE locks; since those didn't exist before, it doesn't seem correct to consider that SELECT FOR UPDATE changed in any way. The main difference in the UPDATE behavior is that an UPDATE is regarded as though it might acquire two different lock modes -- it either acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR UPDATE if not. 
Since SELECT FOR KEY UPDATE didn't exist before, we can consider that previous to this patch, what UPDATE did was always acquire a lock of strength SELECT FOR UPDATE. So UPDATE also hasn't been weakened. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Preliminary comment: This README is very helpful. On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera wrote: > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > super-exclusive locking (used to delete tuples and more generally to update > tuples modifying the values of the columns that make up the key of the tuple); > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode > that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY > UPDATE. This last mode implements a mode just strong enough to implement RI > checks, i.e. it ensures that tuples do not go away from under a check, without > blocking when some other transaction that want to update the tuple without > changing its key. I feel like there is a naming problem here. The semantics that have always been associated with SELECT FOR UPDATE are now attached to SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. I think users will be surprised to find that SELECT FOR UPDATE doesn't block all concurrent updates. It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR UPDATE, and what you're calling SELECT FOR UPDATE should be called something else - essentially NONKEY UPDATE, though I don't much like that name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera wrote: > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > super-exclusive locking (used to delete tuples and more generally to update > tuples modifying the values of the columns that make up the key of the tuple); > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode > that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY > UPDATE. This last mode implements a mode just strong enough to implement RI > checks, i.e. it ensures that tuples do not go away from under a check, without > blocking when some other transaction that want to update the tuple without > changing its key.

So there are 4 lock types, but we only have room for 3 on the tuple header, so we store the least common/deprecated of the 4 types as a MultiXactId. Some rewording would help there. Neat scheme!

My understanding is that all of these workloads will change:

* Explicit SHARE lockers will be slightly worse off in the case of the 1st locker, but after that they'll be the same as before.
* Updates against an RI-locked table will be dramatically faster because of reduced lock waits.

...and that these previous workloads are effectively unchanged:

* A stream of RI checks still causes mxacts.
* Multi-row deadlocks are still possible.
* Queues of writers still wait in the same way.
* Deletes don't cause mxacts unless by the same transaction.

> In earlier PostgreSQL releases, a MultiXact always meant that the tuple was > locked in shared mode by multiple transactions. This is no longer the case; a > MultiXact may contain an update or delete Xid. (Keep in mind that tuple locks > in a transaction do not conflict with other tuple locks in the same > transaction, so it's possible to have otherwise conflicting locks in a > MultiXact if they belong to the same transaction).

Somewhat confusing, but am getting there.
> Note that each lock is attributed to the subtransaction that acquires it. > This means that a subtransaction that aborts is seen as though it releases the > locks it acquired; concurrent transactions can then proceed without having to > wait for the main transaction to finish. It also means that a subtransaction > can upgrade to a stronger lock level than an earlier transaction had, and if > the subxact aborts, the earlier, weaker lock is kept. OK > The possibility of having an update within a MultiXact means that they must > persist across crashes and restarts: a future reader of the tuple needs to > figure out whether the update committed or aborted. So we have a requirement > that pg_multixact needs to retain pages of its data until we're certain that > the MultiXacts in them are no longer of interest. I think the "no longer of interest" aspect needs to be tracked more closely because it will necessarily lead to more I/O. If we store the LSN on each mxact page, as I think we need to, we can get rid of pages more quickly if we know they don't have an LSN set. So its possible we can optimise that more. > VACUUM is in charge of removing old MultiXacts at the time of tuple freezing. You mean mxact segments? Surely we set hint bits on tuples same as now? Hope so. > This works in the same way that pg_clog segments are removed: we have a > pg_class column that stores the earliest multixact that could possibly be > stored in the table; the minimum of all such values is stored in a pg_database > column. VACUUM computes the minimum across all pg_database values, and > removes pg_multixact segments older than the minimum. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 8:35 PM, Simon Riggs wrote: >>> * Why do we need multixact to be persistent? Do we need every page of >>> multixact to be persistent, or just particular pages in certain >>> circumstances? >> >> Any page that contains at least one multi with an update as a member >> must persist. It's possible that some pages contain no update (and this >> is even likely in some workloads, if updates are rare), but I'm not sure >> it's worth complicating the code to cater for early removal of some >> pages. > > If the multixact contains an xid and that is being persisted then you > need to set an LSN to ensure that a page write causes an XLogFlush() > before the multixact write. And you need to set do_fsync, no? Or > explain why not in comments... > > I was really thinking we could skip the fsync of a page if we've not > persisted anything important on that page, since that was one of > Robert's performance points.

We need to increase these values to 32 as well:

#define NUM_MXACTOFFSET_BUFFERS		8
#define NUM_MXACTMEMBER_BUFFERS		16

using the same logic as for clog. We're using 25% more space and we already know clog benefits from increasing them, so there's little doubt we need it here also, since we are increasing the access rate and potentially the longevity. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012: > It does however, illustrate my next review comment which is that the > comments and README items are sorely lacking here. It's quite hard to > see how it works, let alone comment on major design decisions. It > would help myself and others immensely if we could improve that. Here's a first attempt at a README illustrating this. I intend this to be placed in src/backend/access/heap/README.tuplock; the first three paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove those from there, directing people to this new file instead. Is there something that you think should be covered more extensively (or at all) here? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Locking tuples
--------------

Because the shared-memory lock table is of finite size, but users could reasonably want to lock large numbers of tuples, we do not rely on the standard lock manager to store tuple-level locks over the long term. Instead, a tuple is marked as locked by setting the current transaction's XID as its XMAX, and setting additional infomask bits to distinguish this usage from the more normal case of having deleted the tuple. When multiple transactions concurrently lock a tuple, a MultiXact is used; see below.

When it is necessary to wait for a tuple-level lock to be released, the basic delay is provided by XactLockTableWait or MultiXactIdWait on the contents of the tuple's XMAX. However, that mechanism will release all waiters concurrently, so there would be a race condition as to which waiter gets the tuple, potentially leading to indefinite starvation of some waiters. The possibility of share-locking makes the problem much worse --- a steady stream of share-lockers can easily block an exclusive locker forever. To provide more reliable semantics about who gets a tuple-level lock first, we use the standard lock manager.
The protocol for waiting for a tuple-level lock is really:

	LockTuple()
	XactLockTableWait()
	mark tuple as locked by me
	UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next is provided by LockTuple(). However, at most one tuple-level lock will be held or awaited per backend at any time, so we don't risk overflow of the lock table. Note that incoming share-lockers are required to do LockTuple as well, if there is any conflict, to ensure that they don't starve out waiting exclusive-lockers. However, if there is not any active conflict for a tuple, we don't incur any extra overhead.

We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is super-exclusive locking (used to delete tuples and more generally to update tuples modifying the values of the columns that make up the key of the tuple); SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY UPDATE. This last mode implements a mode just strong enough for RI checks, i.e. it ensures that tuples do not go away from under a check, without blocking some other transaction that wants to update the tuple without changing its key.

The conflict table is:

	            KEY UPDATE   UPDATE     SHARE      KEY SHARE
	KEY UPDATE  conflict     conflict   conflict   conflict
	UPDATE      conflict     conflict   conflict
	SHARE       conflict     conflict
	KEY SHARE   conflict

When there is a single locker in a tuple, we can just store the locking info in the tuple itself. We do this by storing the locker's Xid in XMAX, and setting hint bits specifying the locking strength. There is one exception here: since hint bit space is limited, we do not provide a separate hint bit for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in that case.
(The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used due to being the standards-mandated locking mechanism, or heavily used by the RI code, so we want to provide fast paths for those.)

MultiXacts
----------

A tuple header provides very limited space for storing information about tuple locking and updates: there is room only for a single Xid and a small number of hint bits. Whenever we need to store more than one lock, we replace the first locker's Xid with a new MultiXactId. Each MultiXact provides extended locking data; it comprises an array of Xids plus some flag bits for each one. The flags are currently used to store the locking strength of each member transaction. (The flags also distinguish a pure locker from an actual updater.)

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was locked in shared mode by multiple transactions. This is no longer the case; a MultiXact may contain an update or delete Xid.
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 7:53 PM, Alvaro Herrera wrote: >> My other comments so far are >> >> * some permutations commented out - no comments as to why >> Something of a fault with the isolation tester that it just shows >> output, there's no way to record expected output in the spec > > The reason they are commented out is that they are "invalid", that is, > it requires running a command on a session that's blocked in the > previous command. Obviously, that cannot happen in real life. > > isolationtester now has support for detecting such conditions; if the > spec specifies running a command in a locked session, the permutation is > killed with an error message "invalid permutation" and just continues > with the next permutation. It used to simply die, aborting the test. > Maybe we could just modify the specs so that all permutations are there > (this can be done by simply removing the permutation lines), and the > "invalid permutation" messages are part of the expected file. Would > that be better? It would be better to have an isolation tester mode that checks whether it was invalid and, if not, reports that. At the moment we can't say why you commented something out. There's no comment or explanation, and we need something, otherwise 3 years from now we'll be completely in the dark. >> Comments required for these points >> >> * Why do we need multixact to be persistent? Do we need every page of >> multixact to be persistent, or just particular pages in certain >> circumstances? > > Any page that contains at least one multi with an update as a member > must persist. It's possible that some pages contain no update (and this > is even likely in some workloads, if updates are rare), but I'm not sure > it's worth complicating the code to cater for early removal of some > pages. If the multixact contains an xid and that is being persisted then you need to set an LSN to ensure that a page write causes an XLogFlush() before the multixact write. 
And you need to set do_fsync, no? Or explain why not in comments... I was really thinking we could skip the fsync of a page if we've not persisted anything important on that page, since that was one of Robert's performance points. >> * Why do we need to expand multixact with flags? Can we avoid that in >> some cases? > > Did you read my blog post? > http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ > This explains the reason -- the point is that we need to distinguish the > lock strength acquired by each locker. Thanks, I will, but it all belongs in a README please. >> * Why do we need to store just single xids in multixact members? >> Didn't understand comments, no explanation > > This is just for SELECT FOR SHARE. We don't have a hint bit to indicate > "this tuple has a for-share lock", so we need to create a multi for it. > Since FOR SHARE is probably going to be very uncommon, this isn't likely > to be a problem. We're mainly catering for users of SELECT FOR SHARE so > that it continues to work, i.e. maintain backwards compatibility. Good, thanks. Are we actively recommending people use FOR KEY SHARE rather than FOR SHARE, in explicit use? > (Maybe I misunderstood your question -- what I think you're asking is, > "why are there some multixacts that have a single member?") > > I'll try to come up with a good place to add some paragraphs about all > this. Please let me know if answers here are unclear and/or you have > further questions. Thanks I think we need to define some test workloads to measure the performance impact of this patch. We need to be certain that it has a good impact in target cases, plus a known impact in other cases. 
Suggest:

* basic pgbench - no RI
* inserts into large table, RI checks to small table, no activity on small table
* large table parent, large table child: 20 child rows per parent, FK from child to parent; updates of multiple children at same time; low/medium/heavy locking
* large table parent, large table child: 20 child rows per parent, FK from child to parent; updates of parent and child at same time; low/medium/heavy locking

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
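The third and fourth workloads might be set up along these lines (a hypothetical schema; table and column names are invented, and the concurrent-session scripting is left to pgbench custom scripts):

```sql
-- Parent/child schema for the FK-contention workloads (names invented).
CREATE TABLE parent (
    id   int PRIMARY KEY,
    info text
);
CREATE TABLE child (
    id        int PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id),
    info      text
);

-- Populate with ~20 child rows per parent:
INSERT INTO parent SELECT g, 'p' FROM generate_series(1, 100000) g;
INSERT INTO child  SELECT g, (g % 100000) + 1, 'c'
  FROM generate_series(1, 2000000) g;

-- Workload 3: concurrent sessions update several children of one parent,
-- each implicitly taking FK locks on the parent row.
UPDATE child SET info = 'x' WHERE parent_id = 42;

-- Workload 4: simultaneously, another session makes a non-key update of
-- the parent row; before the patch this blocks behind the SHARE locks.
UPDATE parent SET info = 'y' WHERE id = 42;
```

Varying the number of concurrent sessions hitting the same parent would give the low/medium/heavy locking variants.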
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012: > On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera > wrote: > It does however, illustrate my next review comment which is that the > comments and README items are sorely lacking here. It's quite hard to > see how it works, let along comment on major design decisions. It > would help myself and others immensely if we could improve that. Hm. Okay. > Is there a working copy on a git repo? Easier than waiting for next > versions of a patch. No, I don't have an external mirror of my local repo. > My other comments so far are > > * some permutations commented out - no comments as to why > Something of a fault with the isolation tester that it just shows > output, there's no way to record expected output in the spec The reason they are commented out is that they are "invalid", that is, it requires running a command on a session that's blocked in the previous command. Obviously, that cannot happen in real life. isolationtester now has support for detecting such conditions; if the spec specifies running a command in a locked session, the permutation is killed with an error message "invalid permutation" and just continues with the next permutation. It used to simply die, aborting the test. Maybe we could just modify the specs so that all permutations are there (this can be done by simply removing the permutation lines), and the "invalid permutation" messages are part of the expected file. Would that be better? > Comments required for these points > > * Why do we need multixact to be persistent? Do we need every page of > multixact to be persistent, or just particular pages in certain > circumstances? Any page that contains at least one multi with an update as a member must persist. It's possible that some pages contain no update (and this is even likely in some workloads, if updates are rare), but I'm not sure it's worth complicating the code to cater for early removal of some pages. 
> * Why do we need to expand multixact with flags? Can we avoid that in > some cases? Did you read my blog post? http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ This explains the reason -- the point is that we need to distinguish the lock strength acquired by each locker. > * Why do we need to store just single xids in multixact members? > Didn't understand comments, no explanation This is just for SELECT FOR SHARE. We don't have a hint bit to indicate "this tuple has a for-share lock", so we need to create a multi for it. Since FOR SHARE is probably going to be very uncommon, this isn't likely to be a problem. We're mainly catering for users of SELECT FOR SHARE so that it continues to work, i.e. maintain backwards compatibility. (Maybe I misunderstood your question -- what I think you're asking is, "why are there some multixacts that have a single member?") I'll try to come up with a good place to add some paragraphs about all this. Please let me know if answers here are unclear and/or you have further questions. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera wrote: > > Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012: >> >> On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > >> > From a performance standpoint, we really need to think not only about >> > the cases where the patch wins, but also, and maybe more importantly, >> > the cases where it loses. There are some cases where the current >> > mechanism, use SHARE locks for foreign keys, is adequate. In >> > particular, it's adequate whenever the parent table is not updated at >> > all, or only very lightly. I believe that those people will pay >> > somewhat more with this patch, and especially in any case where >> > backends end up waiting for fsyncs in order to create new mxids, but >> > also just because I think this patch will have the effect of >> > increasing the space consumed by each individual mxid, which imposes a >> > distributed cost of its own. >> >> That is a concern also. >> >> It's taken me a while reviewing the patch to realise that space usage >> is actually 4 times worse than before. > > Eh. You're probably misreading something. Previously each member of a > multixact used 4 bytes (the size of an Xid). With the current patch a > member uses 5 bytes (same plus a flags byte). An earlier version used > 4.25 bytes per multi, which I increased to leave space for future > expansion. > > So it's 1.25x worse, not 4x worse. Thanks for correcting me. That sounds better. It does, however, illustrate my next review comment, which is that the comments and README items are sorely lacking here. It's quite hard to see how it works, let alone comment on major design decisions. It would help myself and others immensely if we could improve that. Is there a working copy on a git repo? Easier than waiting for next versions of a patch. 
My other comments so far are * some permutations commented out - no comments as to why Something of a fault with the isolation tester that it just shows output, there's no way to record expected output in the spec Comments required for these points * Why do we need multixact to be persistent? Do we need every page of multixact to be persistent, or just particular pages in certain circumstances? * Why do we need to expand multixact with flags? Can we avoid that in some cases? * Why do we need to store just single xids in multixact members? Didn't understand comments, no explanation -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012: > > On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > > From a performance standpoint, we really need to think not only about > > the cases where the patch wins, but also, and maybe more importantly, > > the cases where it loses. There are some cases where the current > > mechanism, use SHARE locks for foreign keys, is adequate. In > > particular, it's adequate whenever the parent table is not updated at > > all, or only very lightly. I believe that those people will pay > > somewhat more with this patch, and especially in any case where > > backends end up waiting for fsyncs in order to create new mxids, but > > also just because I think this patch will have the effect of > > increasing the space consumed by each individual mxid, which imposes a > > distributed cost of its own. > > That is a concern also. > > It's taken me a while reviewing the patch to realise that space usage > is actually 4 times worse than before. Eh. You're probably misreading something. Previously each member of a multixact used 4 bytes (the size of an Xid). With the current patch a member uses 5 bytes (same plus a flags byte). An earlier version used 4.25 bytes per multi, which I increased to leave space for future expansion. So it's 1.25x worse, not 4x worse. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera > wrote: >>> This >>> seems like a horrid mess that's going to be unsustainable both from a >>> complexity and a performance standpoint. The only reason multixacts >>> were tolerable at all was that they had only one semantics. Changing >>> it so that maybe a multixact represents an actual updater and maybe >>> it doesn't is not sane. >> >> As far as complexity, yeah, it's a lot more complex now -- no question >> about that. >> >> Regarding performance, the good thing about this patch is that if you >> have an operation that used to block, it might now not block. So maybe >> multixact-related operation is a bit slower than before, but if it >> allows you to continue operating rather than sit waiting until some >> other transaction releases you, it's much better. > > That's probably true, although there is some deferred cost that is > hard to account for. You might not block immediately, but then later > somebody might block either because the mxact SLRU now needs fsyncs or > because they've got to decode an mxid long after the relevant segment > has been evicted from the SLRU buffers. In general, it's hard to > bound that latter cost, because you only avoid blocking once (when the > initial update happens) but you might pay the extra cost of decoding > the mxid as many times as the row is read, which could be arbitrarily > many. How much of a problem that is in practice, I'm not completely > sure, but it has worried me before and it still does. In the worst > case scenario, a handful of frequently-accessed rows with MXIDs all of > whose members are dead except for the UPDATE they contain could result > in continual SLRU cache-thrashing. Cases I regularly see involve wait times of many seconds. When this patch helps, it will help performance by algorithmic gains, so perhaps x10-100. That can and should be demonstrated though, I agree. 
> From a performance standpoint, we really need to think not only about > the cases where the patch wins, but also, and maybe more importantly, > the cases where it loses. There are some cases where the current > mechanism, use SHARE locks for foreign keys, is adequate. In > particular, it's adequate whenever the parent table is not updated at > all, or only very lightly. I believe that those people will pay > somewhat more with this patch, and especially in any case where > backends end up waiting for fsyncs in order to create new mxids, but > also just because I think this patch will have the effect of > increasing the space consumed by each individual mxid, which imposes a > distributed cost of its own. That is a concern also. It's taken me a while reviewing the patch to realise that space usage is actually 4 times worse than before. > I think we should avoid having a theoretical argument about how > serious these problems are; instead, you should try to construct > somewhat-realistic worst case scenarios and benchmark them. Tom's > complaint about code complexity is basically a question of opinion, so > I don't know how to evaluate that objectively, but performance is > something we can measure. We might still disagree on the > interpretation of the results, but I still think having some real > numbers to talk about based on carefully-thought-out test cases would > advance the debate. It's a shame that the isolation tester can't be used directly by pgbench - I think we need something similar for performance regression testing. So yes, performance testing is required. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Feb 28, 2012 at 12:28 AM, Noah Misch wrote: > On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote: >> On 23.02.2012 18:01, Alvaro Herrera wrote: >>> As far as complexity, yeah, it's a lot more complex now -- no question >>> about that. >> >> How about assigning a new, real, transaction id, to represent the group >> of transaction ids. The new transaction id would be treated as a >> subtransaction of the updater, and the xids of the lockers would be >> stored in the multixact-members slru. That way the multixact structures >> wouldn't need to survive a crash; you don't care about the shared >> lockers after a crash, and the xid of the updater would be safely stored >> as is in the xmax field. >> >> That way you wouldn't need to handle multixact wraparound, because we >> already handle xid wraparound, and you wouldn't need to make multixact >> slrus crash-safe. >> >> Not sure what the performance implications would be. You would use up >> xids more quickly, which would require more frequent anti-wraparound >> vacuuming. And if we just start using real xids as the key to >> multixact-offsets slru, we would need to extend that a lot more often. >> But I feel it would probably be acceptable. > > When a key locker arrives after the updater and creates this implicit > subtransaction of the updater, how might you arrange for the xid's clog status > to eventually get updated in accordance with the updater's outcome? Somewhat off-topic, but just seen another bad case of FK lock contention. Thanks for working on this everybody. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote: > On 23.02.2012 18:01, Alvaro Herrera wrote: >> As far as complexity, yeah, it's a lot more complex now -- no question >> about that. > > How about assigning a new, real, transaction id, to represent the group > of transaction ids. The new transaction id would be treated as a > subtransaction of the updater, and the xids of the lockers would be > stored in the multixact-members slru. That way the multixact structures > wouldn't need to survive a crash; you don't care about the shared > lockers after a crash, and the xid of the updater would be safely stored > as is in the xmax field. > > That way you wouldn't need to handle multixact wraparound, because we > already handle xid wraparound, and you wouldn't need to make multixact > slrus crash-safe. > > Not sure what the performance implications would be. You would use up > xids more quickly, which would require more frequent anti-wraparound > vacuuming. And if we just start using real xids as the key to > multixact-offsets slru, we would need to extend that a lot more often. > But I feel it would probably be acceptable. When a key locker arrives after the updater and creates this implicit subtransaction of the updater, how might you arrange for the xid's clog status to eventually get updated in accordance with the updater's outcome? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 23.02.2012 18:01, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012:
>> Alvaro Herrera writes:
>>> Sure. The problem is that we are allowing updated rows to be locked (and locked rows to be updated). This means that we need to store extended Xmax information in tuples that goes beyond mere locks, which is what we were doing previously -- they may now have locks and updates simultaneously.
>>>
>>> (In the previous code, a multixact never meant an update, it always signified only shared locks. After a crash, all backends that could have been holding locks must necessarily be gone, so the multixact info is not interesting and can be treated like the tuple is simply live.)
>>
>> Ugh. I had not been paying attention to what you were doing in this patch, and now that I read this I wish I had objected earlier.
>
> Uhm, yeah, a lot earlier -- I initially blogged about this in August last year:
> http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
> and in several posts in pgsql-hackers.
>
>> This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane.
>
> As far as complexity, yeah, it's a lot more complex now -- no question about that.

How about assigning a new, real, transaction id, to represent the group of transaction ids. The new transaction id would be treated as a subtransaction of the updater, and the xids of the lockers would be stored in the multixact-members slru. That way the multixact structures wouldn't need to survive a crash; you don't care about the shared lockers after a crash, and the xid of the updater would be safely stored as is in the xmax field.

That way you wouldn't need to handle multixact wraparound, because we already handle xid wraparound, and you wouldn't need to make multixact slrus crash-safe.

Not sure what the performance implications would be. You would use up xids more quickly, which would require more frequent anti-wraparound vacuuming. And if we just start using real xids as the key to multixact-offsets slru, we would need to extend that a lot more often. But I feel it would probably be acceptable.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera wrote:
>> This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane.
>
> As far as complexity, yeah, it's a lot more complex now -- no question about that.
>
> Regarding performance, the good thing about this patch is that if you have an operation that used to block, it might now not block. So maybe multixact-related operation is a bit slower than before, but if it allows you to continue operating rather than sit waiting until some other transaction releases you, it's much better.

That's probably true, although there is some deferred cost that is hard to account for. You might not block immediately, but then later somebody might block either because the mxact SLRU now needs fsyncs or because they've got to decode an mxid long after the relevant segment has been evicted from the SLRU buffers. In general, it's hard to bound that latter cost, because you only avoid blocking once (when the initial update happens) but you might pay the extra cost of decoding the mxid as many times as the row is read, which could be arbitrarily many. How much of a problem that is in practice, I'm not completely sure, but it has worried me before and it still does. In the worst case scenario, a handful of frequently-accessed rows with MXIDs all of whose members are dead except for the UPDATE they contain could result in continual SLRU cache-thrashing.

From a performance standpoint, we really need to think not only about the cases where the patch wins, but also, and maybe more importantly, the cases where it loses. There are some cases where the current mechanism (use SHARE locks for foreign keys) is adequate. In particular, it's adequate whenever the parent table is not updated at all, or only very lightly. I believe that those people will pay somewhat more with this patch, and especially in any case where backends end up waiting for fsyncs in order to create new mxids, but also just because I think this patch will have the effect of increasing the space consumed by each individual mxid, which imposes a distributed cost of its own.

I think we should avoid having a theoretical argument about how serious these problems are; instead, you should try to construct somewhat-realistic worst case scenarios and benchmark them. Tom's complaint about code complexity is basically a question of opinion, so I don't know how to evaluate that objectively, but performance is something we can measure. We might still disagree on the interpretation of the results, but I still think having some real numbers to talk about based on carefully-thought-out test cases would advance the debate.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
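[Editorial note: Robert's suggestion to benchmark constructed worst cases can be sketched as a custom pgbench workload. The table and file names below are illustrative, not from the thread; the idea is a small set of hot parent rows whose xmax would become a multixact under the patch, read repeatedly by FK-checking inserts.]

```sql
-- Setup (run once): one hot parent row, many FK-checking writers.
CREATE TABLE parent (id int PRIMARY KEY, padding text);
CREATE TABLE child  (id serial PRIMARY KEY,
                     parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent VALUES (1, 'x');

-- insert_child.sql: each client's FK check locks the hot parent row.
INSERT INTO child (parent_id) VALUES (1);

-- update_parent.sql: concurrent non-key updates of that same row,
-- which is what produces the mixed lock-plus-update xmax case.
UPDATE parent SET padding = md5(random()::text) WHERE id = 1;
```

Running something like `pgbench -n -c 32 -T 300 -f insert_child.sql -f update_parent.sql` against both branches would measure the read-side mxid-decoding cost Robert describes; since the mxid stays in the tuple until vacuum removes it, a long run should also exercise the SLRU eviction path.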
Re: [HACKERS] foreign key locks, 2nd attempt
Vik Reykja wrote: > Kevin Grittner wrote: > >> One of the problems that Florian was trying to address is that >> people often have a need to enforce something with a lot of >> similarity to a foreign key, but with more subtle logic than >> declarative foreign keys support. One example would be the case >> Robert has used in some presentations, where the manager column >> in each row in a project table must contain the id of a row in a >> person table *which has the project_manager boolean column set to >> TRUE*. Short of using the new serializable transaction isolation >> level in all related transactions, hand-coding enforcement of >> this useful invariant through trigger code (or application code >> enforced through some framework) is very tricky. The change to >> SELECT FOR UPDATE that Florian was working on would make it >> pretty straightforward. > > I'm not sure what Florian's patch does, but I've been trying to > advocate syntax like the following for this exact scenario: > > foreign key (manager_id, true) references person (id, is_manager) > > Basically, allow us to use constants instead of field names as > part of foreign keys. Interesting. IMV, a declarative approach like that is almost always better than the alternatives, so something like this (possibly with different syntax) would be another step in the right direction. I suspect that there will always be a few corner cases where the business logic required is too esoteric to be handled by a generalized declarative construct, so I think Florian's idea still has merit -- especially if we want to ease the transition to PostgreSQL for large shops using other products. > I have no idea what the implementation aspect of this is, > but I need the user aspect of it and don't know the best way to > get it. There are those in the community who make their livings by helping people get the features they want. 
If you have some money to fund development, I would bet you could get this addressed -- it sure sounds reasonable to me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 19:44, Kevin Grittner wrote:
> One of the problems that Florian was trying to address is that people often have a need to enforce something with a lot of similarity to a foreign key, but with more subtle logic than declarative foreign keys support. One example would be the case Robert has used in some presentations, where the manager column in each row in a project table must contain the id of a row in a person table *which has the project_manager boolean column set to TRUE*. Short of using the new serializable transaction isolation level in all related transactions, hand-coding enforcement of this useful invariant through trigger code (or application code enforced through some framework) is very tricky. The change to SELECT FOR UPDATE that Florian was working on would make it pretty straightforward.

I'm not sure what Florian's patch does, but I've been trying to advocate syntax like the following for this exact scenario:

foreign key (manager_id, true) references person (id, is_manager)

Basically, allow us to use constants instead of field names as part of foreign keys. I have no idea what the implementation aspect of this is, but I need the user aspect of it and don't know the best way to get it.
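[Editorial note: the constant-in-foreign-key syntax proposed above can already be approximated today by materializing the constant as a real column, at the cost of some schema clutter. A sketch, using the example's table names; the `manager_check` column is invented here for illustration.]

```sql
-- Make the (id, is_manager) pair referenceable by a foreign key.
-- (Redundant with the PK on id alone, but required as an FK target.)
ALTER TABLE person ADD UNIQUE (id, is_manager);

-- Pin the second FK column to TRUE on the referencing side.
ALTER TABLE project
    ADD COLUMN manager_check boolean NOT NULL DEFAULT true
        CHECK (manager_check),
    ADD FOREIGN KEY (manager_id, manager_check)
        REFERENCES person (id, is_manager);
```

Since `manager_check` can only ever be true, the constraint can only match person rows where `is_manager` is true, and flipping a referenced manager's flag to false is rejected like any other referenced-key change. The proposed syntax would express the same thing without the dummy column.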
Re: [HACKERS] foreign key locks, 2nd attempt
On 2012-02-23 22:12, Noah Misch wrote:
> That alone would not simplify the patch much. INSERT/UPDATE/DELETE on the foreign side would still need to take some kind of tuple lock on the primary side to prevent primary-side DELETE. You then still face the complicated case of a tuple that's both locked and updated (non-key/immutable columns only). Updates that change keys are relatively straightforward, following what we already do today. It's the non-key updates that complicate things.

Ah, so there's the technical hitch. From previous discussion I was under the impression that:

1. Foreign-key updates only inherently conflict with _key_ updates on the foreign side, and that non-key updates on the foreign side were just caught in the locking cross-fire, so to speak. And

2. The DELETE case was somehow trivially accounted for.

But, for instance, there does not seem to be a lighter lock type that DELETE conflicts with but UPDATE does not. Bummer.

> By then, though, that change would share little or no code with the current patch. It may have its own value, but it's not a means for carving a subset from the current patch.

No, to be clear, it was never meant to be. Only a possible way to give users a way out of foreign-key locks more quickly. Not a way to get some of the branch out to users more quickly. At any rate, that seems to be moot then. And to be honest, mechanisms designed for more than one purpose rarely pan out.

Thanks for explaining!

Jeroen

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
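[Editorial note: the "no lighter lock type" point is exactly what the patch's two new row lock strengths address. A two-session sketch of the intended behavior, per the patch's design (syntax from the patch, not released PostgreSQL at the time; table and column names are illustrative):]

```sql
-- Session 1: what an FK check would take under the patch.
BEGIN;
SELECT id FROM parent WHERE id = 1 FOR KEY SHARE;

-- Session 2: a non-key update takes the new FOR NO KEY UPDATE lock,
-- which does not conflict with FOR KEY SHARE, so it proceeds:
UPDATE parent SET padding = 'y' WHERE id = 1;

-- ...whereas DELETE (and key-changing updates) still take FOR UPDATE,
-- which does conflict, so this blocks until session 1 commits:
DELETE FROM parent WHERE id = 1;
```

Without the patch, the FK triggers' SELECT FOR SHARE conflicts with every UPDATE of the parent row, key or not, which is the "locking cross-fire" described above.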
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 12:43:11PM -0300, Alvaro Herrera wrote:
> Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012:
>> On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote:
>>
>>>> All in all, I think this is in pretty much final shape. Only pg_upgrade bits are still missing. If sharp eyes could give this a critical look and knuckle-cracking testers could give it a spin, that would be helpful.
>>>
>>> Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far.
>
> Now this is an interesting turn of events. I must thank you for your extensive review effort in the current version of the patch, and also thank you and credit you for the idea that initially kicked this patch from the older, smaller, simpler version I wrote during the 9.1 timeline (which you also reviewed exhaustively). Without your and Simon's brilliant ideas, this patch wouldn't exist at all.
>
> I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up.

It would indeed be wrong to change any patch from Needs Review to Returned with Feedback on account of a personal distaste for reviewing the patch. I hope I did not harbor such a motive here. Rather, this CommitFest has given your patch its fair shake, and I and other reviewers would better serve the CF's needs by reviewing, say, "ECPG FETCH readahead" instead of your latest submission. Likewise, you would better serve the CF by evaluating one of the four non-committer patches that have been Ready for Committer since January. That's not to imply that the goals of the CF align with my goals, your goals, or broader PGDG goals. The patch status on commitfest.postgresql.org does exist solely for the advancement of the CF, and I have set it accordingly.

> As for the late code churn, it mostly happened as a result of your own feedback; I would have left most of it in the original state, but as I went ahead it seemed much better to refactor things. This is mostly in heapam.c. As for multixact.c, it also had a lot of churn, but that was mostly to restore it to the state it has in the master branch, dropping much of the code I had written to handle multixact truncation. The new code there and in the vacuum code path (relminmxid and so on) is a lot smaller than that other code was, and it's closely based on relfrozenxid which is a known piece of technology.

I appreciate that.

>> However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and ask ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely and this is very definitely an important patch.

Incidentally, I find it harmful to think of "Returned with Feedback" as "fail". For large patches, it's healthier to think of a CF as a bimonthly project status meeting with stakeholders. When the project is done, wonderful! When there's work left, that's no great surprise.

>> Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly.
>>
>> So please let's not focus on the 100%, lets focus on 80/20.
>
> Well, we have the patch I originally posted in the 9.1 timeframe. That's a lot smaller and simpler. However, that solves only part of the blocking problem, and in particular it doesn't fix the initial deadlock reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case you wonder about his change of email address) that started this effort in the first place. I don't think we can cut down to that and still satisfy the users that requested this; and Glue was just the first one, because after I started blogging about this, some more people started asking for it.
>
> I don't think there's any useful middlepoint between that one and the current one, but maybe I'm wrong.

Nothing additional comes to my mind, either. This patch is monolithic.

Thanks, nm

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 06:36:42PM -0300, Alvaro Herrera wrote: > > Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012: > > > > On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > > > On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > > > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > > > * Columns that are part of the key > > > > Noah thinks the set of columns should only consider those actually > > > > referenced > > > > by keys, not those that *could* be referenced. > > > > > > Well, do you disagree? To me it's low-hanging fruit, because it isolates > > > the > > > UPDATE-time overhead of this patch to FK-referenced tables rather than all > > > tables having a PK or PK-like index (often just "all tables"). > > > > You have not answered my question above. > > Sorry. The reason I didn't research this is that at the very start of > the discussion it was said that having heapam.c figure out whether > columns are being used as FK destinations or not would be more of a > modularity violation than "indexed columns" already are for HOT support > (this was a contentious issue for HOT, so I don't take it lightly. I > don't think I need any more reasons for Tom to object to this patch, or > more bulk into it. Both are already serious issues.) That's fair. > In any case, with the way we've defined FOR KEY SHARE locks (in the docs > it's explicitly said that the set of columns considered could vary in > the future), it's a relatively easy patch to add on top of what I've > submitted. Just as the ALTER TABLE bits to add columns to the set of > columns considered, it could be left for a second pass on the issue. Agreed. Let's have that debate another day, as a follow-on patch. Thanks for shedding this light. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012: > > On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > > * Columns that are part of the key > > > Noah thinks the set of columns should only consider those actually > > > referenced > > > by keys, not those that *could* be referenced. > > > > Well, do you disagree? To me it's low-hanging fruit, because it isolates > > the > > UPDATE-time overhead of this patch to FK-referenced tables rather than all > > tables having a PK or PK-like index (often just "all tables"). > > You have not answered my question above. Sorry. The reason I didn't research this is that at the very start of the discussion it was said that having heapam.c figure out whether columns are being used as FK destinations or not would be more of a modularity violation than "indexed columns" already are for HOT support (this was a contentious issue for HOT, so I don't take it lightly. I don't think I need any more reasons for Tom to object to this patch, or more bulk into it. Both are already serious issues.) In any case, with the way we've defined FOR KEY SHARE locks (in the docs it's explicitly said that the set of columns considered could vary in the future), it's a relatively easy patch to add on top of what I've submitted. Just as the ALTER TABLE bits to add columns to the set of columns considered, it could be left for a second pass on the issue. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 02:08:28PM +0100, Jeroen Vermeulen wrote: > On 2012-02-23 10:18, Simon Riggs wrote: > >> However, review of such a large patch should not be simply pass or >> fail. We should be looking back at the original problem and ask >> ourselves whether some subset of the patch could solve a useful subset >> of the problem. For me, that seems quite likely and this is very >> definitely an important patch. >> >> Even if we can't solve some part of the problem we can at least commit >> some useful parts of infrastructure to allow later work to happen more >> smoothly and quickly. >> >> So please let's not focus on the 100%, lets focus on 80/20. > > The suggested immutable-column constraint was meant as a potential > "80/20 workaround." Definitely not a full solution, helpful to some, > probably easier to do. I don't know if an immutable key would actually > be enough to elide foreign-key locks though. That alone would not simplify the patch much. INSERT/UPDATE/DELETE on the foreign side would still need to take some kind of tuple lock on the primary side to prevent primary-side DELETE. You then still face the complicated case of a tuple that's both locked and updated (non-key/immutable columns only). Updates that change keys are relatively straightforward, following what we already do today. It's the non-key updates that complicate things. If you had both an immutable column constraint and a never-deleted table constraint, that combination would be sufficient to simplify the picture. (Directly or indirectly, it would not actually be a never-deleted constraint so much as a "you must take AccessExclusiveLock to DELETE" constraint.) Foreign-side DML would then take an AccessShareLock on the parent table with no tuple lock at all. By then, though, that change would share little or no code with the current patch. It may have its own value, but it's not a means for carving a subset from the current patch. 
Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera wrote: > Excerpts from Kevin Grittner's message: >> Since the limitation on what can be stored in xmax was the killer >> for Florian's attempt to support SELECT FOR UPDATE in a form >> which was arguably more useful (and certainly more convenient for >> those converting from other database products), I'm wondering >> whether anyone has determined whether this new scheme would allow >> Florian's work to be successfully completed. The issues seem >> very similar. If this approach also provides a basis for the >> other work, I think it helps bolster the argument that this is a >> good design; if not, I think it suggests that maybe it should be >> made more general or extensible in some way. Once this has to be >> supported by pg_upgrade it will be harder to change the format, >> if that is needed for some other feature. > > I have no idea what improvements Florian was seeking, but > multixacts now have plenty of bit flag space to indicate whatever > we want for each member transaction, so most likely the answer is > yes. However we need to make clear that a single SELECT FOR > UPDATE in a tuple does not currently use a multixact; if we wish > to always store flags then we are forced to use one which incurs a > performance hit. Well, his effort really started to go into a tailspin on the related issues here: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01743.php ... with a summary of the problem and possible directions for a solution here: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01833.php One of the problems that Florian was trying to address is that people often have a need to enforce something with a lot of similarity to a foreign key, but with more subtle logic than declarative foreign keys support. 
One example would be the case Robert has used in some presentations, where the manager column in each row in a project table must contain the id of a row in a person table *which has the project_manager boolean column set to TRUE*. Short of using the new serializable transaction isolation level in all related transactions, hand-coding enforcement of this useful invariant through trigger code (or application code enforced through some framework) is very tricky. The change to SELECT FOR UPDATE that Florian was working on would make it pretty straightforward. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 4:01 PM, Alvaro Herrera wrote: > As far as complexity, yeah, it's a lot more complex now -- no question > about that. As far as complexity goes, would it be easier if we treated the UPDATE of a primary key column as a DELETE plus an INSERT? There's not really a logical reason why updating a primary key has meaning, so allowing an ExecPlanQual to follow the chain across primary key values doesn't seem valid to me. That would make all primary keys IMMUTABLE to updates. No primary key, no problem. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 02/23/2012 01:04 PM, Alvaro Herrera wrote: "manual vacuum is teh sux0r" I think you've just named my next conference talk submission. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Greg Smith's message of jue feb 23 14:48:13 -0300 2012: > On 02/23/2012 10:43 AM, Alvaro Herrera wrote: > > I completely understand that you don't want to review this latest > > version of the patch; it's a lot of effort and I wouldn't inflict it on > > anybody who hasn't volunteered. However, it doesn't seem to me that > > this is reason to boot the patch from the commitfest. I think the thing > > to do would be to remove yourself from the reviewers column and set it > > back to "needs review", so that other reviewers can pick it up. > > This feature made Robert's list of serious CF concerns, too, and the > idea that majorly revised patches might be punted isn't a new one. Well, this patch (or rather, a previous incarnation of it) got punted from 9.1's fourth commitfest; I intended to have the new version in 9.2's first CF, but business reasons (which I will not discuss in public) forced me otherwise. So here we are again -- as I said to Tom, I don't intend to let go of this one easily, though of course I will concede to whatever the community decides. > Noah > is certainly justified in saying you're off his community support list, > after all the review work he's been doing for this CF. Yeah, I can't blame him. I've been trying to focus most of my review availability on his own patches precisely due to that, but it's very clear to me that his effort is larger than mine. > We here think it would be a shame for all of these other performance > bits to be sorted but still have this one loose though, if it's possible > to keep going on it. It's well known as something on Simon's peeve list > for some time now. I was just reading someone else ranting about how > this foreign key locking issue proves Postgres isn't "enterprise scale" > yesterday, it was part of an article proving why DB2 is worth paying for > I think. This change crosses over into the advocacy area due to that, > albeit only for the people who have been burned by this already. 
Yeah, Simon's been on this particular issue for quite some time -- which is probably why the initial idea that kickstarted this patch was his. Personally I've been in the "not enterprise strength" camp for a long time, mostly unintentionally; you can see that by tracing how my major patches close holes in that kind of area ("cluster loses indexes", "we don't have subtransactions", "foreign key concurrency sucks" (--> SELECT FOR SHARE), "manual vacuum is teh sux0r", and now this one about FKs again). -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 02/23/2012 10:43 AM, Alvaro Herrera wrote:
> I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up.

This feature made Robert's list of serious CF concerns, too, and the idea that majorly revised patches might be punted isn't a new one. Noah is certainly justified in saying you're off his community support list, after all the review work he's been doing for this CF. We here think it would be a shame for all of these other performance bits to be sorted but still have this one loose though, if it's possible to keep going on it. It's well known as something on Simon's peeve list for some time now. I was just reading someone else ranting about how this foreign key locking issue proves Postgres isn't "enterprise scale" yesterday, it was part of an article proving why DB2 is worth paying for I think. This change crosses over into the advocacy area due to that, albeit only for the people who have been burned by this already.

If the main problem is pg_upgrade complexity, eventually progress on that front needs to be made. I'm surprised the project has survived this long without needing anything beyond catalog conversion for in-place upgrade. That luck won't hold forever.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Kevin Grittner's message of jue feb 23 13:31:36 -0300 2012: > > Alvaro Herrera wrote: > > > As for sanity -- I regard multixacts as a way to store extended > > Xmax information. The original idea was obviously much more > > limited in that the extended info was just locking info. We've > > extended it but I don't think it's such a stretch. > > Since the limitation on what can be stored in xmax was the killer > for Florian's attempt to support SELECT FOR UPDATE in a form which > was arguably more useful (and certainly more convenient for those > converting from other database products), I'm wondering whether > anyone has determined whether this new scheme would allow Florian's > work to be successfully completed. The issues seem very similar. > If this approach also provides a basis for the other work, I think > it helps bolster the argument that this is a good design; if not, I > think it suggests that maybe it should be made more general or > extensible in some way. Once this has to be supported by pg_upgrade > it will be harder to change the format, if that is needed for some > other feature. I have no idea what improvements Florian was seeking, but multixacts now have plenty of bit flag space to indicate whatever we want for each member transaction, so most likely the answer is yes. However we need to make clear that a single SELECT FOR UPDATE in a tuple does not currently use a multixact; if we wish to always store flags then we are forced to use one which incurs a performance hit. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera wrote: > As for sanity -- I regard multixacts as a way to store extended > Xmax information. The original idea was obviously much more > limited in that the extended info was just locking info. We've > extended it but I don't think it's such a stretch. Since the limitation on what can be stored in xmax was the killer for Florian's attempt to support SELECT FOR UPDATE in a form which was arguably more useful (and certainly more convenient for those converting from other database products), I'm wondering whether anyone has determined whether this new scheme would allow Florian's work to be successfully completed. The issues seem very similar. If this approach also provides a basis for the other work, I think it helps bolster the argument that this is a good design; if not, I think it suggests that maybe it should be made more general or extensible in some way. Once this has to be supported by pg_upgrade it will be harder to change the format, if that is needed for some other feature. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012: > > Alvaro Herrera writes: > > Sure. The problem is that we are allowing updated rows to be locked (and > > locked rows to be updated). This means that we need to store extended > > Xmax information in tuples that goes beyond mere locks, which is what we > > were doing previously -- they may now have locks and updates simultaneously. > > > (In the previous code, a multixact never meant an update, it always > > signified only shared locks. After a crash, all backends that could > > have been holding locks must necessarily be gone, so the multixact info > > is not interesting and can be treated like the tuple is simply live.) > > Ugh. I had not been paying attention to what you were doing in this > patch, and now that I read this I wish I had objected earlier. Uhm, yeah, a lot earlier -- I initially blogged about this in August last year: http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ and in several posts in pgsql-hackers. > This > seems like a horrid mess that's going to be unsustainable both from a > complexity and a performance standpoint. The only reason multixacts > were tolerable at all was that they had only one semantics. Changing > it so that maybe a multixact represents an actual updater and maybe > it doesn't is not sane. As far as complexity, yeah, it's a lot more complex now -- no question about that. Regarding performance, the good thing about this patch is that if you have an operation that used to block, it might now not block. So maybe multixact-related operation is a bit slower than before, but if it allows you to continue operating rather than sit waiting until some other transaction releases you, it's much better. As for sanity -- I regard multixacts as a way to store extended Xmax information. The original idea was obviously much more limited in that the extended info was just locking info. 
We've extended it, but I don't think it's such a stretch. I have been posting about (most? all of?) the ideas that I've been following to make this work at all, so that people had plenty of chances to disagree with them -- and Noah and others did disagree with many of them, so I changed the patch accordingly. I'm not closed to further rework, but I'm not going to abandon the idea too lightly, either. I'm sure there are bugs too, but hopefully they will prove as shallow as the interested reviewer eyeballs are many. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 12:12:13 -0300 2012: > On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera > wrote: > > Sure. The problem is that we are allowing updated rows to be locked (and > > locked rows to be updated). This means that we need to store extended > > Xmax information in tuples that goes beyond mere locks, which is what we > > were doing previously -- they may now have locks and updates simultaneously. > OK, thanks. > > So why do we need pg_upgrade support? Two reasons. One is that in upgrades from a version that contains this patch to another version that also contains this patch (i.e. future upgrades), we need to copy the multixact files from the old cluster to the new. The other is that in upgrades from a version that doesn't contain this patch to a version that does, we need to set the multixact limit values so that values that were used in the old cluster are returned as empty values (keeping the old semantics); otherwise they would cause errors trying to read the member Xids from disk. > If pg_multixact is not persistent now, surely there is no requirement > to have pg_upgrade do any form of upgrade? The only time we'll need to > do this is from 9.2 to 9.3, which can of course occur any time in next > year. That doesn't sound like a reason to block a patch now, because > of something that will be needed a year from now. I think there's a policy that we must allow upgrades from one beta to the next, which is why Noah says this is a blocker starting from beta2. The pg_upgrade code for this is rather simple however. There's no rocket science there. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
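The second pg_upgrade duty Alvaro describes, setting the multixact limits so that pre-upgrade values read back as empty, can be sketched like this (illustrative Python only; the function names are made up and XID wraparound is ignored for clarity):

```python
def limits_after_upgrade(next_mxid_in_old_cluster):
    """Hypothetical sketch of pg_upgrade's job when the old cluster
    predates this patch: record the old cluster's next multixact value
    as the new cluster's oldest valid one."""
    return {"oldest_multi": next_mxid_in_old_cluster}

def multi_has_members(mxid, limits):
    # Values below the limit predate the upgrade: report no members
    # (preserving the old semantics) instead of erroring while trying
    # to read member XIDs from files that were never copied.
    return mxid >= limits["oldest_multi"]
```

Any multixact stamped before the upgrade is then treated exactly like the old code treated it after a crash: as if it had no interesting contents.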
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012: > > On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote: > > >> All in all, I think this is in pretty much final shape. Only pg_upgrade >> bits are still missing. If sharp eyes could give this a critical look >> and knuckle-cracking testers could give it a spin, that would be >> helpful. > > > > Lack of pg_upgrade support leaves this version incomplete, because that > > omission would constitute a blocker for beta 2. This version changes as much > > code compared to the version I reviewed at the beginning of the CommitFest as > > that version changed overall. In that light, it's time to close the books on > > this patch for the purpose of this CommitFest; I'm marking it Returned with > > Feedback. Thanks for your efforts thus far. Now this is an interesting turn of events. I must thank you for your extensive review effort on the current version of the patch, and also thank and credit you for the idea that initially kicked this patch off from the older, smaller, simpler version I wrote during the 9.1 timeline (which you also reviewed exhaustively). Without your and Simon's brilliant ideas, this patch wouldn't exist at all. I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up. As for the late code churn, it mostly happened as a result of your own feedback; I would have left most of it in the original state, but as I went ahead it seemed much better to refactor things. This is mostly in heapam.c.
As for multixact.c, it also had a lot of churn, but that was mostly to restore it to the state it has in the master branch, dropping much of the code I had written to handle multixact truncation. The new code there and in the vacuum code path (relminmxid and so on) is a lot smaller than that other code was, and it's closely based on relfrozenxid, which is a known piece of technology. > My view would be that with 90 files touched this is a very large > patch, so that alone makes me wonder whether we should commit this > patch, so I agree with Noah and compliment him on an excellent > detailed review. I note, however, that the bulk of the patch is in three files -- multixact.c, tqual.c, heapam.c -- as is clearly illustrated in the diff stats I posted. The rest of them are touched mostly to follow their new APIs (and of course to add tests and docs). To summarize, of 94 files touched in total:
* 22 files are in src/test/isolation/ (new and updated tests and expected files)
* 19 files are in src/include/
* 10 files are in contrib/
* 39 files are in src/backend;
* in that subdir, there are 3097 insertions and 1006 deletions
* 3047 (83%) of which are in heapam.c, multixact.c, and tqual.c
* one is a README
> However, review of such a large patch should not be simply pass or > fail. We should be looking back at the original problem and ask > ourselves whether some subset of the patch could solve a useful subset > of the problem. For me, that seems quite likely and this is very > definitely an important patch. > > Even if we can't solve some part of the problem we can at least commit > some useful parts of infrastructure to allow later work to happen more > smoothly and quickly. > > So please let's not focus on the 100%, let's focus on 80/20. Well, we have the patch I originally posted in the 9.1 timeframe. That's a lot smaller and simpler.
However, that solves only part of the blocking problem, and in particular it doesn't fix the initial deadlock reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case you wonder about his change of email address) that started this effort in the first place. I don't think we can cut down to that and still satisfy the users that requested this; and Glue was just the first one, because after I started blogging about this, some more people started asking for it. I don't think there's any useful middle ground between that one and the current one, but maybe I'm wrong. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera writes: > Sure. The problem is that we are allowing updated rows to be locked (and > locked rows to be updated). This means that we need to store extended > Xmax information in tuples that goes beyond mere locks, which is what we > were doing previously -- they may now have locks and updates simultaneously. > (In the previous code, a multixact never meant an update, it always > signified only shared locks. After a crash, all backends that could > have been holding locks must necessarily be gone, so the multixact info > is not interesting and can be treated like the tuple is simply live.) Ugh. I had not been paying attention to what you were doing in this patch, and now that I read this I wish I had objected earlier. This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera wrote: > > Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012: >> On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: >> >> > Making pg_multixact persistent across clean shutdowns is no bridge to cross >> > lightly, since it means committing to an on-disk format for an indefinite >> > period. We should do it; the benefits of this patch justify it, and I >> > haven't >> > identified a way to avoid it without incurring worse problems. >> >> I can't actually see anything in the patch that explains why this is >> required. (That is something we should reject more patches on, since >> it creates a higher maintenance burden). >> >> Can someone explain? We might think of a way to avoid that. > > Sure. The problem is that we are allowing updated rows to be locked (and > locked rows to be updated). This means that we need to store extended > Xmax information in tuples that goes beyond mere locks, which is what we > were doing previously -- they may now have locks and updates simultaneously. > > (In the previous code, a multixact never meant an update, it always > signified only shared locks. After a crash, all backends that could > have been holding locks must necessarily be gone, so the multixact info > is not interesting and can be treated like the tuple is simply live.) > > This means that this extended Xmax info needs to be able to survive, so > that it's possible to retrieve it after a crash; because even if the > lockers are all gone, the updater might have committed and this means > the tuple is dead. If we failed to keep this, the tuple would be > considered live which would be wrong because the other version of the > tuple, which was created by the update, is also live. OK, thanks. So why do we need pg_upgrade support? If pg_multixact is not persistent now, surely there is no requirement to have pg_upgrade do any form of upgrade? 
The only time we'll need to do this is from 9.2 to 9.3, which can of course occur any time in the next year. That doesn't sound like a reason to block a patch now because of something that will be needed a year from now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012: > On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: > > > Making pg_multixact persistent across clean shutdowns is no bridge to cross > > lightly, since it means committing to an on-disk format for an indefinite > > period. We should do it; the benefits of this patch justify it, and I > > haven't > > identified a way to avoid it without incurring worse problems. > > I can't actually see anything in the patch that explains why this is > required. (That is something we should reject more patches on, since > it creates a higher maintenance burden). > > Can someone explain? We might think of a way to avoid that. Sure. The problem is that we are allowing updated rows to be locked (and locked rows to be updated). This means that we need to store extended Xmax information in tuples that goes beyond mere locks, which is what we were doing previously -- they may now have locks and updates simultaneously. (In the previous code, a multixact never meant an update, it always signified only shared locks. After a crash, all backends that could have been holding locks must necessarily be gone, so the multixact info is not interesting and can be treated like the tuple is simply live.) This means that this extended Xmax info needs to be able to survive, so that it's possible to retrieve it after a crash; because even if the lockers are all gone, the updater might have committed and this means the tuple is dead. If we failed to keep this, the tuple would be considered live which would be wrong because the other version of the tuple, which was created by the update, is also live. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
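The visibility argument Alvaro makes here, that a committed updater hidden inside a multixact must survive a crash even though the lockers need not, can be sketched as follows (illustrative Python; this is a conceptual model, not PostgreSQL's actual HeapTupleSatisfies* code):

```python
def tuple_version_is_dead(xmax_members, committed_xids):
    """Each member is an (xid, is_updater) pair.  After a crash all
    lockers are necessarily gone, but a committed updater recorded in
    the multixact still makes this tuple version dead.  If the member
    data were lost, this version would wrongly be considered live
    alongside the newer version created by the update."""
    for xid, is_updater in xmax_members:
        if is_updater and xid in committed_xids:
            return True    # the update committed: this version is dead
    return False           # only lockers: the tuple remains live

# Locker 200 vanished in the crash, but updater 201 committed first:
members = [(200, False), (201, True)]
```

With the pre-patch semantics (lockers only), discarding pg_multixact at restart was always safe; with an updater possibly present, it no longer is, hence the need for persistence and pg_upgrade support.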
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 1:08 PM, Jeroen Vermeulen wrote: > Simon, I think you had a reason why it couldn't work, but I didn't quite get > your meaning and didn't want to distract things further at that stage. You > wrote that it "doesn't do what KEY LOCKS are designed to do"... any chance > you might recall what the problem was? The IMMUTABLE idea would work, but it requires all users to recode their apps. By the time they've done that we'll probably have fixed the problem in full anyway, so then we have to ask them to stop again, which is hard, so we'll be stuck with a performance tweak that applies to just one release. So it's the fully automatic solution we're looking for. I don't object to someone implementing IMMUTABLE, I'm just saying it's not a way to make this patch simpler and therefore acceptable. If people are willing to recode apps to avoid this then hire me and I'll tell you how ;-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: > Making pg_multixact persistent across clean shutdowns is no bridge to cross > lightly, since it means committing to an on-disk format for an indefinite > period. We should do it; the benefits of this patch justify it, and I haven't > identified a way to avoid it without incurring worse problems. I can't actually see anything in the patch that explains why this is required. (That is something we should reject more patches on, since it creates a higher maintenance burden). Can someone explain? We might think of a way to avoid that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 2012-02-23 10:18, Simon Riggs wrote: However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and ask ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely and this is very definitely an important patch. Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly. So please let's not focus on the 100%, lets focus on 80/20. The suggested immutable-column constraint was meant as a potential "80/20 workaround." Definitely not a full solution, helpful to some, probably easier to do. I don't know if an immutable key would actually be enough to elide foreign-key locks though. Simon, I think you had a reason why it couldn't work, but I didn't quite get your meaning and didn't want to distract things further at that stage. You wrote that it "doesn't do what KEY LOCKS are designed to do"... any chance you might recall what the problem was? I don't mean to be pushy about my pet idea, and heaven knows I don't have time to implement it, but it'd be good to know whether I should put the whole thought to rest. Jeroen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote: >> All in all, I think this is in pretty much final shape. Only pg_upgrade >> bits are still missing. If sharp eyes could give this a critical look >> and knuckle-cracking testers could give it a spin, that would be >> helpful. > > Lack of pg_upgrade support leaves this version incomplete, because that > omission would constitute a blocker for beta 2. This version changes as much > code compared to the version I reviewed at the beginning of the CommitFest as > that version changed overall. In that light, it's time to close the books on > this patch for the purpose of this CommitFest; I'm marking it Returned with > Feedback. Thanks for your efforts thus far. My view would be that with 90 files touched this is a very large patch; that alone makes me wonder whether we should commit it, so I agree with Noah and compliment him on an excellent, detailed review. However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and asking ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely, and this is very definitely an important patch. Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly. So please let's not focus on the 100%, let's focus on 80/20. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > Okay, so this patch fixes the truncation and wraparound issues through a > mechanism much like pg_clog's: it keeps track of the oldest possibly > existing multis on each and every table, and then during tuple freezing > those are removed. I also took the liberty to make the code remove > multis altogether (i.e. resetting the IS_MULTI hint bit) when only the > update remains and lockers are all gone. > > I also cleaned up the code in heapam so that there's a couple of tables > mapping MultiXactStatus to LockTupleMode and back, and to heavyweight > lock modes (the older patches used functions to do this, which was > pretty ugly). I had to add a little helper function to lock.c to make > this work. I made a rather large bunch of other minor changes to close > minor bugs here and there. > > Docs have been added, as have new tests for the isolation harness, which > I've ensured pass in both read committed and serializable modes; WAL > logging for locking updated versions of a tuple, when an old one is > locked due to an old snapshot, was also added; there's plenty of room > for growth in the MultiXact flag bits; the bit that made tables with no > keys lock the entire row all the time was removed; multiple places in > code comments were cleaned up that referred to this feature as "FOR KEY > LOCK" and ensured that it also mentions FOR KEY UPDATE; the pg_rowlocks, > pageinspect, pg_controldata, pg_resetxlog utilities have been updated. All of the above sounds great. I especially like the growing test coverage. > All in all, I think this is in pretty much final shape. Only pg_upgrade > bits are still missing. If sharp eyes could give this a critical look > and knuckle-cracking testers could give it a spin, that would be > helpful. Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. 
This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far. On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > * Columns that are part of the key > > Noah thinks the set of columns should only consider those actually > > referenced > > by keys, not those that *could* be referenced. > > Well, do you disagree? To me it's low-hanging fruit, because it isolates the > UPDATE-time overhead of this patch to FK-referenced tables rather than all > tables having a PK or PK-like index (often just "all tables"). You have not answered my question above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
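The "tables mapping MultiXactStatus to LockTupleMode and back" that Alvaro mentions suggest a conflict matrix among row-lock strengths. A sketch of one plausible matrix (illustrative Python; the mode names and exact conflict set evolved during review, so treat this as the design direction rather than the committed behavior):

```python
# Row-lock modes, weakest to strongest (names are illustrative).
MODES = ["KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"]

# Which modes block each other: the weakest lock, taken by foreign-key
# checks, conflicts only with updates that may change key columns.
CONFLICTS = {
    "KEY SHARE":     {"UPDATE"},
    "SHARE":         {"NO KEY UPDATE", "UPDATE"},
    "NO KEY UPDATE": {"SHARE", "NO KEY UPDATE", "UPDATE"},
    "UPDATE":        {"KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"},
}

def conflicts(a, b):
    return b in CONFLICTS[a]
```

The point of the whole patch falls out of the first row: a foreign-key check (KEY SHARE) no longer blocks a non-key UPDATE of the same tuple, which is exactly the deadlock-producing wait the old code forced.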
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Jim Nasby's message of mié feb 01 21:33:47 -0300 2012: > > On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote: > >> I think it's butt-ugly, but it's only slightly uglier than > >> relfrozenxid which we're already stuck with. The slight amount of > >> additional ugliness is that you're going to use an XID column to store > >> a uint4 that is not an XID - but I don't have a great idea how to fix > >> that. You could mislabel it as an OID or a (signed) int4, but I'm not > >> sure that either of those is any better. We could also create an mxid > >> data type, but that seems like it might be overkill. > > > > Well, we're already storing a multixact in Xmax, so it's not like we > > don't assume that we can store multis in space normally reserved for > > Xids. What I've been wondering is not how ugly it is, but rather about the fact that we're bloating pg_class some more. > > FWIW, users have been known to request uint datatypes; so if this really is a > uint perhaps we should just create a uint datatype... Yeah. This is just for internal consumption, though, not a full-blown datatype. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote: >> I think it's butt-ugly, but it's only slightly uglier than >> relfrozenxid which we're already stuck with. The slight amount of >> additional ugliness is that you're going to use an XID column to store >> a uint4 that is not an XID - but I don't have a great idea how to fix >> that. You could mislabel it as an OID or a (signed) int4, but I'm not >> sure that either of those is any better. We could also create an mxid >> data type, but that seems like it might be overkill. > > Well, we're already storing a multixact in Xmax, so it's not like we > don't assume that we can store multis in space normally reserved for > Xids. What I've been wondering is not how ugly it is, but rather of the > fact that we're bloating pg_class some more. FWIW, users have been known to request uint datatypes; so if this really is a uint perhaps we should just create a uint datatype... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 31, 2012 at 11:58 AM, Alvaro Herrera wrote: > Well, we're already storing a multixact in Xmax, so it's not like we > don't assume that we can store multis in space normally reserved for > Xids. What I've been wondering is not how ugly it is, but rather of the > fact that we're bloating pg_class some more. I don't think another 4 bytes in pg_class is that big a deal. We don't do relcache rebuilds frequently enough for that to really matter much. The bigger cost of this patch seems to me to be that we're going to have to carry around multi-xact IDs for a long time, and probably fsync and/or WAL-log them moreso than now. I'm not sure how much you've worried about that, but a new column in pg_class seems like line noise by comparison. >> What about SELECT FOR UPDATE? That's a pretty common case, I think. >> If that's now going to force a multixact to get created and >> additionally force multixact lookups when the row is subsequently >> examined, that seems, well, actually pretty scary at first glance. >> SELECT FOR UPDATE is fairly expensive as it stands, and is commonly >> used. > > SELECT FOR UPDATE is still going to work without a multi in the simple > cases. The case where it's different is when somebody else grabs a KEY > SHARE lock on the same tuple; it's now going to get a multi, where it > previously blocked. So other transactions later checking the tuple will > have a bit of a larger cost. That's okay considering that it meant > the other transaction did not have to wait anymore. OK. I assume that the different treatment of SELECT FOR SHARE is due to lack of bit space? >> >> 2. What algorithm did we end up using do fix the set of key columns, >> >> and is there any user configuration that can or needs to happen there? >> > >> > Currently we just use all columns indexed by unique indexes (excluding >> > expressional and partial ones). Furthermore we consider "key column" >> > all columns in a table without unique indexes. 
Noah disagrees with this >> > choice; he says we should drop this last point, and that we should relax >> > the first to "columns actually used by foreign key constraints". I >> > expect that this is a rather simple change. >> >> Why the special case for tables without unique indexes? Like Noah, I >> don't see the point. Unless there's some trade-off I'm not seeing, we >> should want the number of key columns to be as minimal as possible, so >> that as many updates as possible can use the "cheap" path that doesn't >> involve locking the whole tuple. > > No trade-off. I just thought it was safer: my thought was that if > there's no nominated key column, the safer bet was that any of them > could have been. But then, in reality there cannot be any foreign key > here anyway. I'll revert that bit. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
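The key-column rule under debate above can be sketched as follows (illustrative Python; the field names are made up, and the real computation walks the relation's index list in the backend):

```python
def key_columns(indexes):
    """The rule Alvaro describes: "key columns" are all columns indexed
    by unique indexes, excluding expressional and partial indexes.
    This is before applying Noah's suggested narrowing to columns
    actually referenced by foreign-key constraints, and after dropping
    the special case for tables without unique indexes."""
    cols = set()
    for idx in indexes:
        if idx["unique"] and not idx["expressional"] and not idx["partial"]:
            cols.update(idx["columns"])
    return cols

indexes = [
    {"unique": True,  "expressional": False, "partial": False, "columns": ["id"]},
    {"unique": True,  "expressional": False, "partial": True,  "columns": ["email"]},  # partial: excluded
    {"unique": False, "expressional": False, "partial": False, "columns": ["name"]},   # not unique: excluded
]
```

The smaller this set, the more updates qualify as "non-key" and can take the cheap path that doesn't lock the whole tuple, which is why both Noah and Robert want it minimal.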
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar ene 31 13:18:30 -0300 2012: > > On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera > wrote: > > Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: > >> I suspect you are right that it is unlikely, but OTOH that sounds like > >> an extremely painful recovery procedure. We probably don't need to > >> put a ton of thought into handling this case as efficiently as > >> possible, but I think we would do well to avoid situations that could > >> lead to, basically, a full-cluster shutdown. If that happens to one > >> of my customers I expect to lose the customer. > > > > Okay, so the worst case here is really bad and we should do something > > about it. Are you okay with a new pg_class column of type xid? The > > advantage is not only that we would be able to track it with high > > precision; we would also get rid of a lot of code in which I have little > > confidence. > > I think it's butt-ugly, but it's only slightly uglier than > relfrozenxid which we're already stuck with. The slight amount of > additional ugliness is that you're going to use an XID column to store > a uint4 that is not an XID - but I don't have a great idea how to fix > that. You could mislabel it as an OID or a (signed) int4, but I'm not > sure that either of those is any better. We could also create an mxid > data type, but that seems like it might be overkill. Well, we're already storing a multixact in Xmax, so it's not like we don't assume that we can store multis in space normally reserved for Xids. What I've been wondering is not how ugly it is, but rather of the fact that we're bloating pg_class some more. > >> 1. I think it's probably fair to assume that this is going to be a > >> huge win in cases where it avoids deadlocks or lock waits. But is > >> there a worst case where we don't avoid that but still add a lot of > >> extra multi-xact lookups? 
What's the worst case we can imagine and > >> how pathological does the workload have to be to tickle that case? > > > > Hm. I haven't really thought about this. There are some code paths > > that now have to resolve Multixacts that previously did not; things like > > vacuum. I don't think there's any case in which we previously did not > > block and now block, but there might be things that got slower without > > blocking. One thing that definitely got slower is use of SELECT FOR > > SHARE. (This command previously used hint bits to mark the row as > > locked; now it is always going to create a multixact). However, I > > expect that with foreign keys switching to FOR KEY SHARE, the use of FOR > > SHARE is going to decline, maybe disappear completely, so it shouldn't > > be a problem. > > What about SELECT FOR UPDATE? That's a pretty common case, I think. > If that's now going to force a multixact to get created and > additionally force multixact lookups when the row is subsequently > examined, that seems, well, actually pretty scary at first glance. > SELECT FOR UPDATE is fairly expensive as it stands, and is commonly > used. SELECT FOR UPDATE is still going to work without a multi in the simple cases. The case where it's different is when somebody else grabs a KEY SHARE lock on the same tuple; it's now going to get a multi, where it previously blocked. So other transactions later checking the tuple will have a bit of a larger cost. That's okay considering that it meant the other transaction did not have to wait anymore. > >> 2. What algorithm did we end up using do fix the set of key columns, > >> and is there any user configuration that can or needs to happen there? > > > > Currently we just use all columns indexed by unique indexes (excluding > > expressional and partial ones). Furthermore we consider "key column" > > all columns in a table without unique indexes. 
Noah disagrees with this > > choice; he says we should drop this last point, and that we should relax > > the first to "columns actually used by foreign key constraints". I > > expect that this is a rather simple change. > > Why the special case for tables without unique indexes? Like Noah, I > don't see the point. Unless there's some trade-off I'm not seeing, we > should want the number of key columns to be as minimal as possible, so > that as many updates as possible can use the "cheap" path that doesn't > involve locking the whole tuple. No trade-off. I just thought it was safer: my thought was that if there's no nominated key column, the safer bet was that any of them could have been. But then, in reality there cannot be any foreign key here anyway. I'll revert that bit. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera wrote: > Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: >> I suspect you are right that it is unlikely, but OTOH that sounds like >> an extremely painful recovery procedure. We probably don't need to >> put a ton of thought into handling this case as efficiently as >> possible, but I think we would do well to avoid situations that could >> lead to, basically, a full-cluster shutdown. If that happens to one >> of my customers I expect to lose the customer. > > Okay, so the worst case here is really bad and we should do something > about it. Are you okay with a new pg_class column of type xid? The > advantage is not only that we would be able to track it with high > precision; we would also get rid of a lot of code in which I have little > confidence. I think it's butt-ugly, but it's only slightly uglier than relfrozenxid which we're already stuck with. The slight amount of additional ugliness is that you're going to use an XID column to store a uint4 that is not an XID - but I don't have a great idea how to fix that. You could mislabel it as an OID or a (signed) int4, but I'm not sure that either of those is any better. We could also create an mxid data type, but that seems like it might be overkill. >> I have a couple of other concerns about this patch: >> >> 1. I think it's probably fair to assume that this is going to be a >> huge win in cases where it avoids deadlocks or lock waits. But is >> there a worst case where we don't avoid that but still add a lot of >> extra multi-xact lookups? What's the worst case we can imagine and >> how pathological does the workload have to be to tickle that case? > > Hm. I haven't really thought about this. There are some code paths > that now have to resolve Multixacts that previously did not; things like > vacuum. 
I don't think there's any case in which we previously did not > block and now block, but there might be things that got slower without > blocking. One thing that definitely got slower is use of SELECT FOR > SHARE. (This command previously used hint bits to mark the row as > locked; now it is always going to create a multixact). However, I > expect that with foreign keys switching to FOR KEY SHARE, the use of FOR > SHARE is going to decline, maybe disappear completely, so it shouldn't > be a problem. What about SELECT FOR UPDATE? That's a pretty common case, I think. If that's now going to force a multixact to get created and additionally force multixact lookups when the row is subsequently examined, that seems, well, actually pretty scary at first glance. SELECT FOR UPDATE is fairly expensive as it stands, and is commonly used. >> 2. What algorithm did we end up using do fix the set of key columns, >> and is there any user configuration that can or needs to happen there? > > Currently we just use all columns indexed by unique indexes (excluding > expressional and partial ones). Furthermore we consider "key column" > all columns in a table without unique indexes. Noah disagrees with this > choice; he says we should drop this last point, and that we should relax > the first to "columns actually used by foreign key constraints". I > expect that this is a rather simple change. Why the special case for tables without unique indexes? Like Noah, I don't see the point. Unless there's some trade-off I'm not seeing, we should want the number of key columns to be as minimal as possible, so that as many updates as possible can use the "cheap" path that doesn't involve locking the whole tuple. >> Do we handle cleanly the case where the set of key columns is changed >> by DDL? > > Hmm, I remember thinking about this at some point, but now I'm not 100% > sure. I think it doesn't matter due to multis being so ephemeral. Let > me try and figure it out. 
I thought part of the point here was that multixacts aren't so ephemeral any more: they're going to stick around until the table gets frozen. I'm worried that's going to turn out to be a problem somehow. With respect to this particular issue, what I'm worried about is something like this:

1. Transaction A begins.
2. Transaction B begins and does some updating or locking of table T, and then commits.
3. Transaction C begins and does DDL on table T, acquiring AccessExclusiveLock while it does so, and changes the set of key columns. It then commits.
4A. Transaction A now accesses table T, and/or
4B. Transaction D begins and accesses table T.

At step 4, A and/or D have up-to-date relcache entries that correctly describe the current set of key columns in T. But the work done by transaction B was done with a different set of key columns (could be more or less), and A and/or D mustn't get confused on that basis. Also, in the case of A, there is the further possibility that A's snapshot can't see B as committed yet (even though C subsequently held an AccessExclusiveLock on the table). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: > I suspect you are right that it is unlikely, but OTOH that sounds like > an extremely painful recovery procedure. We probably don't need to > put a ton of thought into handling this case as efficiently as > possible, but I think we would do well to avoid situations that could > lead to, basically, a full-cluster shutdown. If that happens to one > of my customers I expect to lose the customer. Okay, so the worst case here is really bad and we should do something about it. Are you okay with a new pg_class column of type xid? The advantage is not only that we would be able to track it with high precision; we would also get rid of a lot of code in which I have little confidence. > I have a couple of other concerns about this patch: > > 1. I think it's probably fair to assume that this is going to be a > huge win in cases where it avoids deadlocks or lock waits. But is > there a worst case where we don't avoid that but still add a lot of > extra multi-xact lookups? What's the worst case we can imagine and > how pathological does the workload have to be to tickle that case? Hm. I haven't really thought about this. There are some code paths that now have to resolve Multixacts that previously did not; things like vacuum. I don't think there's any case in which we previously did not block and now block, but there might be things that got slower without blocking. One thing that definitely got slower is use of SELECT FOR SHARE. (This command previously used hint bits to mark the row as locked; now it is always going to create a multixact). However, I expect that with foreign keys switching to FOR KEY SHARE, the use of FOR SHARE is going to decline, maybe disappear completely, so it shouldn't be a problem. > 2. What algorithm did we end up using do fix the set of key columns, > and is there any user configuration that can or needs to happen there? 
Currently we just use all columns indexed by unique indexes (excluding expressional and partial ones). Furthermore we consider "key column" all columns in a table without unique indexes. Noah disagrees with this choice; he says we should drop this last point, and that we should relax the first to "columns actually used by foreign key constraints". I expect that this is a rather simple change. Currently there's nothing that the user can do to add more columns to the set considered (other than creating more unique indexes, of course). We discussed having an ALTER TABLE command to do it, but this isn't seen as essential. > Do we handle cleanly the case where the set of key columns is changed > by DDL? Hmm, I remember thinking about this at some point, but now I'm not 100% sure. I think it doesn't matter due to multis being so ephemeral. Let me try and figure it out. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Jan 30, 2012 at 6:48 PM, Noah Misch wrote: > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: >> The biggest item remaining is the point you raised about multixactid >> wraparound. This is closely related to multixact truncation and the way >> checkpoints are to be handled. If we think that MultiXactId wraparound >> is possible, and we need to involve autovacuum to keep it at bay, then I > > To prove it possible, we need prove there exists some sequence of operations > consuming N xids and M > N multixactids. Have N transactions key-lock N-1 > rows apiece, then have each of them key-lock one of the rows locked by each > other transaction. This consumes N xids and N(N-1) multixactids. I believe > you could construct a workload with N! multixact usage, too. > > Existence proofs are one thing, real workloads another. My unsubstantiated > guess is that multixactid use will not overtake xid use in bulk on a workload > not specifically tailored to do so. So, I think it's enough to notice it, > refuse to assign a new multixactid, and tell the user to clear the condition > with a VACUUM FREEZE of all databases. Other opinions would indeed be useful. I suspect you are right that it is unlikely, but OTOH that sounds like an extremely painful recovery procedure. We probably don't need to put a ton of thought into handling this case as efficiently as possible, but I think we would do well to avoid situations that could lead to, basically, a full-cluster shutdown. If that happens to one of my customers I expect to lose the customer. I have a couple of other concerns about this patch: 1. I think it's probably fair to assume that this is going to be a huge win in cases where it avoids deadlocks or lock waits. But is there a worst case where we don't avoid that but still add a lot of extra multi-xact lookups? What's the worst case we can imagine and how pathological does the workload have to be to tickle that case? 2. 
What algorithm did we end up using to fix the set of key columns, and is there any user configuration that can or needs to happen there? Do we handle cleanly the case where the set of key columns is changed by DDL?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
> The biggest item remaining is the point you raised about multixactid
> wraparound. This is closely related to multixact truncation and the way
> checkpoints are to be handled. If we think that MultiXactId wraparound
> is possible, and we need to involve autovacuum to keep it at bay, then I

To prove it possible, we need to prove there exists some sequence of operations consuming N xids and M > N multixactids. Have N transactions key-lock N-1 rows apiece, then have each of them key-lock one of the rows locked by each other transaction. This consumes N xids and N(N-1) multixactids. I believe you could construct a workload with N! multixact usage, too.

Existence proofs are one thing, real workloads another. My unsubstantiated guess is that multixactid use will not overtake xid use in bulk on a workload not specifically tailored to do so. So, I think it's enough to notice it, refuse to assign a new multixactid, and tell the user to clear the condition with a VACUUM FREEZE of all databases. Other opinions would indeed be useful.

> think the only way to make that work is to add another column to
> pg_class so that each table's oldest multixact is tracked, same as we do
> with relfrozenxid for Xids. If we do that, I think we can do away with
> most of the MultiXactTruncate junk I added -- things would become a lot
> simpler. The cost would be bloating pg_class a bit more. Are we okay
> with paying that cost? I asked this question some months ago and I
> decided that I would not add the column, but I am starting to lean the
> other way. I would like some opinions on this.

That's not the only way; autovacuum could just accelerate normal freezing to advance the multixactid horizon indirectly. Your proposal could make it far more efficient, though.
> You asked two questions about WAL-logging locks: one was about the level > of detail we log for each lock we grab; the other was about > heap_xlog_update logging the right info. AFAICS, the main thing that > makes detailed WAL logging necessary is hot standbys. That is, the > standby must have all the locking info so that concurrent transactions > are similarly locked as in the master ... or am I wrong in that? (ISTM > this means I need to fix heap_xlog_update so that it faithfully > registers the lock info we're storing, not just the current Xid). Standby servers do not need accurate tuple locks, because it's not possible to wait on a tuple lock during recovery. (By the way, the question about log detail was just from my own curiosity. We don't especially need an answer to move forward with this patch, unless you want one.) > * Columns that are part of the key > Noah thinks the set of columns should only consider those actually > referenced > by keys, not those that *could* be referenced. Well, do you disagree? To me it's low-hanging fruit, because it isolates the UPDATE-time overhead of this patch to FK-referenced tables rather than all tables having a PK or PK-like index (often just "all tables"). > Also, in a table without columns, are all columns part of the key, or is the > key the empty set? I changed HeapSatisfiesHOTUpdate but that seems > arbitrary. It does seem arbitrary. What led you to switch in a later version? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Alvaro Herrera's message of mar ene 24 15:47:16 -0300 2012: > Need more code changes for the following: > * export FOR KEY UPDATE lock mode in SQL While doing this, I realized that there's an open item here regarding a transaction that locks a tuple, and then in an aborted savepoint deletes it. As things stand, what happens is that the original tuple lock is forgotten entirely, which was one of the things I wanted to fix (and in fact is fixed for all other cases AFAICS). So what we need is to be able to store a MultiXactId that includes a member for KeyUpdate locks, which will represent an UPDATE that touches key columns as well as DELETEs. That closes the hole. However, the problem with this is that we have no more bits left in the flag bitmask, which is another concern you had raised. I chose the easy way out and added a full byte of flags per transaction. This means that we now have 1636 xacts per members page rather than 1900+, but I'm not too concerned about that. (We could cut back to 4 flag bits per xact -- but then, having some room for future growth is probably a good plan anyway). So DELETEs can also create multis. I'm going to submit an updated patch shortly. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Jan 18, 2012 at 05:18:31PM -0300, Alvaro Herrera wrote: > Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012: > > On 16.01.2012 21:52, Alvaro Herrera wrote: > > > I was initially thinking that pg_multixact should return the > > > empty set if requested members of a multi that preceded the freeze > > > point. That way, I thought, we would just never try to access a page > > > originated in the older version (assuming the freeze point is set to > > > "current" whenever pg_upgrade runs). However, as things currently > > > stand, accessing an old multi raises an error. So maybe we need a > > > scheme a bit more complex to handle this. > > > > Hmm, could we create new multixact files filled with zeros, covering the > > range that was valid in the old cluster? > > Hm, we could do something like that I guess. I'm not sure that just > zeroes is the right pattern, but it should be something simple. PostgreSQL 9.1 can have all ~4B MultiXactId on disk at any given time. We could silently ignore the lookup miss when HEAP_XMAX_LOCK_ONLY is also set. That makes existing data files acceptable while still catching data loss scenarios going forward. (It's tempting to be stricter when we know the cluster data files originated in PostgreSQL 9.2+, but I'm not sure whether that's worth its weight.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012: > > On 16.01.2012 21:52, Alvaro Herrera wrote: > > > > Excerpts from Heikki Linnakangas's message of lun ene 16 16:17:42 -0300 > > 2012: > >> > >> On 15.01.2012 06:49, Alvaro Herrera wrote: > >>> - pg_upgrade bits are missing. > >> > >> I guess we'll need to rewrite pg_multixact contents in pg_upgrade. Is > >> the page format backwards-compatible? > > > > It's not. > > > > I haven't worked out what pg_upgrade needs to do, honestly. I think we > > should just not copy old pg_multixact files when upgrading across this > > patch. > > Sorry, I meant whether the *data* page format is backwards-compatible? > the multixact page format clearly isn't. It's supposed to be, yes. The HEAP_XMAX_IS_NOT_UPDATE bit (now renamed) was chosen so that it'd take the place of the old HEAP_XMAX_SHARE_LOCK bit, so any pages with that bit set should continue to work similarly. The other infomask bits I used were previously unused. > > I was initially thinking that pg_multixact should return the > > empty set if requested members of a multi that preceded the freeze > > point. That way, I thought, we would just never try to access a page > > originated in the older version (assuming the freeze point is set to > > "current" whenever pg_upgrade runs). However, as things currently > > stand, accessing an old multi raises an error. So maybe we need a > > scheme a bit more complex to handle this. > > Hmm, could we create new multixact files filled with zeros, covering the > range that was valid in the old cluster? Hm, we could do something like that I guess. I'm not sure that just zeroes is the right pattern, but it should be something simple. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. 
PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Jan 16, 2012 at 04:52:36PM -0300, Alvaro Herrera wrote:
> Excerpts from Heikki Linnakangas's message of lun ene 16 16:17:42 -0300 2012:
> > On 15.01.2012 06:49, Alvaro Herrera wrote:
> > > --- 164,178
> > >   #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
> > >   #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
> > >   #define HEAP_HASOID             0x0008  /* has an object-id field */
> > > ! #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
> > >   #define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
> > >   #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
> > > ! #define HEAP_XMAX_IS_NOT_UPDATE 0x0080  /* xmax, if valid, is only a locker.
> > > !                                          * Note this is not set unless
> > > !                                          * XMAX_IS_MULTI is also set. */
> > > !
> > > ! #define HEAP_LOCK_BITS (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
> > > !                         HEAP_XMAX_KEYSHR_LOCK)
> > >   #define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
> > >   #define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
> > >   #define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
> >
> > HEAP_XMAX_IS_NOT_UPDATE is a pretty opaque name for that. From the name,
> > I'd say that a DELETE would set that, but the comment says it has to do
> > with locking. After going through all the combinations in my mind, I
> > think I finally understood it: HEAP_XMAX_IS_NOT_UPDATE is set if xmax is
> > a multi-xact that represents only locking xids, no updates. How about
> > calling it "HEAP_XMAX_LOCK_ONLY", and setting it whether or not
> > xmax is a multi-xid?
>
> Hm, sounds like a good idea. Will do.

+1

> > Why are you renaming HeapTupleHeaderGetXmax() into
> > HeapTupleHeaderGetRawXmax()? Any current callers of
> > HeapTupleHeaderGetXmax() should already check that HEAP_XMAX_IS_MULTI is
> > not set.
> I had this vague impression that it'd be better to break existing
> callers so that they ensure they decide between
> HeapTupleHeaderGetRawXmax and HeapTupleHeaderGetUpdateXid. Noah
> suggested changing the macro name, too. It's up to each caller to
> decide what semantics they want. Most want the latter; and callers
> outside core are more likely to want that one. If we kept the old name,
> they would get the wrong value.

My previous suggestion was to have both macros:

#define HeapTupleHeaderGetXmax(tup) \
( \
    AssertMacro(!((tup)->t_infomask & HEAP_XMAX_IS_MULTI)), \
    HeapTupleHeaderGetRawXmax(tup) \
)

#define HeapTupleHeaderGetRawXmax(tup) \
( \
    (tup)->t_choice.t_heap.t_xmax \
)

No strong preference, though.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers