Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2018-04-28 Thread Justin Pryzby
On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> > It's probably a bit late in the v10 cycle to be taking any risks in
> > this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> > as soon as the v11 cycle opens, unless someone can show an example
> > of non-broken coding that requires it.  (And if so, there ought to
> > be a regression test incorporating that.)

I'm resending this in case it's been forgotten about and in case there's still
time this cycle to follow through in removing RememberToFreeTupleDescAtEOX.

..And because I ran into it again earlier this month, ALTERing an 1600 column
table with 500 children (actually while rewriting to reduce to 12 childs); on a
dedicated DB VM with 8GB RAM:

Mar  7 11:44:52 telsaDB kernel: Out of memory: Kill process 47490 (postmaster) 
score 644 or sacrifice child
Mar  7 11:44:52 telsaDB kernel: Killed process 47490, UID 26, (postmaster) 
total-vm:6813528kB, anon-rss:5212288kB, file-rss:2296kB

On Wed, Oct 18, 2017 at 02:54:54PM -0500, Justin Pryzby wrote:
> Hi,
> 
> I just ran into this again in another context (see original dicussion, quoted
> below).
> 
> Some time ago, while initially introducting non-default stats target, I set 
> our
> non-filtering columns to "STATISTICS 10", lower than default, to minimize the
> size of pg_statistic, which (at least at one point) I perceived to have become
> bloated and causing issue (partially due to having an excessive number of
> "daily" granularity partitions, a problem I've since mitigated).
> 
> The large number of columns with non-default stats target was (I think) 
> causing
> pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more
> disruptive than necessary, so now I'm going back and fixing it.
> 
> [pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS 
> -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
> 
> [pryzbyj@database ~]$ dmesg |tail -n2
> Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
> Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, 
> anon-rss:8977764kB, file-rss:8kB
> 
> So I'm hoping to encourage someone to commit the change contemplated earlier.



Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2018-05-01 Thread Amit Langote
On 2018/04/29 1:00, Justin Pryzby wrote:
> On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
>>> It's probably a bit late in the v10 cycle to be taking any risks in
>>> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
>>> as soon as the v11 cycle opens, unless someone can show an example
>>> of non-broken coding that requires it.  (And if so, there ought to
>>> be a regression test incorporating that.)

Fwiw, I too thought it should be possible by now to get rid of
RememberToFreeTupleDescAtEOX, but I was able to come up with at least one
case where its existence prevents a crash.  The test case is based on one
provided by Noah Misch a while ago when the code in question was
introduced as a measure for the problem he described:

https://www.postgresql.org/message-id/20130801005351.GA325106%40tornado.leadboat.com

Suppose in get_relation_constraints() we're processing a table that has 2
constraints, one which has ccvalid=true and another which has
ccvalid=false.  Further suppose a concurrent session performs a VALIDATE
CONSTRAINT on the second constraint, which requires just
ShareUpdateExclusiveLock and so it succeeds, sending out a sinval.  Back
in the first session, eval_const_expressions() performed on the first
constraint would end up in RelationClearRelation being called on the
table, wherein, since the 2nd constraint is now marked by valid by the
concurrent session, would cause rd_att to be swapped.  Thus the pointer
into rd_att that get_relation_constraints possesses would be a dangling
pointer, if not for the call to RememberToFreeTupleDescAtEOX() from
RelationDestroyRelation().

If however get_relation_constraints() had incremented the reference count
of rd_att to begin with, this problem wouldn't have arisen.  IOW, we'd
need to replace all the direct accesses to rd_att (including those via
RelationGetDescr, of course) by something that increments rd_att's
reference count, before concluding that RememberToFreeTupleDescAtEOX() is
unnecessary.  I'm afraid that's a lot of places.

Thanks,
Amit




Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2018-04-28 Thread Justin Pryzby
On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> > It's probably a bit late in the v10 cycle to be taking any risks in
> > this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> > as soon as the v11 cycle opens, unless someone can show an example
> > of non-broken coding that requires it.  (And if so, there ought to
> > be a regression test incorporating that.)

I'm resending this in case it's been forgotten about and in case there's still
time this cycle to follow through in removing RememberToFreeTupleDescAtEOX.

..And because I ran into it again earlier this month, ALTERing an 1600 column
table with 500 children (actually while rewriting to reduce to 12 childs); on a
dedicated DB VM with 8GB RAM:

Mar  7 11:44:52 telsaDB kernel: Out of memory: Kill process 47490 (postmaster) 
score 644 or sacrifice child
Mar  7 11:44:52 telsaDB kernel: Killed process 47490, UID 26, (postmaster) 
total-vm:6813528kB, anon-rss:5212288kB, file-rss:2296kB

On Wed, Oct 18, 2017 at 02:54:54PM -0500, Justin Pryzby wrote:
> Hi,
> 
> I just ran into this again in another context (see original dicussion, quoted
> below).
> 
> Some time ago, while initially introducting non-default stats target, I set 
> our
> non-filtering columns to "STATISTICS 10", lower than default, to minimize the
> size of pg_statistic, which (at least at one point) I perceived to have become
> bloated and causing issue (partially due to having an excessive number of
> "daily" granularity partitions, a problem I've since mitigated).
> 
> The large number of columns with non-default stats target was (I think) 
> causing
> pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more
> disruptive than necessary, so now I'm going back and fixing it.
> 
> [pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS 
> -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
> 
> [pryzbyj@database ~]$ dmesg |tail -n2
> Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
> Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, 
> anon-rss:8977764kB, file-rss:8kB
> 
> So I'm hoping to encourage someone to commit the change contemplated earlier.



Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2018-05-01 Thread Amit Langote
On 2018/04/29 1:00, Justin Pryzby wrote:
> On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
>>> It's probably a bit late in the v10 cycle to be taking any risks in
>>> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
>>> as soon as the v11 cycle opens, unless someone can show an example
>>> of non-broken coding that requires it.  (And if so, there ought to
>>> be a regression test incorporating that.)

Fwiw, I too thought it should be possible by now to get rid of
RememberToFreeTupleDescAtEOX, but I was able to come up with at least one
case where its existence prevents a crash.  The test case is based on one
provided by Noah Misch a while ago when the code in question was
introduced as a measure for the problem he described:

https://www.postgresql.org/message-id/20130801005351.GA325106%40tornado.leadboat.com

Suppose in get_relation_constraints() we're processing a table that has 2
constraints, one which has ccvalid=true and another which has
ccvalid=false.  Further suppose a concurrent session performs a VALIDATE
CONSTRAINT on the second constraint, which requires just
ShareUpdateExclusiveLock and so it succeeds, sending out a sinval.  Back
in the first session, eval_const_expressions() performed on the first
constraint would end up in RelationClearRelation being called on the
table, wherein, since the 2nd constraint is now marked by valid by the
concurrent session, would cause rd_att to be swapped.  Thus the pointer
into rd_att that get_relation_constraints possesses would be a dangling
pointer, if not for the call to RememberToFreeTupleDescAtEOX() from
RelationDestroyRelation().

If however get_relation_constraints() had incremented the reference count
of rd_att to begin with, this problem wouldn't have arisen.  IOW, we'd
need to replace all the direct accesses to rd_att (including those via
RelationGetDescr, of course) by something that increments rd_att's
reference count, before concluding that RememberToFreeTupleDescAtEOX() is
unnecessary.  I'm afraid that's a lot of places.

Thanks,
Amit