On 2014-01-21 19:23:57 -0500, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > On 2014-01-21 18:59:13 -0500, Tom Lane wrote:
> >> Another thing to think about is whether we couldn't put a hard limit on
> >> WAL record size somehow.  Multi-megabyte WAL records are an abuse of the
> >> design anyway, when you get right down to it.  So for example maybe we
> >> could split up commit records, with most of the bulky information dumped
> >> into separate records that appear before the "real commit".  This would
> >> complicate replay --- in particular, if we abort the transaction after
> >> writing a few such records, how does the replayer realize that it can
> >> forget about those records?  But that sounds probably surmountable.
>
> > I think removing the list of subtransactions from commit records would
> > essentially require not truncating pg_subtrans after a restart anymore.
>
> I'm not suggesting that we stop providing that information!  I'm just
> saying that we perhaps don't need to store it all in one WAL record,
> if instead we put the onus on WAL replay to be able to reconstruct what
> it needs from a series of WAL records.
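For illustration, the replay scheme Tom describes could look roughly like the
toy model below. This is not PostgreSQL code; the record kinds and payloads are
made up, and it only shows the bookkeeping: subsidiary records are buffered
per-xid until the atomic commit record arrives, and are simply forgotten on
abort or at end of recovery.

```python
def replay(records):
    """Toy replayer: records are (kind, xid, payload) tuples.

    "SUBSIDIARY" records (e.g. chunks of a subxact list) are buffered
    per transaction; a "COMMIT" makes the buffered chunks durable
    together with the commit payload, an "ABORT" (or never seeing a
    commit at all) just drops them.
    """
    pending = {}   # xid -> list of buffered subsidiary payloads
    applied = []
    for kind, xid, payload in records:
        if kind == "SUBSIDIARY":
            pending.setdefault(xid, []).append(payload)
        elif kind == "COMMIT":
            applied.append((xid, pending.pop(xid, []) + [payload]))
        elif kind == "ABORT":
            pending.pop(xid, None)   # replayer can forget these records
    # anything left in `pending` belongs to transactions that never
    # committed before the end of WAL and can be discarded
    return applied

wal = [
    ("SUBSIDIARY", 7, "subxacts 8-9"),
    ("SUBSIDIARY", 7, "subxact 10"),
    ("COMMIT",     7, "commit of 7"),
    ("SUBSIDIARY", 8, "subxact 11"),
    ("ABORT",      8, "abort of 8"),
]
```

The memory cost of `pending` is exactly the reassembly overhead the reply
below objects to.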
That'd likely require something similar to the incomplete actions used in
btrees (and until recently in more places). I think that is/was a disaster
I really don't want to extend.

> > We could relatively easily split off logging the dropped files from
> > commit records and log them in groups afterwards; we already have
> > several races that allow us to leak files.
>
> I was thinking the other way around: emit the subsidiary records before
> the atomic commit or abort record, indeed before we've actually committed.
> Part of the point is to reduce the risk that lack of WAL space would
> prevent us from fully committing.
>
> Replay would then involve either accumulating the subsidiary records in
> memory, or being willing to go back and re-read them when the real commit
> or abort record is seen.

Well, the reason I suggested doing it the other way round is that we
wouldn't need to reassemble anything (outside of cache invalidations,
which I don't know how to handle that way), which I think is a significant
increase in robustness and decrease in complexity.

> Also, writing those records afterwards increases the risk of a
> post-commit failure, which is a bad thing.

Well, most of those writes could be done outside of a critical section,
possibly just FATALing out on failure. That beats PANICing.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers