On 07/23/2016 08:25 AM, Amit Kapila wrote:
> On Sat, Jul 23, 2016 at 3:32 AM, Chapman Flack <c...@anastigmatix.net> wrote:
>>
>> Would it then be possible to go back to the old behavior (or make
>> it selectable) of not overwriting the full 16 MB every time?
>>
>
> I don't see going back to old behaviour is an improvement, because as
> as you pointed out above that it helps to improve the compression
> ratio of WAL files for tools like gzip and it doesn't seem advisable
> to loose that capability. I think providing an option to select that
> behaviour could be one choice, but use case seems narrow to me
> considering there are tools (pglesslog) to clear the tail. Do you
> find any problems with that tool which makes you think that it is not
> reliable?

It was a year or so ago that I was surveying tools that attempted to
do that. I had found pg_clearxlogtail, and I'm sure I also found
pglesslog / pg_compresslog ... my notes from then simply refer to
"contrib efforts like pg_clearxlogtail" and observe either a dearth
of recent search results for them, or a predominance of results of
the form "how do I get this to compile for PG x.x?"

pg_compresslog is mentioned in a section, Compressed Archive Logs,
of the PG 9.1 manual:
https://www.postgresql.org/docs/9.1/static/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS

That section is absent from the docs for any version > 9.1.

The impression that leaves is of tools that relied too heavily
on internal format knowledge to be viable outside of core, which
have had at least periods of incompatibility with newer PG versions,
and whose current status, if indeed any are current, isn't easy
to find out.

It seems a bit risky (to me, anyway) to base a backup strategy
on having a tool in the pipeline that depends so heavily on
internal format knowledge, can become uncompilable between PG
releases, and isn't part of core and officially supported.

And that, I assume, was also the motivation to put the zeroing
in AdvanceXLInsertBuffer, eliminating the need for one narrow,
specialized tool like pg{_clear,_compress,less}log{,tail}, so
the job can be done with ubiquitous, bog-standard (and therefore
*very* exhaustively tested) tools like gzip.

So it's just kind of unfortunate that there used to be a *further*
factor of 100 (nothing to sneeze at) possible using rsync
(another non-PG-specific, ubiquitous, exhaustively tested tool)
but a trivial feature of the new behavior has broken that.

Factors of 100 are enough to change the sorts of things you think
about, like possibly retaining years-long unbroken histories of
transactions in WAL.

What would happen if the overwriting of the log tail were really
done with just zeros, as the git comment implied, rather than zeros
with initialized headers?
Could the log-reading code handle that gracefully? That would
support all forms of non-PG-specific, ubiquitous tools used for
compression; it would not break the rsync approach.

Even so, it still seems to me that a cheaper solution is a %e
substitution in archive_command: just *tell* the command where
the valid bytes end. That accomplishes the same thing as ~ 16 MB
of otherwise-unnecessary I/O at the time of archiving each
lightly-used segment.

Then the actual zeroing could be suppressed to save I/O, maybe
with a GUC variable, or maybe just when archive_command is seen
to contain a %e. Commands that don't have a %e continue to work
and compress effectively because of the zeroing.

-Chap
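
To make the %e idea a little more concrete, here is a rough Python
sketch of what an archive script might look like if such a substitution
existed. To be clear about what is assumed: %e is only the proposal
above and is not something PostgreSQL provides (%p and %f are the
existing substitutions); the script name archive_wal.py, the function
names, and the /mnt/archive directory are all made up for illustration;
and the 16 MB segment size is the default, not a given. The restore
side pads with plain zeros, which presumes the log-reading code copes
with an all-zero tail, and that is exactly the open question raised
above.

```python
import gzip
import os
import shutil
import sys

# Default WAL segment size; an assumption, since it is configurable at build time.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def archive_segment(src_path, dest_dir, file_name, end_offset):
    """Compress only the first end_offset bytes of a WAL segment.

    end_offset stands in for the value a hypothetical %e would supply.
    """
    if not 0 <= end_offset <= WAL_SEGMENT_SIZE:
        raise ValueError("end offset out of range: %d" % end_offset)
    dest_path = os.path.join(dest_dir, file_name + ".gz")
    if os.path.exists(dest_path):
        # Never overwrite an already archived segment.
        raise FileExistsError(dest_path)
    tmp_path = dest_path + ".tmp"
    with open(src_path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        remaining = end_offset
        while remaining > 0:
            chunk = src.read(min(remaining, 1 << 20))
            if not chunk:
                raise IOError("segment is shorter than the end offset")
            dst.write(chunk)
            remaining -= len(chunk)
    # Rename last, so a partially written archive never has the final name.
    os.rename(tmp_path, dest_path)
    return dest_path


def restore_segment(archive_path, dest_path):
    """Decompress an archived segment and zero-pad it to full size."""
    with gzip.open(archive_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        # Extending the file with truncate() fills the tail with zeros.
        dst.truncate(WAL_SEGMENT_SIZE)


def main(argv):
    # Hypothetical invocation: archive_wal.py %p %f %e /mnt/archive
    src_path, file_name, end_offset, dest_dir = argv
    archive_segment(src_path, dest_dir, file_name, int(end_offset))
    return 0


if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```

With something like this, the configuration would read
archive_command = 'archive_wal.py %p %f %e /mnt/archive' (again, %e
being hypothetical). The script never reads past the valid bytes, so
nothing depends on the tail having been zeroed first, and no knowledge
of the internal WAL format is needed.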