On Thu, Apr 18, 2019 at 03:43:30PM -0400, Robert Haas wrote:
> You can make it kinda make sense by saying "the blocks modified by
> records *beginning in* segment XYZ" or alternatively "the blocks
> modified by records *ending in* segment XYZ", but that seems confusing
> to me. For example, suppose you decide on the first one --
> 000000010000000100000068.modblock will contain all blocks modified by
> records that begin in 000000010000000100000068. Well, that means that
> to generate the 000000010000000100000068.modblock, you will need
> access to 000000010000000100000068 AND probably also
> 000000010000000100000069 and in rare cases perhaps
> 00000001000000010000006A or even later files. I think that's actually
> pretty confusing.
>
> It seems better to me to give the files names like
> ${TLI}.${STARTLSN}.${ENDLSN}.modblock, e.g.
> 00000001.0000000168000058.00000001687DBBB8.modblock, so that you can
> see exactly which *records* are covered by that segment.
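To make the two naming schemes in the quote concrete, here is a small sketch. It assumes the default 16MB WAL segment size and shows three things: how the standard WAL segment file name follows from a timeline ID and an LSN, how a name in the proposed ${TLI}.${STARTLSN}.${ENDLSN}.modblock form would be built, and why a record that begins in one segment can require reading the next one. The .modblock format is only a proposal in this thread, and the helper names (parse_lsn, wal_file_name, modblock_file_name, segments_spanned) are hypothetical.

```python
# Hypothetical helpers illustrating the naming schemes discussed above.
# Assumption: default 16MB WAL segments (wal_segment_size = 16MB).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4GB "xlogid"; 256 for 16MB segments.
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text: str) -> int:
    """Parse an LSN written in PostgreSQL's 'hi/lo' hex form, e.g. '1/68000058'."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the WAL segment file containing lsn: TLI, xlogid, segment, 8 hex digits each."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID)


def modblock_file_name(tli: int, start_lsn: int, end_lsn: int) -> str:
    """Proposed (hypothetical) name: TLI plus the exact record range, LSNs as 16 hex digits."""
    return "%08X.%016X.%016X.modblock" % (tli, start_lsn, end_lsn)


def segments_spanned(tli: int, start_lsn: int, end_lsn: int) -> list:
    """WAL files that must be read to see a record occupying [start_lsn, end_lsn)."""
    first = start_lsn // WAL_SEGMENT_SIZE
    last = (end_lsn - 1) // WAL_SEGMENT_SIZE
    return [wal_file_name(tli, segno * WAL_SEGMENT_SIZE) for segno in range(first, last + 1)]


if __name__ == "__main__":
    start = parse_lsn("1/68000058")
    end = parse_lsn("1/687DBBB8")
    print(wal_file_name(1, start))              # 000000010000000100000068
    print(modblock_file_name(1, start, end))    # 00000001.0000000168000058.00000001687DBBB8.modblock
    # A record that begins near the end of segment 68 also needs segment 69.
    print(segments_spanned(1, parse_lsn("1/68FFFFF0"), parse_lsn("1/69000040")))
```

With the LSN-range name, the covered records are explicit in the file name, whereas a per-segment name leaves it ambiguous whether a record crossing into 000000010000000100000069 belongs to the 68 or the 69 file.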
How would you choose the STARTLSN/ENDLSN? If you could do it per
checkpoint, rather than per WAL file, I think that would be great.

> And I suspect it may also be a good idea to bunch up the records from
> several WAL files. Especially if you are using 16MB WAL files,
> collecting all of the block references from a single WAL file is going
> to produce a very small file. I suspect that the modified block files
> will end up being 100x smaller than the WAL itself, perhaps more, and
> I don't think anybody will appreciate us adding another PostgreSQL
> systems that spews out huge numbers of tiny little files. If, for
> example, somebody's got a big cluster that is churning out a WAL
> segment every second, they would probably still be happy to have a new
> modified block file only, say, every 10 seconds.

Agreed.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +