Hi, I took a look at this today to try to understand the purpose and how it works. Let me share some initial thoughts and questions I have. Some of this may be wrong or missing the point, so apologies for that.
The goal seems worthwhile in general - the way I understand it, the patch
aims to provide tracking of WAL "velocity", i.e. how much WAL was generated
over time, which we currently don't have, as we only maintain simple
cumulative stats/counters. It then uses this to estimate the timestamp for
a given LSN, and vice versa, because that's what the pruning patch needs.

When I first read this, I immediately started wondering if this might use
the commit timestamp stuff we already have. Because for each commit we
already store the LSN and commit timestamp, right? But I'm not sure that
would be a good match - commit_ts serves the very specific purpose of
mapping XID => (LSN, timestamp), and I don't see how to make it work for
(LSN => timestamp) and (timestamp => LSN) very easily.

As for the inner workings of the patch, my understanding is this:

- "LSNTimeline" consists of "LSNTime" entries representing (LSN, ts)
  points, but those points are really "buckets" that grow larger and
  larger for older periods of time.

- The entries are added from bgwriter, i.e. on each loop we add the
  current (LSN, timestamp) to the timeline.

- We then estimate the LSN/timestamp using the data stored in the
  LSNTimeline (either LSN => timestamp, or the opposite direction). I
  assume this is plain linear interpolation between two neighboring
  points - see the sketch in the PS below.

Some comments, in arbitrary order:

- AFAIK each entry represents an interval of time, and the next (older)
  interval is twice as long, right? So the first interval is 1 second,
  then 2 seconds, 4 seconds, 8 seconds, ...

- But I don't understand how the LSNTimeline entries "age" and get less
  accurate while the "current" bucket stays short. lsntime_insert()
  seems to simply move to the next entry, but doesn't that mean we
  insert the entries into larger and larger buckets? (A toy model of
  the merging scheme I would have expected is in the PS below.)

- The comments never really spell out what amount of time the entries
  cover / how granular it is. My understanding is it's simply measured
  in the number of entries added, which is assumed to be constant and
  driven by bgwriter_delay, right? Which is 200ms by default. That
  seems fine, but isn't the hibernation (HIBERNATE_FACTOR) going to
  mess with it? Is there some case where bgwriter would just loop
  without sleeping, filling the timeline much faster? (I can't think
  of any, but ...)

- The LSNTimeline comment claims an array of size 64 is large enough
  that we don't need to care about filling it, but maybe it should
  briefly explain why we can never fill it (I guess 2^64 entries is
  just too many).

- I don't quite understand why 0005 adds the functions to pageinspect.
  This has nothing to do with pages, right?

- Not sure why we need 0001. Just so that the "estimate" functions in
  0002 have a convenient "start" point? Surely we could look at the
  current LSNTimeline data and use the oldest value, or (if there's no
  data) use the current timestamp/LSN.

- I wonder what happens if we lose the data - we know that if people
  reset statistics for whatever reason (or just lose them because of a
  crash, or because they're on a replica), bad things happen to
  autovacuum. What's the (expected) impact on pruning?

- What about an SRF function that outputs the whole LSNTimeline? It
  would be useful for debugging / development, I think. (Just a
  suggestion - a rough sketch is in the PS.)

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
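
PS: To make the "aging" question above concrete, here's the kind of
binary-counter scheme I would have expected. This is entirely my own
sketch (made-up names and all), not code from the patch: slot i is
either empty or covers exactly 2^i insertions, so inserting behaves
like incrementing a binary counter, and 64 slots don't overflow until
2^64 - 1 insertions - which would also explain why we can never fill
the array in practice.

    #include <stdbool.h>
    #include <stdint.h>

    #define TIMELINE_SIZE 64

    /* toy stand-ins for XLogRecPtr / TimestampTz */
    typedef struct ToyBucket
    {
        bool        used;
        uint64_t    lsn;
        int64_t     time;
    } ToyBucket;

    static ToyBucket timeline[TIMELINE_SIZE];

    /*
     * Insert one (lsn, time) observation, like incrementing a binary
     * counter: the new point starts as a size-1 "carry", and whenever
     * slot i is occupied, its bucket (covering 2^i insertions) merges
     * with the carry (also 2^i) into a 2^(i+1) bucket that carries on
     * to slot i+1. The merge keeps the older (lsn, time) point as the
     * representative, so old data gets coarser while slot 0 always
     * holds (nearly) fresh data.
     */
    static void
    toy_insert(uint64_t lsn, int64_t time)
    {
        ToyBucket   carry = {true, lsn, time};

        for (int i = 0; i < TIMELINE_SIZE; i++)
        {
            if (!timeline[i].used)
            {
                timeline[i] = carry;
                return;
            }

            carry = timeline[i];    /* older point wins the merge */
            timeline[i].used = false;
        }

        /* unreachable until 2^64 - 1 insertions */
    }

If the patch does something along these lines, it'd be good to spell
that out in the comments - it also answers the granularity question,
because slot i then covers roughly 2^i * bgwriter_delay of time.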
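
Similarly, here's the interpolation I assume the estimate functions do
(again just a hypothetical sketch of my reading, not the patch's actual
code) - given the two timeline points bracketing the target LSN, just
interpolate linearly:

    /*
     * Estimate the timestamp for "lsn", given the two timeline points
     * that bracket it. If no WAL was generated between the two points,
     * fall back to the older timestamp.
     */
    static int64_t
    toy_estimate_time(uint64_t lsn,
                      uint64_t lsn_lo, int64_t time_lo,
                      uint64_t lsn_hi, int64_t time_hi)
    {
        double      frac;

        if (lsn_hi <= lsn_lo)
            return time_lo;

        frac = (double) (lsn - lsn_lo) / (double) (lsn_hi - lsn_lo);

        return time_lo + (int64_t) (frac * (double) (time_hi - time_lo));
    }

The timestamp => LSN direction would be the same formula with the roles
swapped. The wider the (older) bucket, the larger the possible error,
which is why I'm asking how exactly the buckets age.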
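
And for the SRF suggestion, I'm thinking of something as simple as the
usual InitMaterializedSRF() pattern. The LSNTimeStandin struct and the
get_lsntimeline() accessor below are made up, since I don't know what
the patch would actually expose:

    #include "postgres.h"

    #include "access/xlogdefs.h"
    #include "funcapi.h"
    #include "utils/pg_lsn.h"
    #include "utils/timestamp.h"

    /* made-up stand-in for the patch's LSNTime entries */
    typedef struct LSNTimeStandin
    {
        XLogRecPtr  lsn;
        TimestampTz time;
    } LSNTimeStandin;

    /* made-up accessor returning the timeline array and its length */
    extern LSNTimeStandin *get_lsntimeline(int *nentries);

    PG_FUNCTION_INFO_V1(pg_lsntimeline);

    Datum
    pg_lsntimeline(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        LSNTimeStandin *entries;
        int         nentries;

        /* let funcapi build the tuplestore and tuple descriptor */
        InitMaterializedSRF(fcinfo, 0);

        entries = get_lsntimeline(&nentries);

        for (int i = 0; i < nentries; i++)
        {
            Datum       values[2];
            bool        nulls[2] = {false, false};

            values[0] = LSNGetDatum(entries[i].lsn);
            values[1] = TimestampTzGetDatum(entries[i].time);

            tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                                 values, nulls);
        }

        return (Datum) 0;
    }

With an OUT (lsn pg_lsn, time timestamptz) signature in the SQL
declaration, that would be enough to eyeball the whole timeline during
development.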