Hi, I took a look at this today to try to understand the purpose and how it works. Let me share some initial thoughts and questions I have. Some of this may be wrong or missing the point, so apologies for that.
The goal seems worthwhile in general - the way I understand it, the patch
aims to provide tracking of WAL "velocity", i.e. how much WAL was generated
over time, which we currently don't have, as we only maintain simple
cumulative stats/counters. It then uses this to estimate the timestamp for
a given LSN, and vice versa, because that's what the pruning patch needs.

When I first read this, I immediately started wondering if this might use
the commit timestamp stuff we already have. Because for each commit we
already store the LSN and commit timestamp, right? But I'm not sure that
would be a good match - commit_ts serves the very specific purpose of
mapping XID => (LSN, timestamp), and I don't see how to make it work for
(LSN => timestamp) and (timestamp => LSN) very easily.

As for the inner workings of the patch, my understanding is this:

- "LSNTimeline" consists of "LSNTime" entries representing (LSN, ts)
  points, but those points are really "buckets" that grow larger and
  larger for older periods of time.

- The entries are added from bgwriter, i.e. on each loop we add the
  current (LSN, timestamp) to the timeline.

- We then estimate the LSN/timestamp using the data stored in the
  LSNTimeline (either LSN => timestamp, or the opposite direction). I
  assume this is plain linear interpolation between two neighboring
  points - see the sketch in the PS below.

Some comments, in arbitrary order:

- AFAIK each entry represents an interval of time, and the next (older)
  interval is twice as long, right? So the first interval is 1 second,
  then 2 seconds, 4 seconds, 8 seconds, ...

- But I don't understand how the LSNTimeline entries "age" and get less
  accurate while the "current" bucket stays short. lsntime_insert()
  seems to simply move to the next entry, but doesn't that mean we
  insert the entries into larger and larger buckets? (A toy model of
  the merging scheme I would have expected is in the PS below.)

- The comments never really spell out what amount of time the entries
  cover / how granular it is. My understanding is it's simply measured
  in the number of entries added, which is assumed to be constant and
  driven by bgwriter_delay, right? Which is 200ms by default. That
  seems fine, but isn't the hibernation (HIBERNATE_FACTOR) going to
  mess with it? Is there some case where bgwriter would just loop
  without sleeping, filling the timeline much faster? (I can't think
  of any, but ...)

- The LSNTimeline comment claims an array of size 64 is large enough
  that we don't need to care about filling it, but maybe it should
  briefly explain why we can never fill it (I guess 2^64 entries is
  just too many).

- I don't quite understand why 0005 adds the functions to pageinspect.
  This has nothing to do with pages, right?

- Not sure why we need 0001. Just so that the "estimate" functions in
  0002 have a convenient "start" point? Surely we could look at the
  current LSNTimeline data and use the oldest value, or (if there's no
  data) use the current timestamp/LSN.

- I wonder what happens if we lose the data - we know that if people
  reset statistics for whatever reason (or just lose them because of a
  crash, or because they're on a replica), bad things happen to
  autovacuum. What's the (expected) impact on pruning?

- What about an SRF function that outputs the whole LSNTimeline? It
  would be useful for debugging / development, I think. (Just a
  suggestion - a rough sketch is in the PS.)

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
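
PS: To make the "aging" question above concrete, here's the kind of
binary-counter scheme I would have expected. This is entirely my own
sketch (made-up names and all), not code from the patch: slot i is
either empty or covers exactly 2^i insertions, so inserting behaves
like incrementing a binary counter, and 64 slots don't overflow until
2^64 - 1 insertions - which would also explain why we can never fill
the array in practice.

    #include <stdbool.h>
    #include <stdint.h>

    #define TIMELINE_SIZE 64

    /* toy stand-ins for XLogRecPtr / TimestampTz */
    typedef struct ToyBucket
    {
        bool        used;
        uint64_t    lsn;
        int64_t     time;
    } ToyBucket;

    static ToyBucket timeline[TIMELINE_SIZE];

    /*
     * Insert one (lsn, time) observation, like incrementing a binary
     * counter: the new point starts as a size-1 "carry", and whenever
     * slot i is occupied, its bucket (covering 2^i insertions) merges
     * with the carry (also 2^i) into a 2^(i+1) bucket that carries on
     * to slot i+1. The merge keeps the older (lsn, time) point as the
     * representative, so old data gets coarser while slot 0 always
     * holds (nearly) fresh data.
     */
    static void
    toy_insert(uint64_t lsn, int64_t time)
    {
        ToyBucket   carry = {true, lsn, time};

        for (int i = 0; i < TIMELINE_SIZE; i++)
        {
            if (!timeline[i].used)
            {
                timeline[i] = carry;
                return;
            }

            carry = timeline[i];    /* older point wins the merge */
            timeline[i].used = false;
        }

        /* unreachable until 2^64 - 1 insertions */
    }

If the patch does something along these lines, it'd be good to spell
that out in the comments - it also answers the granularity question,
because slot i then covers roughly 2^i * bgwriter_delay of time.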
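
Similarly, here's the interpolation I assume the estimate functions do
(again just a hypothetical sketch of my reading, not the patch's actual
code) - given the two timeline points bracketing the target LSN, just
interpolate linearly:

    /*
     * Estimate the timestamp for "lsn", given the two timeline points
     * that bracket it. If no WAL was generated between the two points,
     * fall back to the older timestamp.
     */
    static int64_t
    toy_estimate_time(uint64_t lsn,
                      uint64_t lsn_lo, int64_t time_lo,
                      uint64_t lsn_hi, int64_t time_hi)
    {
        double      frac;

        if (lsn_hi <= lsn_lo)
            return time_lo;

        frac = (double) (lsn - lsn_lo) / (double) (lsn_hi - lsn_lo);

        return time_lo + (int64_t) (frac * (double) (time_hi - time_lo));
    }

The timestamp => LSN direction would be the same formula with the roles
swapped. The wider the (older) bucket, the larger the possible error,
which is why I'm asking how exactly the buckets age.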
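
And for the SRF suggestion, I'm thinking of something as simple as the
usual InitMaterializedSRF() pattern. The LSNTimeStandin struct and the
get_lsntimeline() accessor below are made up, since I don't know what
the patch would actually expose:

    #include "postgres.h"

    #include "access/xlogdefs.h"
    #include "funcapi.h"
    #include "utils/pg_lsn.h"
    #include "utils/timestamp.h"

    /* made-up stand-in for the patch's LSNTime entries */
    typedef struct LSNTimeStandin
    {
        XLogRecPtr  lsn;
        TimestampTz time;
    } LSNTimeStandin;

    /* made-up accessor returning the timeline array and its length */
    extern LSNTimeStandin *get_lsntimeline(int *nentries);

    PG_FUNCTION_INFO_V1(pg_lsntimeline);

    Datum
    pg_lsntimeline(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        LSNTimeStandin *entries;
        int         nentries;

        /* let funcapi build the tuplestore and tuple descriptor */
        InitMaterializedSRF(fcinfo, 0);

        entries = get_lsntimeline(&nentries);

        for (int i = 0; i < nentries; i++)
        {
            Datum       values[2];
            bool        nulls[2] = {false, false};

            values[0] = LSNGetDatum(entries[i].lsn);
            values[1] = TimestampTzGetDatum(entries[i].time);

            tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                                 values, nulls);
        }

        return (Datum) 0;
    }

With an OUT (lsn pg_lsn, time timestamptz) signature in the SQL
declaration, that would be enough to eyeball the whole timeline during
development.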