On 12/12/2011 08:45 AM, Robert Haas wrote:
> But I'm skeptical that anything that we only update once per
> checkpoint cycle will help much in calculating an accurate lag value.
I'm sure there is no upper bound on how much WAL lag you can build up
between commit/abort records either; they can be far less frequent than
checkpoints. All it takes is a multi-hour COPY with no other commits to
completely hose lag measured that way, and that is not an unusual
situation at all. Overnight daily ETL or reporting MV-ish roll-ups,
scheduled specifically for when no one is normally at the office, are
the first things that spring to mind.
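
For reference, the sort of time-based check people run against a 9.1
standby today looks like the rough sketch below;
pg_last_xact_replay_timestamp() already exists, while
pg_last_xact_insert_timestamp() is the master-side function proposed in
this thread. During that multi-hour COPY both timestamps stall, so the
now()-based number climbs toward hours even when the standby is fully
caught up, while the insert-minus-replay difference sits near zero even
when it's far behind on WAL:

  -- On the standby (exists in 9.1): time since the last commit/abort
  -- record was replayed.  Stalls whenever the master isn't committing.
  SELECT now() - pg_last_xact_replay_timestamp() AS apparent_lag;

  -- With the proposed master-side function, the comparison would instead be:
  --   master:  SELECT pg_last_xact_insert_timestamp();
  --   standby: SELECT pg_last_xact_replay_timestamp();
  -- subtracting the two externally; both values freeze during a long COPY.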
Anyway, I wasn't suggesting checkpoints as anything other than a
worst-case behavior. We can always thump out more frequent updates to
reduce lag, and in what I expect to be the most common case the WAL
send/receive
stuff will usually do much better. I see the XID vs. WAL position UI
issues as being fundamentally unsolvable, which really bothers me; I'd
have run screaming away from this thread if it hadn't bothered me this
much.
> It also strikes me that anything that is based on augmenting the
> walsender/walreceiver protocol leaves anyone who is using WAL shipping
> out in the cold. I'm not clear from the comments you or Simon have
> made how important you think that use case still is.
There are a number of reasons why we might want more timestamps streamed
into the WAL; this might be one. We'd just need one to pop out as part
of the segment switch forced by archive_timeout to make it possible, in
theory, for these people to be happy. I think Simon was hoping to avoid
WAL timestamps; I wouldn't bet too much on that myself. The obvious
implementation problem here is that the logical place to put the
timestamp is right at the end of the WAL file, just before it's closed
for archiving. But that position isn't known until you've at least
started processing the file, which you clearly are not doing fast enough
if lag exists.
As far as who's still important here, two observations. Note that the
pg_last_xact_insert_timestamp approach can fail to satisfy WAL shipping
people who are shipping to a separate network, where it's impractical to
connect to both servers with libpq. I have some customers who like
putting a one-way WAL wall (sorry) between production and the standby
server, with log shipping being the only route between them; that's one
reason why they might still be doing this instead of using streaming.
There's really no good way to make these people happy and provide time
lag monitoring inside the database.
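
To spell that out: the proposed UI needs one query at each end, roughly
as below, with the subtraction done by whatever external monitoring tool
can reach both servers over libpq, and that second connection is exactly
what a one-way WAL wall forbids.

  -- On the master (the function proposed in this thread, not yet committed):
  SELECT pg_last_xact_insert_timestamp();

  -- On the standby (exists in 9.1):
  SELECT pg_last_xact_replay_timestamp();

  -- Time lag = first value minus second, computed outside the database
  -- by something with libpq access to both ends.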
I was actually the last person I recall who suggested some extra
monitoring mainly aimed at WAL shipping environments:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg01522.php
I had some pg_standby changes I was also working on back then, almost
two years ago. I never circled back to any of it due to having zero
demand: since 9.0 shipped, the requests I had been regularly getting
about this have all dried up. While I'm all for keeping new features
working for everyone when it doesn't hold progress back, it's not
unreasonable to recognize we can't support every monitoring option
through all of the weird ways WAL files can move around.
pg_stat_replication isn't very helpful for 9.0+ WAL shippers either, yet
they still go on doing their thing.
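
For comparison, this is roughly what streaming users lean on and what a
pure WAL shipper gets nothing from, since with no walsender connected
the view simply has no rows (column names as of 9.1, and they're WAL
positions rather than times):

  -- On the master: per-standby progress, streaming connections only.
  SELECT application_name, state,
         sent_location, write_location, flush_location, replay_location
    FROM pg_stat_replication;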
In the other direction, for people who will immediately adopt the latest
hotness, cascading replication is a whole new layer of use case concerns
on top of the ones considered so far. Now you're talking two layers of
connections users have to navigate through to compute master->cascaded
standby lag. Cascade the WALSender timestamps instead, which seems
pretty simple to do, and then people can just ask their local standby.
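
A rough sketch of the difference that makes, with the usual caveat that
the local-only form assumes reasonably synchronized clocks; the function
name in the second part is purely hypothetical, just to show the shape
of the UI:

  -- Without cascaded timestamps: hop across both layers and subtract
  -- externally.
  --   on the master:           SELECT pg_last_xact_insert_timestamp();  -- proposed
  --   on the cascaded standby: SELECT pg_last_xact_replay_timestamp();

  -- With WALSender timestamps relayed downstream, ask the local standby only:
  --   SELECT now() - pg_last_master_timestamp() AS lag_behind_master;  -- hypothetical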
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us