On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

>
> On 1/11/19 7:40 PM, Robert Haas wrote:
> > On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <mag...@hagander.net>
> wrote:
> >> Would it make sense to add a column to pg_stat_database showing
> >> the total number of checksum errors that have occurred in a database?
> >>
> >> It's really a ">0 means it's bad" number, but it's a lot easier to monitor
> >> that in the statistics views, and given how much many people
> >> set their systems up to log, it's far too easy to miss individual
> >> checksum mismatches in the logs.
> >>
> >> If we track it at the database level, I don't think the overhead
> >> of adding one more counter would be very high either.
> >
> > It's probably not the ideal way to track it.  If you have a terabyte or
> > fifty of data, and you see that you have some checksum failures, good
> > luck finding the offending blocks.
> >
>
> Isn't that somewhat similar to deadlocks, which we also track in
> pg_stat_database? The number of deadlocks is rather useless on its own,
> you need to dive into the server log to find the details. Same for
> checksum errors.
>

It is a bit similar, yeah. Though a nonzero checksum counter says "you need
to look at fixing this right away" much more strongly than a deadlock
count does. But yes, the fact that we already track deadlocks there is a
good example. (Of course, I believe I added that one at some point as
well, so I'm clearly biased there.)


> > But I'm tentatively in favor of your proposal anyway, because it's
> > pretty simple and cheap and might help people, and doing something
> > noticeably better is probably annoyingly complicated.
> >
>
> +1
>

Yeah, that's the idea behind it -- it's cheap, and it's an
early-warning indicator.
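To make the "early-warning indicator" idea concrete, here is a minimal Python sketch of how a monitoring script might consume such a per-database counter. It assumes the counts have already been fetched from the server as one integer per database (for example from the proposed pg_stat_database column); the function name checksum_report and the snapshot-comparison approach are hypothetical illustrations, not part of PostgreSQL or any existing tool.

```python
from typing import Dict, List, Tuple


def checksum_report(previous: Dict[str, int],
                    current: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Hypothetical check over per-database checksum-failure counters.

    Both arguments map database name -> counter value, taken from two
    successive polls. Returns (any_failures, new_failures):
      any_failures: counter is above zero, so the database needs attention
                    ("look at fixing this right away").
      new_failures: counter grew since the previous poll; a database not
                    seen before is treated as having had a count of 0.
    """
    any_failures = sorted(db for db, n in current.items() if n > 0)
    new_failures = sorted(db for db, n in current.items()
                          if n > previous.get(db, 0))
    return any_failures, new_failures


if __name__ == "__main__":
    before = {"app": 0, "reporting": 2}
    after = {"app": 1, "reporting": 2, "scratch": 0}
    print(checksum_report(before, after))
```

Splitting "any failures" from "new failures" mirrors the point in the thread: the counter alone does not locate the bad blocks, so a nonzero value is the cue to go read the server log for details, while a growing value shows the problem is still occurring.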

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
