On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
>
> On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
> >
> > On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
> > >
> > > On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <mag...@hagander.net>
> > > wrote:
> > > >
> > > > On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju...@gmail.com>
> > > > wrote:
> > > >>
> > > >> Sorry, I have new comments again after a bit more thinking.
> > > >> I'm wondering if we can do something about shared objects while we're
> > > >> at it. They don't belong to any database, so it's a little bit
> > > >> orthogonal to this proposal, but it seems quite important to track
> > > >> errors on those too!
> > > >>
> > > >> What about adding a new field in PgStat_GlobalStats for that? We can
> > > >> use the same lastDir to easily detect such objects and slightly adapt
> > > >> sendFile again, which seems quite straightforward.
> > > >
> > > > Question is then what number that should show -- only the checksum
> > > > counter in non-database-fields, or the total number across the cluster?
> > >
> > > I'd say only for non-database-fields errors, especially if we can
> > > reset each counter separately. If necessary, we can add a new view
> > > to give a global overview of checksum errors for DBA convenience.
> >
> > I'm considering adding a new PgStat_ChecksumStats for that purpose
> > instead, but I don't know whether that's acceptable in the last
> > commitfest. It seems worthwhile to add it eventually, since we'll
> > probably end up having more checksum-related things to report to
> > users. Online enabling of checksums could be the most immediate
> > potential target.
>
> I wasn't aware that we were already storing information about shared
> objects in PgStat_StatDBEntry, with an InvalidOid as databaseid
> (though we don't have any system view that actually shows information
> for such objects).
>
> As a result, I ended up simply adding counters for the total number of
> checks and the timestamp of the last failure in PgStat_StatDBEntry,
> which keeps the attached patch very lightweight. I moved all the
> checksum-related counters out of pg_stat_database into a new
> pg_stat_checksum view. This avoids making pg_stat_database too wide,
> and also allows displaying information about shared objects in the new
> view (some of the other counters don't really make sense for shared
> objects or could break existing monitoring queries). While at it, I
> tried to add a little bit of documentation about checksum monitoring.
And of course I forgot to attach the patch.
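For readers skimming the thread, the storage layout described in the quoted message (one set of checksum counters per database entry, with shared objects filed under InvalidOid rather than in a separate global struct) can be pictured with the minimal C sketch below. It is only an illustration under stated assumptions: the struct, the fixed-size table, and the names ChecksumCounters, get_entry and record_checksum_result are hypothetical and do not come from the attached patch or from PostgreSQL's pgstat code; Oid and InvalidOid are local stand-ins for the PostgreSQL definitions.

```c
#include <stdint.h>
#include <time.h>

/* Local stand-ins for PostgreSQL's Oid type and InvalidOid (assumption). */
typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)
#define MAX_ENTRIES 8

/* Hypothetical per-database checksum counters; NOT the patch's actual struct. */
typedef struct ChecksumCounters
{
    Oid     databaseid;     /* InvalidOid for shared (cluster-wide) objects */
    long    checks;         /* total number of blocks verified */
    long    failures;       /* number of checksum failures */
    time_t  last_failure;   /* 0 until a failure is recorded */
} ChecksumCounters;

static ChecksumCounters entries[MAX_ENTRIES];
static int n_entries = 0;

/* Find the entry for a database, creating it on first use; NULL if the table is full. */
static ChecksumCounters *
get_entry(Oid dbid)
{
    for (int i = 0; i < n_entries; i++)
        if (entries[i].databaseid == dbid)
            return &entries[i];
    if (n_entries == MAX_ENTRIES)
        return NULL;
    /* Remaining fields are zero-initialized by the compound literal. */
    entries[n_entries] = (ChecksumCounters) {.databaseid = dbid};
    return &entries[n_entries++];
}

/* Record the result of verifying one block belonging to database dbid. */
static void
record_checksum_result(Oid dbid, int ok, time_t now)
{
    ChecksumCounters *e = get_entry(dbid);

    if (e == NULL)
        return;
    e->checks++;
    if (!ok)
    {
        e->failures++;
        e->last_failure = now;
    }
}
```

Because shared objects get their own entry keyed by InvalidOid, a monitoring view can report them on a separate row instead of folding them into a cluster-wide total, and each entry's counters could be reset independently, which is the behavior argued for earlier in the thread.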
pg_stat_checksum-v1.diff