I'm not sure what I've done, but something I did in the last few days broke my pool size graphs. I've looked at a lot of things, and I'm not sure what may or may not be related, so this email is a bit of a huge infodump. Probably most of this is unrelated, but I'm being verbose to avoid missing something because I just _think_ it's not related.
My graphs haven't had new data since what I assume is the 4th, and the graph file and the RRD both have a last-mod time of 01:00 UTC on the 6th (as I'm writing this it is 15:30 on the 7th). The graph is filled to the end of day 3, week 23. This looks like it's using strftime's %V to get the week, which would make that 23:59:59 UTC on the 4th. I then have nearly three empty days (graph updated, no data) after that, which should bring it up to sometime on the 7th. That looks like the present and would make sense if the only problem were that I just wasn't getting new data... but the mod time on the file that I think is being loaded is a day and a half ago. So that's confusing. I'm going to attach the current graph, but I've tried to make the description above really verbose in case the list strips the image.

# ls -l ~backuppc/log/poolUsage*
-rw-r----- 1 backuppc backuppc  6420 Jun 6 01:07 /var/lib/backuppc/log/poolUsage4.png
-rw-r----- 1 backuppc backuppc  8439 Jun 6 01:07 /var/lib/backuppc/log/poolUsage52.png
-rw-r--r-- 1 backuppc backuppc 31112 Jun 6 01:07 /var/lib/backuppc/log/poolUsage.rrd

I ran BackupPC_migrateV3toV4 on the 3rd (started late in the UTC day, ran overnight with backuppc stopped, restarted backuppc on the 4th) to clear out the last of our old v3 backups (it processed around 10 of them). Since then BackupPC_nightly has been taking _ages_ to run. On the 5th I bumped MaxBackupPCNightlyJobs up to 4 (for the 01:00 run on the 6th), thinking that giving it half our cores would help (the other 4 are for MaxBackups), but it just seemed to be heavily IO-bound, so I returned it to its original 2 during the day on the 6th (for today's 01:00 run). Today, the two processes have been running for 14.5 hours so far. I'm assuming the long run times over the last couple of days are a direct result of the migrateV3toV4 run, and presumably they should settle down again in 14 days (PoolSizeNightlyUpdatePeriod is 16).
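As a sanity check on the "end of day 3, week 23" reading above, strftime agrees that the last plotted point is the end of June 4 UTC (quick check with GNU date; %V is the ISO week number, %u the ISO weekday):

```shell
# ISO week number (%V) and ISO weekday (%u) for the last day that has data.
# GNU date assumed.
date -u -d 2025-06-04 +'week %V, day %u'
# -> week 23, day 3
```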
Checking the log files, I haven't found any references to the poolUsage files (so no errors for them either). I did find a lot of "BackupPC_refCountUpdate: missing pool file" log entries, plus errors about V4 pool files with incorrect digests, on the 5th and 7th. None at all of either on the 6th, and none before the 5th. Probably not related to the RRD/graph issue, but again mentioning it just in case. Is this "normal" while cleaning up after a migrateV3toV4 run? Or maybe related to incomplete backups? I did kill some running backups this week while working on things.

2025-06-05 18:32:02 admin3 : BackupPC_refCountUpdate: missing pool file c0cc2020cd7e247781c0fdd893a48505 count 1
2025-06-05 18:57:20 admin1 : BackupPC_refCountUpdate: ERROR pool file /var/lib/backuppc/cpool/44/ec/45ecf1513b6ad26f10825b16de74832d has digest d41d8cd98f00b204e9800998ecf8427e instead of 45ecf1513b6ad26f10825b16de74832d
2025-06-07 01:44:39 admin1 : BackupPC_refCountUpdate: missing pool file 8095ad44fdb78424b9998de06f6ffedc count 1
2025-06-07 10:56:18 admin1 : BackupPC_refCountUpdate: ERROR pool file /var/lib/backuppc/cpool/ae/ca/aeca0c1090c0c08e947465d7b4fd6ca7 has digest d41d8cd98f00b204e9800998ecf8427e instead of aeca0c1090c0c08e947465d7b4fd6ca7

Searching for known reasons for a failure to update RRD or graph files, I mostly found the obvious things: check perms, check paths, make sure the right dependencies are installed, etc. None of those have changed since the graphs were updating successfully. The only thing out of the ordinary was a note from the 4.0 alpha days about upgrades needing to convert the RRD file with rrd_2_v4.pl, but it looks like that no longer applies? I don't have that script, but I do have an old pool.rrd file from before the v4 migration, and poolUsage.rrd seems to have the extra DS. This doesn't seem relevant, since the graph was working fine until a couple of days ago, but I'm mentioning it just in case.
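One detail about those digest errors: the "wrong" digest is identical in both ERROR lines, and it happens to be the MD5 of zero-length input, which suggests those cpool files decompress to nothing (truncated?) rather than holding the wrong content. Easy to see:

```shell
# The digest reported "instead of" the expected one, in both ERROR lines
# above, is the well-known MD5 of empty input:
printf '' | md5sum
# -> d41d8cd98f00b204e9800998ecf8427e  -
#
# To confirm a specific pool file really decompresses to zero bytes, pipe it
# through BackupPC's own decompressor (the BackupPC_zcat path here is an
# assumption for a Debian-style install; adjust to yours):
#   sudo -u backuppc /usr/share/backuppc/bin/BackupPC_zcat \
#       /var/lib/backuppc/cpool/44/ec/45ecf1513b6ad26f10825b16de74832d | md5sum
```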
I do wonder whether the very long nightly runs are somehow related to the RRD data not being updated? Or did running the V3toV4 migration break something I haven't found? That doesn't quite line up, because I ran the migration on the 3rd/4th but seem to have graph data through the end of the day on the 4th... though there may be other side effects I'm unaware of that would account for that.

I have made very few actual config changes this week. Here's what I've touched:

- changed ServerHost from the single-label hostname to the FQDN (probably shouldn't matter, because ServerPort is -1)
- changed MaxBackupPCNightlyJobs from 2 to 4, then returned it to 2
- removed --one-file-system from RsyncArgs
- added an entry to the '*' list in BackupFilesExclude
- added some new hosts (the reason this whole saga started)

What have I not looked at that might explain the failure to update the RRD data for three days?
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/