I'm not sure what I've done, but something I did in the last few days broke my pool size graphs. I've looked at a lot of things, and I'm not sure what may or may not be related, so this email is a bit of a huge infodump. Probably most of this is unrelated, but I'm being verbose to avoid missing something because I just _think_ it's not related.
My graphs haven't had new data since what I assume is the 4th, and the graph file and the RRD both have a last-mod time of 01:00 UTC on the 6th (as I'm writing this it is 15:30 on the 7th). The graph is filled to the end of day 3, week 23. This looks like it's using strftime's %V to get the week, which would make that 23:59:59 UTC on the 4th. I then have nearly three empty days (graph updated, no data) after that, which should bring it up to sometime on the 7th. That looks like the present and would make sense if the only problem were that I just wasn't getting new data... but the mod time on the file that I think is being loaded is a day and a half ago. So that's confusing. I'm going to attach the current graph, but I've tried to make the description above really verbose in case the list strips the image.

# ls -l ~backuppc/log/poolUsage*
-rw-r----- 1 backuppc backuppc  6420 Jun 6 01:07 /var/lib/backuppc/log/poolUsage4.png
-rw-r----- 1 backuppc backuppc  8439 Jun 6 01:07 /var/lib/backuppc/log/poolUsage52.png
-rw-r--r-- 1 backuppc backuppc 31112 Jun 6 01:07 /var/lib/backuppc/log/poolUsage.rrd

I ran BackupPC_migrateV3toV4 on the 3rd (started late in the UTC day, ran overnight with backuppc stopped, restarted backuppc on the 4th) to clear out the last of our old v3 backups (it processed around 10 of them). Since then BackupPC_nightly has been taking _ages_ to run. On the 5th I bumped MaxBackupPCNightlyJobs up to 4 (for the 01:00 run on the 6th), thinking that giving it half our cores would help (the other 4 are for MaxBackups), but it just seemed to be heavily IO-bound, so I returned it to its original 2 during the day on the 6th (for today's 01:00 run). Today, the two processes have been running for 14.5 hours so far. I'm assuming the long run times over the last couple of days are a direct result of the migrateV3toV4 run, and presumably they should settle down again in 14 days (PoolSizeNightlyUpdatePeriod is 16).
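As a sanity check on the "end of day 3, week 23" reading above, strftime agrees that the last plotted point is the end of June 4 UTC (quick check with GNU date; %V is the ISO week number, %u the ISO weekday):

```shell
# ISO week number (%V) and ISO weekday (%u) for the last day that has data.
# GNU date assumed.
date -u -d 2025-06-04 +'week %V, day %u'
# -> week 23, day 3
```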
Checking the log files, I haven't found any references to the poolUsage files (so no errors for them either). I did find a lot of "BackupPC_refCountUpdate: missing pool file" log entries, plus errors about V4 pool files with incorrect digests, on the 5th and 7th. None at all of either on the 6th, and none before the 5th. Probably not related to the RRD/graph issue, but again mentioning it just in case. Is this "normal" while cleaning up after a migrateV3toV4 run? Or maybe related to incomplete backups? I did kill some running backups this week while working on things.

2025-06-05 18:32:02 admin3 : BackupPC_refCountUpdate: missing pool file c0cc2020cd7e247781c0fdd893a48505 count 1
2025-06-05 18:57:20 admin1 : BackupPC_refCountUpdate: ERROR pool file /var/lib/backuppc/cpool/44/ec/45ecf1513b6ad26f10825b16de74832d has digest d41d8cd98f00b204e9800998ecf8427e instead of 45ecf1513b6ad26f10825b16de74832d
2025-06-07 01:44:39 admin1 : BackupPC_refCountUpdate: missing pool file 8095ad44fdb78424b9998de06f6ffedc count 1
2025-06-07 10:56:18 admin1 : BackupPC_refCountUpdate: ERROR pool file /var/lib/backuppc/cpool/ae/ca/aeca0c1090c0c08e947465d7b4fd6ca7 has digest d41d8cd98f00b204e9800998ecf8427e instead of aeca0c1090c0c08e947465d7b4fd6ca7

Searching for known reasons for a failure to update RRD or graph files, I mostly found the obvious things: check perms, check paths, make sure the right dependencies are installed, etc. None of those have changed since the graphs were updating successfully. The only thing out of the ordinary was a note from the 4.0 alpha days about upgrades needing to convert the RRD file with rrd_2_v4.pl, but it looks like that no longer applies? I don't have that script, but I do have an old pool.rrd file from before the v4 migration, and poolUsage.rrd seems to have the extra DS. This doesn't seem relevant, since the graph was working fine until a couple of days ago, but I'm mentioning it just in case.
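One detail about those digest errors: the "wrong" digest is identical in both ERROR lines, and it happens to be the MD5 of zero-length input, which suggests those cpool files decompress to nothing (truncated?) rather than holding the wrong content. Easy to see:

```shell
# The digest reported "instead of" the expected one, in both ERROR lines
# above, is the well-known MD5 of empty input:
printf '' | md5sum
# -> d41d8cd98f00b204e9800998ecf8427e  -
#
# To confirm a specific pool file really decompresses to zero bytes, pipe it
# through BackupPC's own decompressor (the BackupPC_zcat path here is an
# assumption for a Debian-style install; adjust to yours):
#   sudo -u backuppc /usr/share/backuppc/bin/BackupPC_zcat \
#       /var/lib/backuppc/cpool/44/ec/45ecf1513b6ad26f10825b16de74832d | md5sum
```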
I do wonder whether the very long nightly runs are somehow related to the RRD data not being updated? Or did running the V3toV4 migration break something I haven't found? That doesn't quite line up, because I ran the migration on the 3rd/4th but seem to have graph data through the end of the day on the 4th... though there may be other side effects I'm unaware of that would account for that.

I have made very few actual config changes this week. Here's what I've touched:

- changed ServerHost from the single-label hostname to the FQDN (probably shouldn't matter, because ServerPort is -1)
- changed MaxBackupPCNightlyJobs from 2 to 4, then returned it to 2
- removed --one-file-system from RsyncArgs
- added an entry to the '*' list in BackupFilesExclude
- added some new hosts (the reason this whole saga started)

What have I not looked at that might explain the failure to update the RRD data for three days?
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/