Re: [BackupPC-users] server maintenance: reconstruct missing poolCnt; find/delete references to missing pool files

2017-03-07 Thread Craig Barratt
It looks like BackupPC_refCountUpdate removes the noPoolCntOk file, even if
it doesn't write any poolCnt files.  So I will fix that (keep the existing
file, or create a new one, if there are no pool files after an fsck).

Craig

On Tue, Mar 7, 2017 at 9:58 AM, Alexander Kobel  wrote:

> Hi Craig,
>
> and thanks a lot for your swift reply.  Comforting to see that both
> messages are known and harmless.
>
> On 2017-03-07 18:24, Craig Barratt wrote:
> >>> BackupPC_refCountUpdate: doing fsck on  #1188 since there are no poolCnt files
> >>> BackupPC_refCountUpdate: doing fsck on  #1190 since there are no poolCnt files
> >>> ...
> >>> BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
> >>
> >> The backups in question seem to be fully intact; [...]
> >>
> > This is perfectly ok.  BackupPC 4.0.0alpha3 and prior 4.x versions
> > didn't store reference counts per backup.  [...]  So
> > BackupPC_refCountUpdate is simply adding reference counts to backups
> > done by BackupPC 4.0.0alpha3.  It's a one-time thing.
>
> Got that, and...
>
> > There might be an issue that an incremental done by BackupPC
> > 4.0.0alpha3 with no changes will have an empty backup tree, and
> > BackupPC_refCountUpdate will continually report that there are no
> > poolCnt files for that backup.  That's benign.  [...]
>
> ... indeed that's the case here.  It's backups of a data directory on a
> server that rarely changes.
>
> > In 4.0.0, BackupPC_dump flags that by creating a file
> > "HOST/NNN/refCnt/noPoolCntOk", which makes BackupPC_refCountUpdate
> > quietly ignore that backup.  Perhaps I should have
> > BackupPC_refCountUpdate notice that legacy case and create the
> > noPoolCntOk file...
>
> Certainly low priority, but you might keep it on the list if it's not a
> lot of work.
> On the other hand, now everyone who searches for the warning will find
> this mailing list post, and be pleased to hear that a simple
>   touch //refCnt/noPoolCntOk
> gets rid of the warning.  (Sanity check before: confirm that //
> did not contain anything but backupInfo and an empty refCnt/ directory.)
>
> >> BackupPC_refCountUpdate: missing pool file  count 30
> >> BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da count 1
> >> ...
> >> BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 count 1
> >
> > This is a bug in rsync-bpc [...] (yes, I had "<" instead of "<="... doh!).
>
> Oh, well...  He that has never written that error, let him first cast a
> stone... ;-)
>
> > Future backups with 4.0.0 (assuming the same file exists on the
> > client) will be updated with the correct digest, but the old backups
> > will still have the wrong one.  The errors will go away when the
> > corresponding backups eventually expire.
>
> Okay.  Just important to know that it will "fix itself" rather than
> getting worse over time.
>
>
> Thanks again for your reply, and thanks for such a great overall program!
>
>
> Alexander
>
>
> > On Tue, Mar 7, 2017 at 6:56 AM, Alexander Kobel wrote:
> >
> > Dear all,
> >
> > I have a rather small, private, non-$$$M-mission-critical instance of
> > a BackupPC server running for years (that went from several
> > 3.something through 4.0.0alpha3 to, recently, 4.0.0). After the 4.0.0
> > migration, I decided to run again some semi-manual maintenance (read:
> > fsck and refCountUpdate). Yet, after an (admittedly more or less
> > random) sequence of
> >   BackupPC_fsck
> >   BackupPC_fsck -f
> >   BackupPC_fsck -f -s
> >   BackupPC_refCountUpdate -m
> >   BackupPC_refCountUpdate -m -F -c
> >   BackupPC_refCountUpdate -m -F -c -s
> >   BackupPC_fixupBackupSummary
> >   BackupPC_nightly 0 255
> > I'm still stuck with the following messages:
> >
> >> BackupPC_refCountUpdate: doing fsck on  #1188 since there are no poolCnt files
> >> BackupPC_refCountUpdate: doing fsck on  #1190 since there are no poolCnt files
> >> ...
> >> BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
> >
> > The backups in question seem to be fully intact; some are full
> > backups, some are incremental. It's just on a minority of backups
> > (appx. 15 out of 350 backups), and fortunately on small ones where
> > fsck does not take ages, so it does not bother me too much.
> > Nevertheless, can the missing poolCnt data be recomputed? fsck seems
> > to do the counting from scratch; can this be stored?
> >
> >> BackupPC_fsck: building main count database
> >> BackupPC_refCountUpdate: missing pool file  count 30
> >> BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da count 1
> >> ...
> >> BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 count 1
> >> ...
> >> BackupPC_refCountUpdate total errors: 70
> >> BackupPC_fsck: Calling poolCountUpdate
> >
> > IIUC, this means that there are references to files with that hash
> > *somewhere* in the backups, but the respective files are missing 

Re: [BackupPC-users] server maintenance: reconstruct missing poolCnt; find/delete references to missing pool files

2017-03-07 Thread Alexander Kobel
Hi Craig,

and thanks a lot for your swift reply.  Comforting to see that both messages 
are known and harmless.

On 2017-03-07 18:24, Craig Barratt wrote:
>>> BackupPC_refCountUpdate: doing fsck on  #1188 since there are no 
>>> poolCnt files
>>> BackupPC_refCountUpdate: doing fsck on  #1190 since there are no 
>>> poolCnt files
>>> ...
>>> BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
>>
>> The backups in question seem to be fully intact; [...]
>> 
> This is perfectly ok.  BackupPC 4.0.0alpha3 and prior 4.x versions
> didn't store reference counts per backup.  [...]  So
> BackupPC_refCountUpdate is simply adding reference counts to backups
> done by BackupPC 4.0.0alpha3.  It's a one-time thing.

Got that, and...

> There might be an issue that an incremental done by BackupPC
> 4.0.0alpha3 with no changes will have an empty backup tree, and
> BackupPC_refCountUpdate will continually report that there are no
> poolCnt files for that backup.  That's benign.  [...]

... indeed that's the case here.  It's backups of a data directory on a server 
that rarely changes.

> In 4.0.0, BackupPC_dump flags that by creating a file
> "HOST/NNN/refCnt/noPoolCntOk", which makes BackupPC_refCountUpdate
> quietly ignore that backup.  Perhaps I should have
> BackupPC_refCountUpdate notice that legacy case and create the
> noPoolCntOk file...

Certainly low priority, but you might keep it on the list if it's not a lot of 
work.
On the other hand, now everyone who searches for the warning will find this 
mailing list post, and be pleased to hear that a simple
  touch //refCnt/noPoolCntOk
gets rid of the warning.  (Sanity check before: confirm that // did 
not contain anything but backupInfo and an empty refCnt/ directory.)
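
The sanity check and the touch can be combined in one step. The following is a minimal Python sketch, not part of BackupPC: the function name mark_no_pool_cnt_ok and its backup_dir argument are made up for illustration, and it assumes the backup directory should hold nothing besides backupInfo and an empty refCnt/ directory, as described above.

```python
from pathlib import Path


def mark_no_pool_cnt_ok(backup_dir: str) -> bool:
    """Create refCnt/noPoolCntOk if the backup tree is empty.

    Hypothetical helper (not part of BackupPC). Returns True if the
    marker was created, False if the sanity check failed.
    """
    root = Path(backup_dir)
    refcnt = root / "refCnt"

    # Sanity check: nothing but backupInfo and refCnt/ may be present.
    entries = {p.name for p in root.iterdir()}
    if not entries <= {"backupInfo", "refCnt"}:
        return False

    # refCnt/ must exist and be empty before the marker is added.
    if not refcnt.is_dir() or any(refcnt.iterdir()):
        return False

    (refcnt / "noPoolCntOk").touch()
    return True
```

The function refuses to touch anything if the directory holds other entries, so it can be run over a list of suspect backups without first inspecting each one by hand.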

>> BackupPC_refCountUpdate: missing pool file  
>> count 30
>> BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da 
>> count 1
>> ...
>> BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 
>> count 1
> 
> This is a bug in rsync-bpc [...] (yes, I had "<" instead of "<="... doh!).

Oh, well...  He that has never written that error, let him first cast a stone... 
;-)

> Future backups with 4.0.0 (assuming the same file exists on the
> client) will be updated with the correct digest, but the old backups
> will still have the wrong one.  The errors will go away when the
> corresponding backups eventually expire.

Okay.  Just important to know that it will "fix itself" rather than getting 
worse over time.


Thanks again for your reply, and thanks for such a great overall program!


Alexander


> On Tue, Mar 7, 2017 at 6:56 AM, Alexander Kobel wrote:
> 
> Dear all,
> 
> I have a rather small, private, non-$$$M-mission-critical instance of
> a BackupPC server running for years (that went from several
> 3.something through 4.0.0alpha3 to, recently, 4.0.0). After the 4.0.0
> migration, I decided to run again some semi-manual maintenance (read:
> fsck and refCountUpdate). Yet, after an (admittedly more or less
> random) sequence of
>   BackupPC_fsck
>   BackupPC_fsck -f
>   BackupPC_fsck -f -s
>   BackupPC_refCountUpdate -m
>   BackupPC_refCountUpdate -m -F -c
>   BackupPC_refCountUpdate -m -F -c -s
>   BackupPC_fixupBackupSummary
>   BackupPC_nightly 0 255
> I'm still stuck with the following messages:
> 
>> BackupPC_refCountUpdate: doing fsck on  #1188 since there are no poolCnt files
>> BackupPC_refCountUpdate: doing fsck on  #1190 since there are no poolCnt files
>> ...
>> BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
> 
> The backups in question seem to be fully intact; some are full
> backups, some are incremental. It's just on a minority of backups
> (appx. 15 out of 350 backups), and fortunately on small ones where
> fsck does not take ages, so it does not bother me too much.
> Nevertheless, can the missing poolCnt data be recomputed? fsck seems
> to do the counting from scratch; can this be stored?
> 
>> BackupPC_fsck: building main count database
>> BackupPC_refCountUpdate: missing pool file  count 30
>> BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da count 1
>> ...
>> BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 count 1
>> ...
>> BackupPC_refCountUpdate total errors: 70
>> BackupPC_fsck: Calling poolCountUpdate
> 
> IIUC, this means that there are references to files with that hash
> *somewhere* in the backups, but the respective files are missing from
> the cpool. Since the number is very low, I'm not really worried; in
> particular, I'm pretty sure that the pool file with hash 0*32 is a
> spurious one. Nevertheless, I'd like to see which filenames/entries
> refer to the missing pool files, and if possible delete those
> altogether to get a "clean" state without nagging messages.
> Unfortunately, I'm entirely clueless on how to approach that.
> 
> FWIW: the server runs on CentOS 7.3, on an 

Re: [BackupPC-users] server maintenance: reconstruct missing poolCnt; find/delete references to missing pool files

2017-03-07 Thread Craig Barratt
>
> > BackupPC_refCountUpdate: doing fsck on  #1188 since there are no poolCnt files
> > BackupPC_refCountUpdate: doing fsck on  #1190 since there are no poolCnt files
> > ...
> > BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
> The backups in question seem to be fully intact; some are full backups,
> some are incremental. It's just on a minority of backups (appx. 15 out of
> 350 backups), and fortunately on small ones where fsck does not take ages,
> so it does not bother me too much. Nevertheless, can the missing poolCnt
> data be recomputed? fsck seems to do the counting from scratch; can this be
> stored?


This is perfectly ok.  BackupPC 4.0.0alpha3 and prior 4.x versions didn't
store reference counts per backup.  Only the reference counts for the
entire host were maintained (in addition to the pool totals for all
hosts).  In 4.0.0, I changed that so reference counts were also stored
per-backup (which makes it easier to delete backups and to recompute the
per-host ref counts).  So BackupPC_refCountUpdate is simply adding
reference counts to backups done by BackupPC 4.0.0alpha3.  It's a one-time
thing.
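
To illustrate why per-backup counts make deletion and recomputation easier, here is a minimal Python sketch. It is only a model of the idea, not BackupPC's on-disk format: the dictionary layout, the function host_ref_counts, and the digest names are all hypothetical.

```python
from collections import Counter


def host_ref_counts(per_backup: dict) -> Counter:
    """Sum per-backup reference counts into per-host totals.

    per_backup maps a backup number to a Counter of digest -> count.
    This layout is an assumption made for illustration only.
    """
    total = Counter()
    for counts in per_backup.values():
        total.update(counts)
    return total


# Two backups referencing hypothetical pool digests.
per_backup = {
    1188: Counter({"digest_a": 2, "digest_b": 1}),
    1190: Counter({"digest_a": 1}),
}
before = host_ref_counts(per_backup)

# Deleting a backup only means dropping its counts and summing again;
# no other backup tree has to be walked.
del per_backup[1188]
after = host_ref_counts(per_backup)
```

With only a per-host total, removing backup 1188 would require walking its whole tree to find out which counts to decrement.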

There might be an issue that an incremental done by BackupPC 4.0.0alpha3
with no changes will have an empty backup tree, and BackupPC_refCountUpdate
will continually report that there are no poolCnt files for that backup.
That's benign.  In 4.0.0, BackupPC_dump flags that by creating a file
"HOST/NNN/refCnt/noPoolCntOk", which makes BackupPC_refCountUpdate quietly
ignore that backup.  Perhaps I should have BackupPC_refCountUpdate notice
that legacy case and create the noPoolCntOk file...
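
The decision described here can be sketched roughly as follows. This is an illustrative Python model, not the actual BackupPC code: the function name needs_fsck is invented, and the glob pattern poolCnt* for the per-backup count files is an assumption.

```python
from pathlib import Path


def needs_fsck(refcnt_dir: str) -> bool:
    """Decide whether a backup's refCnt directory would trigger an fsck.

    Hypothetical model of the behaviour described above; the file
    pattern "poolCnt*" is an assumption, not taken from the source.
    """
    d = Path(refcnt_dir)

    # The marker says "an empty set of counts is expected here".
    if (d / "noPoolCntOk").exists():
        return False

    # Without the marker, missing count files mean a recount is needed.
    return not any(d.glob("poolCnt*"))
```

An empty alpha3 incremental has neither count files nor the marker, so in this model it is rechecked on every run until the marker is created.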

> > BackupPC_refCountUpdate: missing pool file  count 30
> > BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da count 1
> > ...
> > BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 count 1


This is a bug in rsync-bpc (and BackupPC::XS) that was fixed a couple of
weeks ago.  It happened about 2% of the time the attrib file for a large
directory was written (attrib file sizes >256k, approx 5k files depending
on file name lengths).  If the md5 digest of the last file written to the
256k staging buffer exactly lined up with the end of the buffer, the digest
for that file wasn't written correctly (yes, I had "<" instead of "<="...
doh!).
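
For readers unfamiliar with this class of bug, here is a small Python illustration of how "<" in place of "<=" goes wrong only when an item ends exactly at the end of a buffer. It is a hypothetical model and not the rsync-bpc or BackupPC::XS code; the function write_digest and its buggy flag are invented, and skipping the copy is just one plausible way such a boundary error could show up.

```python
def write_digest(buf: bytearray, pos: int, digest: bytes, buggy: bool) -> int:
    """Copy digest into buf at pos if the bounds check passes.

    Hypothetical model of an off-by-one bounds check. Returns the
    position after the digest whether or not it was copied.
    """
    end = pos + len(digest)
    # Correct check: the digest fits if it ends at or before the end.
    # Buggy check: "<" wrongly rejects a digest that ends exactly there.
    fits = end < len(buf) if buggy else end <= len(buf)
    if fits:
        buf[pos:end] = digest
    return end


BUF_SIZE = 256 * 1024  # staging buffer size mentioned above
digest = bytes.fromhex("0601e1b90a7f92ce4cffa588ef2cc9da")
pos = BUF_SIZE - len(digest)  # digest lines up exactly with the buffer end

good = bytearray(BUF_SIZE)
bad = bytearray(BUF_SIZE)
write_digest(good, pos, digest, buggy=False)
write_digest(bad, pos, digest, buggy=True)
```

In this model the correct check stores the digest, while the buggy check leaves the last 16 bytes of the buffer untouched. Any other position works with either check, which is why a bug like this only shows up occasionally.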

Future backups with 4.0.0 (assuming the same file exists on the client)
will be updated with the correct digest, but the old backups will still
have the wrong one.  The errors will go away when the corresponding backups
eventually expire.
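
To watch the errors shrink as old backups expire, the "missing pool file" lines can be tallied from a saved log. The following Python sketch assumes the log text has exactly the line format shown in the output above; the function tally_missing is hypothetical and is not a BackupPC tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   BackupPC_refCountUpdate: missing pool file <hex digest> count <n>
MISSING_RE = re.compile(r"missing pool file\s+([0-9a-f]+)\s+count\s+(\d+)")


def tally_missing(log_text: str) -> Counter:
    """Return a Counter mapping each missing digest to its total count."""
    missing = Counter()
    for match in MISSING_RE.finditer(log_text):
        missing[match.group(1)] += int(match.group(2))
    return missing
```

Comparing the tally from one maintenance run to the next shows whether the set of affected digests is shrinking.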

Craig

On Tue, Mar 7, 2017 at 6:56 AM, Alexander Kobel  wrote:

> Dear all,
>
> I have a rather small, private, non-$$$M-mission-critical instance of a
> BackupPC server running for years (that went from several 3.something
> through 4.0.0alpha3 to, recently, 4.0.0).
> After the 4.0.0 migration, I decided to run again some semi-manual
> maintenance (read: fsck and refCountUpdate). Yet, after an (admittedly more
> or less random) sequence of
>   BackupPC_fsck
>   BackupPC_fsck -f
>   BackupPC_fsck -f -s
>   BackupPC_refCountUpdate -m
>   BackupPC_refCountUpdate -m -F -c
>   BackupPC_refCountUpdate -m -F -c -s
>   BackupPC_fixupBackupSummary
>   BackupPC_nightly 0 255
> I'm still stuck with the following messages:
>
> > BackupPC_refCountUpdate: doing fsck on  #1188 since there are no poolCnt files
> > BackupPC_refCountUpdate: doing fsck on  #1190 since there are no poolCnt files
> > ...
> > BackupPC_refCountUpdate: host  got 0 errors (took 5 secs)
>
> The backups in question seem to be fully intact; some are full backups,
> some are incremental. It's just on a minority of backups (appx. 15 out of
> 350 backups), and fortunately on small ones where fsck does not take ages,
> so it does not bother me too much. Nevertheless, can the missing poolCnt
> data be recomputed? fsck seems to do the counting from scratch; can this be
> stored?
>
> > BackupPC_fsck: building main count database
> > BackupPC_refCountUpdate: missing pool file  count 30
> > BackupPC_refCountUpdate: missing pool file 0601e1b90a7f92ce4cffa588ef2cc9da count 1
> > ...
> > BackupPC_refCountUpdate: missing pool file ea1bd7ab2e00 count 1
> > ...
> > BackupPC_refCountUpdate total errors: 70
> > BackupPC_fsck: Calling poolCountUpdate
>
> IIUC, this means that there are references to files with that hash
> *somewhere* in the backups, but the respective files are missing from the
> cpool. Since the number is very low, I'm not really worried; in particular,
> I'm pretty sure that the pool file with hash 0*32 is a spurious one.
> Nevertheless, I'd like to see which filenames/entries refer to the missing
> pool files, and if possible delete those altogether to get a "clean" state
> without nagging messages. Unfortunately, I'm entirely clueless on how to
> approach that.
>
> FWIW: the server runs on CentOS 7.3, on an ext3