On 7/16/19 4:27 PM, Adam Goryachev wrote:
On 17/7/19 4:22 am, David Koski wrote:

Regards,
David Koski
dko...@sutinen.com

On 7/8/19 6:16 PM, Adam Goryachev wrote:
On 9/7/19 10:23 am, David Koski wrote:
I am trying to back up about 24TB of data that has millions of files.  It takes a day or two before it starts backing up and then stops with an error.  I ran a CLI dump, captured the output, and can see the error message:

Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file Shares/Archives/<path-removed>/COR_2630.png
Child is aborting
Done: 589666 files, 1667429241846 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Not saving this as a partial backup since it has fewer files than the prior one (got 589666 and 589666 files versus 4225016)
dump failed: aborted by signal=PIPE

This backup is doing rsync over ssh.  I enabled SSH keepalive, but the failure does not appear to be due to an idle network.  It also does not appear to be a random network interruption, because the time it takes to fail is pretty consistent: about three days.  I'm stumped.
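For what it's worth, the keepalive lives on the ssh command line that BackupPC runs; a minimal sketch, assuming the stock 3.x $Conf{RsyncClientCmd} (not necessarily how it is configured here, and the interval values are just examples):

# Sketch only: ssh-level keepalives added to the stock BackupPC 3.x
# rsync-over-ssh client command in config.pl.  ServerAliveInterval and
# ServerAliveCountMax are standard OpenSSH client options.
$Conf{RsyncClientCmd} = '$sshPath -q -x'
                      . ' -o ServerAliveInterval=60 -o ServerAliveCountMax=10'
                      . ' -l root $host $rsyncPath $argList+';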


Did you check:

$Conf{ClientTimeout} = 72000;

Also, what version of rsync on the client, what version of BackupPC on the server, etc?
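If that is only set globally, it can also be raised for just this host; a rough sketch (the per-host file location and the value are assumptions, BackupPC 3.x layout):

# Sketch: per-host override, e.g. in the host's .pl file under the config
# directory (for 3.x typically <ConfDir>/pc/<hostname>.pl).  Value is an example.
$Conf{ClientTimeout} = 259200;   # 72 hours, in seconds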

I think BPC v4 handles this scenario significantly better; in fact, a server I used to have trouble with on BPC 3.x all the time has since been combined with 4 other servers (so 4x the number of files and total size of data) and BPC4 handles it easily.


Thank you all for your input.  More information:

rsync version on client: 3.0.8 (Windows)
rsync version on server: 3.1.2 (Debian)
BackupPC version: 3.3.1
$Conf{ClientTimeout} = 604800

I just compared the output of two verbose BackupPC_dump runs and it looks like files are reported as backed up even though they are not.  For example, this line appears in the logs of both backup runs:

create   644  4616/545  1085243184 <path-removed>/<name-removed>3412.zip

I checked and the file's timestamp is from 2018.  The log files are full of these.  I checked the real-time clocks on both systems and they are correct.  There are also files that have been backed up that do not appear in the logs.

I suspect there are over ten million files, but I don't have a good way of telling right now.  Oddly, there are about 500,000 files backed up according to the log captured from BackupPC_dump and almost the same number actually backed up and found in pc/<host>/0, but they are different subsets of files.  I have been tracking memory and swap usage on the server and see no issues.
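If it is any use, a rough way to count what actually landed in the server-side tree is a short File::Find walk; a sketch only, with the TopDir path below assumed (Debian default) and <host> left as a placeholder:

#!/usr/bin/perl
# Sketch: count regular files under the last backup tree on the server.
# Note: BackupPC 3.x keeps an "attrib" file per directory, so this slightly
# overcounts the real number of backed-up files.
use strict;
use warnings;
use File::Find;

my $dir = '/var/lib/backuppc/pc/<host>/0';   # assumed TopDir, <host> placeholder
my $count = 0;
find(sub { $count++ if -f }, $dir);
print "$count files under $dir\n";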

Is this a possible bug in BackupPC 3.3.1?

Please don't top-post if you can avoid it, at least not on mailing lists.

I just realised:

Read EOF: Connection reset by peer

This is a networking issue, not BackupPC. In other words, something has broken the network connection (in the middle of transferring a file, so I would presume it isn't due to some idle timeout, dropped NAT entry, etc). BackupPC has been told by the operating system that the connection is no longer valid, and so it has "cleaned up" by removing the in-progress file (partial).

I just completed another backup cycle that failed in the same manner but this time with a continuous ping with captured output.  It didn't miss a beat.


It takes a day to start (presumably reading ALL the files on the client takes this long; you could improve disk performance or add RAM on the client to speed this up).

You might be right.  But it's not a show stopper.


"and then stops with an error" - is that on the first file, or are some files successfully transferred? Is that the first large file? Does it always fail on the same file (seems not, since it previously got many more).

Good points.  Confirmed: not the first file (over 600,000 files transferred first), not a large file (less than 20MB), and it does not always fail on the same file or directory.


I'm thinking you need to check and/or improve network reliability, and make sure both client and server are not running out of RAM (mainly the BackupPC client, where the OOM killer might kill the rsync process).  Check your system logs on both client and server, and/or watch top output on both systems during the backup.

The network did not miss a beat and generally appears responsive; it has been checked.  Client and server RAM usage are tracked in Zabbix and are not close to running out.  The only curious thing is that swap is running out on the client (Windows Server 2016) even with 10GB of RAM available, but it still has about 2GB left before the crash.  Server system logs (kern.log, syslog) show no signs of issues.


Try backing up other systems, try backing up a smaller subset (exclude some large directories, and then add them back in if you complete a backup successfully).

That is a good idea.  I'll try adding incrementally to the data backed up.
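A possible way to stage that, as a sketch only (BackupPC 3.x syntax; the directory names below are made up, and the share is assumed to be "Shares" based on the paths in the logs):

# Sketch: back up only a subset of the share first, then widen it once a
# backup completes.  Directory names are illustrative.
$Conf{BackupFilesOnly} = {
    'Shares' => ['/Users', '/Projects'],
};

# Alternatively, exclude the largest trees first and drop the excludes later:
# $Conf{BackupFilesExclude} = {
#     'Shares' => ['/Archives'],
# };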


Overall, I would advise upgrading to BPC v4.x; it handles backups of systems with a huge number of files much better.

If incrementally adding doesn't solve the problem I'll try an upgrade.

Thank you,
David Koski


This doesn't look like a BPC bug; maybe a network driver, the kernel, or something else, but not BPC (IMHO).

Regards,
Adam




_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
