Re: [BackupPC-users] Backup fails after three days, possibly millions of files

2019-07-19 Thread David Koski



On 7/16/19 4:27 PM, Adam Goryachev wrote:

On 17/7/19 4:22 am, David Koski wrote:


Regards,
David Koski
dko...@sutinen.com

On 7/8/19 6:16 PM, Adam Goryachev wrote:

On 9/7/19 10:23 am, David Koski wrote:
I am trying to back up about 24TB of data that has millions of 
files.  It takes a day or two before it starts backing up, and then 
it stops with an error.  I did a CLI dump, captured the output, and 
can see the error message:


Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file 
Shares/Archives//COR_2630.png

Child is aborting
Done: 589666 files, 1667429241846 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Not saving this as a partial backup since it has fewer files than 
the prior one (got 589666 and 589666 files versus 4225016)

dump failed: aborted by signal=PIPE

This backup is doing rsync over ssh.  I enabled SSH keepalive but 
it does not appear to be due to an idle network.  It does not 
appear to be a random network interruption because the time it 
takes to fail is pretty consistent, about three days. I'm stumped. 
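Since rsync here runs over ssh, the keepalive the poster mentions would typically be configured on the BackupPC server's ssh client side. A minimal sketch of what that could look like (host alias and values are placeholders, not taken from the thread):

```
# ~/.ssh/config on the BackupPC server (or pass as -o options in
# $Conf{RsyncClientCmd}).  Host alias and intervals are examples.
Host backup-client
    ServerAliveInterval 60    # probe after 60s of silence on the connection
    ServerAliveCountMax 3     # drop the session after 3 unanswered probes
    TCPKeepAlive yes          # additionally enable TCP-level keepalives
```

With these set, an idle-timeout drop by a firewall or NAT device becomes much less likely, which is consistent with the poster's observation that the failure is not idle-related.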



Did you check:

$Conf{ClientTimeout} = 72000;

Also, what version of rsync on the client, what version of BackupPC 
on the server, etc?


I think BPC v4 handles this scenario significantly better. In fact, a 
server I used to have trouble with on BPC 3.x all the time has since 
been combined with 4 other servers (so 4x the number of files and 
total size of data), and BPC v4 handles it easily.





Thank you all for your input.  More information:

rsync version on client: 3.0.8 (Windows)
rsync version on server: 3.1.2 (Debian)
BackupPC version: 3.3.1
$Conf{ClientTimeout} = 604800

I just compared the output of two verbose BackupPC_dump runs and it 
looks like the files are reported to be backed up even though they 
are not.  For example, this appears in logs of both backup runs:


create   644  4616/545  1085243184 /3412.zip

I checked and the file time stamp is year 2018.  The log files are 
full of these.  I checked the real time clock on both systems and 
they are correct.  There are also files that have been backed up that 
are not in the logs.


I suspect there are over ten million files, but I don't have a good 
way of telling now.  Oddly, about 500,000 files are backed up 
according to the log captured from BackupPC_dump, and almost the same 
number are actually backed up and found in pc//0, but they are 
different subsets of files.  I have been tracking memory and swap 
usage on the server and see no issues.
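For getting an actual file count, one option is to count on the share itself (or, from the server side, against a completed backup tree). A minimal sketch, with placeholder paths; on a tree this size it will take a while but is read-only:

```shell
# Total regular files under a tree (path is a placeholder):
find /path/to/share -type f | wc -l

# Per-top-level-directory breakdown, useful for deciding what to
# exclude or add back incrementally:
for d in /path/to/share/*/; do
    printf '%s\t' "$d"
    find "$d" -type f | wc -l
done
```

On the Windows Server 2016 client itself the equivalent would be a PowerShell recursive count, but run from the BackupPC server the sketch above works against any mounted or already-backed-up copy of the data.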


Is this a possible bug in BackupPC 3.3.1?


Please don't top-post if you can avoid it, at least not on mailing lists.

I just realised:

Read EOF: Connection reset by peer

This is a networking issue, not BackupPC. In other words, something 
has broken the network connection (in the middle of transferring a 
file, so I would presume it isn't due to some idle timeout, dropped 
NAT entry, etc). BackupPC has been told by the operating system that 
the connection is no longer valid, and so it has "cleaned up" by 
removing the in-progress file (partial).


I just completed another backup cycle that failed in the same manner, but 
this time I ran a continuous ping with captured output.  It didn't miss a 
beat.
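The thread doesn't show how the ping output was captured; one simple way to make such a capture useful is to timestamp every line so any gap can be lined up against the exact time the backup aborts. A sketch with placeholder host and log path:

```shell
# Timestamp each line of ping output so an outage window can be
# matched against the backup failure time.  Host and path are
# placeholders.
ping backup-client.example.com | while read -r line; do
    printf '%s %s\n' "$(date '+%F %T')" "$line"
done > /var/log/ping-backup-client.log
```

A gap or a run of timeouts in the timestamps around the abort time would point at the network after all; an unbroken log, as reported here, argues against it.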




It takes a day to start (presumably reading ALL the files on the 
client takes this long; improving disk performance or increasing RAM 
on the client might help here).


You might be right.  But it's not a show stopper.



"and then stops with an error" - is that on the first file, or are 
some files successfully transferred? Is that the first large file? 
Does it always fail on the same file (seems not, since it previously 
got many more).


Good points.  Confirmed: it is not the first file (over 600,000 files 
transferred first), not a large file (less than 20MB), and it does not 
always fail on the same file or directory.




I'm thinking you need to check and/or improve network reliability, 
make sure both client and server are not running out of RAM/etc 
(mainly the backuppc client, the OOM might kill the rsync process), 
etc. Check your system logs on both client and server, and/or watch 
top output on both systems during the backup.


The network did not miss a beat and generally appears responsive; it has 
been checked.  Client and server RAM usage are tracked in Zabbix and 
neither is close to running out.  The only curious thing is that swap is 
running low on the client (Windows Server 2016) even with 10GB of RAM 
available, but it still has about 2GB free before the crash.  Server 
system logs (kern.log, syslog) show no signs of issues.




Try backing up other systems, try backing up a smaller subset (exclude 
some large directories, and then add them back in if you complete a 
backup successfully).
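In BackupPC 3.x the usual way to back up a smaller subset is the per-host exclude list. A hedged sketch, with share and directory names as placeholders (the real share here appears to be "Shares", but the subdirectories below are invented for illustration):

```
# In the host's config .pl file; share and directory names are examples.
$Conf{BackupFilesExclude} = {
    'Shares' => ['/Archives', '/SomeBigDirectory'],
};
```

Once a full backup completes with the excludes in place, entries can be removed one at a time to narrow down which part of the tree triggers the failure.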


That is a good idea.  I'll try adding incrementally to the data backed up.



Overall, I would advise to upgrade to BPC v4.x, it handles backups of 
systems with huge 

[BackupPC-users] PoolSizeNightlyUpdatePeriod and BackupPCNightlyPeriod

2019-07-19 Thread Gandalf Corvotempesta
In the official docs, the max value is 16. As I have a very slow server with a
very big pool, is 16 really the maximum I can set, or is that just an example?

Can I set it to 32? The cntUpdate phase is taking ages... (3 days)


___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/