On 14/01/16 08:14, Gandalf Corvotempesta wrote:
> 2016-01-13 22:07 GMT+01:00 Les Mikesell <lesmikes...@gmail.com>:
>> Did your strace test show a hanging system call on any of the active
>> processes in this time?
> Nothing is hanged. When this occurs, no transfer is happening via network
> and both "rsync_bpc" processes are parsing tons of these:
>
>
> FIRST PROCES:
> read(5, "82f3016ca8f4b309aa141fb1aee9dfb0"..., 8184) = 4117
> select(6, [5], [], NULL, {60, 0}) = 1 (in [5], left {59, 855296})
> read(5, "34166a01128e65c0e98ce44442a634ea"..., 8184) = 4093
> select(6, [5], [], NULL, {60, 0}) = 1 (in [5], left {59, 988318})
>
> SECOND PROCESS:
> select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {59, 999934})
> read(3, "63db4c25f333ca\0\202\3Vg9\211\247\346\"N\233<\215\320x\4\272"..., 4092) = 2896
> select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {59, 999675})
> read(3, "197addc7b4\0001\2V\v\v\222\2655\242\213\242\310P\222\343\331\211Q\244T\237"..., 1196) = 1196
> select(7, NULL, [6], [6], {60, 0}) = 1 (out [6], left {59, 999999})
> write(6, "690f3f5dc38571ac1d63db4c25f333ca"..., 4092) = 4092
> select(7, NULL, [6], [6], {60, 0}) = 1 (out [6], left {59, 999998})
> write(6, "\363a\306v}\376\364@\241\236{:A ", 14) = 14
> select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {59, 906237})
> read(3, "\374\17\0\7", 4) = 4
> select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {59, 999806})
> read(3, "ff5cd88ec882380bd1ade93a3c24\0M\2V"..., 4092) = 2896
>
>
> What's the meaning of two rsync_bpc processes?
>
>
>> That's not at all like what rsync would be doing when it merges
>> changes to a compressed file.
> I know, but having a slow disk would slow down also rsync and bpc.
> This test told me that disks are working properly and bottleneck shold
> be somewhere else.
>
Ummm, really? I think you are confused. Where exactly the above
processes are reading from or writing to (most likely it isn't the
network, which means it is almost certainly your backuppc server disks)
will tell you where the bottleneck is.

You have identified that the client is providing the data to the server
quickly enough, but the server is too slow to process this data (i.e., do
whatever needs to happen to save it in the correct place). This is almost
certainly due to one or more of the following:
1) Slow I/O
2) Not enough RAM, leading to not enough cache, leading to slow I/O
3) Slow CPU

You provided some other information in another email:
> Raw performance by direct rsync between these two servers:
>
> receiving incremental file list
> test.img
>   1,073,741,824 100%   25.04MB/s    0:00:40 (xfr#1, to-chk=0/1)
>
> sent 77 bytes  received 1,073,873,061 bytes  24,686,738.80 bytes/sec
> total size is 1,073,741,824  speedup is 1.00
>
>
> 24.5MB/s, not too much, but not too bad. 24 times faster than BPC
> (with BPC i got about 1MB/s)

This is complete rubbish; it isn't a useful comparison of anything. I am
almost certain that your actual client isn't made up of files with an
average size of 1GB. In fact, the snippet of the rsyncd log that you
previously provided showed very small files. It still wouldn't be a
meaningful comparison, but it would at least be more realistic if you
used the actual files you are trying to back up, even if only a subset
of them.

BTW, the reason it isn't so relevant is that backuppc does a lot more
work on the server side than plain rsync. The client-side performance is
relevant, though, and such a test could at least show that the client is
capable.

> raw performance writing to disk on BPC server (same partitions used by
> BPC as storage):
>
> # dd if=/dev/zero of=/var/backups/test.img bs=1M count=10000
> ^C8815+0 records in
> 8815+0 records out
> 9243197440 bytes (9.2 GB) copied, 142.267 s, 65.0 MB/s

Totally irrelevant.
BackupPC is doing lots of small random reads and writes. However, maybe
that is relevant after all, because 65MB/s on any single HDD from the
past 5 years, let alone a RAID array, is abysmal for streaming writes.
Even a single drive should be capable of at least 100MB/s.

Here is the same statistic from one of my BPC v3 servers:
dr:/mnt/imagestore# dd if=/dev/zero of=/var/backups/test.img bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 16.135 s, 650 MB/s

This is a LV sitting on a RAID5 array:
md0 : active raid5 sde1[4] sdc1[3] sdd1[2] sdb1[0]
      11720658432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

Which is using these drives:
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD40EFRX-68WT0N0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm

I'm sure you said you had 7200rpm disks, so you should get even better
performance for both random r/w and streaming writes. Which brings me
back to my earlier concern that you are using a VM for backuppc: it is
sharing its performance with other things, which works very poorly when
dealing with spinning disks (even a streaming write like your example
gets mixed with other random I/O, which kills performance).

Please diagnose and resolve the underlying performance issues, then come
back to BPC and see how it performs.

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au
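
For anyone who wants a disk test closer to the "lots of small files"
workload described above than a single dd stream, here is a minimal
Python sketch. It is not part of BackupPC and does not reproduce what
BackupPC does on disk (no pooling, compression or reads). It only
contrasts writing a given number of bytes as one large file with writing
the same number of bytes as many small files, each followed by fsync.
The function names (write_throughput, compare) and the default sizes are
made up for illustration. MB here means 10^6 bytes, the same unit dd
uses in its summary line.

```python
import os
import tempfile
import time


def write_throughput(directory, file_count, file_size, fsync=True):
    """Write file_count files of file_size bytes each; return MB/s (10^6 bytes)."""
    payload = os.urandom(file_size)
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(directory, f"bench_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            if fsync:
                # Force the data to the device so the page cache
                # does not hide the cost of each small write.
                f.flush()
                os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (file_count * file_size) / elapsed / 1e6


def compare(directory, total_bytes=16 * 1024 * 1024, small_size=16 * 1024):
    """Write total_bytes once as a single file and once as many small files.

    Both runs use temporary subdirectories of `directory`, which are
    removed afterwards. Returns (streaming_mb_s, small_files_mb_s).
    """
    with tempfile.TemporaryDirectory(dir=directory) as big_dir:
        streaming = write_throughput(big_dir, 1, total_bytes)
    with tempfile.TemporaryDirectory(dir=directory) as small_dir:
        small = write_throughput(small_dir, total_bytes // small_size, small_size)
    return streaming, small


if __name__ == "__main__":
    # "." is a placeholder; point this at the filesystem you want to test.
    streaming, small = compare(".")
    print(f"one large file  : {streaming:8.1f} MB/s")
    print(f"many small files: {small:8.1f} MB/s")
```

The default sizes are deliberately tiny so the script finishes quickly.
On a real server you would want total_bytes well above the amount of RAM
available for caching, and you would run it on the same partition that
holds the backup pool.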
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/