On 14/01/16 08:14, Gandalf Corvotempesta wrote:
> 2016-01-13 22:07 GMT+01:00 Les Mikesell <lesmikes...@gmail.com>:
>> Did your strace test show a hanging system call on any of the active
>> processes in this time?
> Nothing is hanging. When this occurs, no transfer is happening via the network
> and both "rsync_bpc" processes are parsing tons of these:
>
>
> FIRST PROCESS:
> read(5, "82f3016ca8f4b309aa141fb1aee9dfb0"..., 8184) = 4117
> select(6, [5], [], NULL, {60, 0})       = 1 (in [5], left {59, 855296})
> read(5, "34166a01128e65c0e98ce44442a634ea"..., 8184) = 4093
> select(6, [5], [], NULL, {60, 0})       = 1 (in [5], left {59, 988318})
>
> SECOND PROCESS:
> select(4, [3], [], NULL, {60, 0})       = 1 (in [3], left {59, 999934})
> read(3, "63db4c25f333ca\0\202\3Vg9\211\247\346\"N\233<\215\320x\4\272"..., 4092) = 2896
> select(4, [3], [], NULL, {60, 0})       = 1 (in [3], left {59, 999675})
> read(3, "197addc7b4\0001\2V\v\v\222\2655\242\213\242\310P\222\343\331\211Q\244T\237"..., 1196) = 1196
> select(7, NULL, [6], [6], {60, 0})      = 1 (out [6], left {59, 999999})
> write(6, "690f3f5dc38571ac1d63db4c25f333ca"..., 4092) = 4092
> select(7, NULL, [6], [6], {60, 0})      = 1 (out [6], left {59, 999998})
> write(6, "\363a\306v}\376\364@\241\236{:A ", 14) = 14
> select(4, [3], [], NULL, {60, 0})       = 1 (in [3], left {59, 906237})
> read(3, "\374\17\0\7", 4)               = 4
> select(4, [3], [], NULL, {60, 0})       = 1 (in [3], left {59, 999806})
> read(3, "ff5cd88ec882380bd1ade93a3c24\0M\2V"..., 4092) = 2896
>
>
> What's the meaning of two rsync_bpc processes?
>
>
>> That's not at all like what rsync would be doing when it merges
>> changes to a compressed file.
> I know, but having a slow disk would also slow down rsync and bpc.
> This test told me that the disks are working properly and the bottleneck
> should be somewhere else.
>

Ummm, really? I think you are confused. Where exactly the above
processes are reading from or writing to will tell you where the
bottleneck is (most likely it isn't the network, which means it is
almost certainly your backuppc server disks). You have identified that
the client is providing the data to the server quickly enough, but the
server is too slow to process this data (i.e., do whatever needs to
happen to save it in the correct place). This is almost certainly due to
one or more of the following:
1) Slow I/O
2) Not enough RAM leading to not enough cache leading to slow I/O
3) Slow CPU
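
To see where those reads and writes are actually going, the file
descriptor numbers in the strace output (5, 3 and 6 above) can be mapped
back to files, pipes or sockets. This is only a minimal sketch, assuming
a Linux server with /proc mounted. fd_target is a hypothetical helper
name, not part of BackupPC, and inspecting another user's process
normally needs root or that user's account:

```shell
# fd_target PID FD: print what file descriptor FD of process PID points at.
# Assumes Linux, where /proc/PID/fd/FD is a symlink to the open object.
fd_target() {
    readlink "/proc/$1/fd/$2"
}
```

Called as "fd_target PID 5" with the PID of one of the rsync_bpc
processes, it prints a path if the descriptor is a regular file, or
something of the form socket:[...] or pipe:[...] otherwise.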

You provided some other information in another email:
> Raw performance by direct rsync between these two servers:
>
> receiving incremental file list
> test.img
>    1,073,741,824 100%   25.04MB/s    0:00:40 (xfr#1, to-chk=0/1)
>
> sent 77 bytes  received 1,073,873,061 bytes  24,686,738.80 bytes/sec
> total size is 1,073,741,824  speedup is 1.00
>
>
> 24.5MB/s, not too much, but not too bad. 24 times faster than BPC
> (with BPC i got about 1MB/s)
This is complete rubbish; it isn't a useful comparison of anything. I
am almost certain that your actual client isn't made up of files with an
average size of 1GB. In fact, the snippet of the rsyncd log that you
previously provided showed very small files. It still wouldn't be a
meaningful comparison, but it would at least be more realistic if you
used the actual files you are trying to back up, even if only a subset
of them.
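
To check how far the real data is from a single 1GB file, something like
the following could be run on the client. It is a rough sketch, assuming
GNU find (for -printf) and awk are available. avg_file_size is a made-up
helper name, and the directory you pass is whatever tree you are
actually backing up:

```shell
# avg_file_size DIR: count the regular files under DIR and print their
# mean size in bytes. Assumes GNU find, which supports -printf.
avg_file_size() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1; n++ }
             END {
                 if (n) printf "%d files, average %d bytes\n", n, total / n
                 else print "0 files"
             }'
}
```

It prints one line of the form "N files, average M bytes". An average of
a few KB means the job is dominated by per-file overhead, not by
streaming throughput.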

BTW, the reason it isn't so relevant is that backuppc does a lot more
work on the server side than plain rsync. The client-side performance is
relevant, though, and such a test could at least show that the client is
capable.
> raw performance writing to disk on BPC server (same partitions used by
> BPC as storage):
>
> # dd if=/dev/zero of=/var/backups/test.img bs=1M count=10000
> ^C8815+0 records in
> 8815+0 records out
> 9243197440 bytes (9.2 GB) copied, 142.267 s, 65.0 MB/s
Totally irrelevant as a comparison: BackupPC is doing lots of small
random reads and writes, not one streaming write. However, maybe it is
relevant after all, because 65MB/s is abysmal for streaming writes on
any single HDD from the past 5 years, let alone a RAID array. Even a
single drive should be capable of at least 100MB/s.
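
For a test that looks a little more like lots of small writes than one
big stream, a crude sketch along these lines may help. It assumes GNU dd
(for conv=fsync). small_write_test is a hypothetical name, and this is
not what BackupPC does internally; it only shows how a small-write
pattern behaves on the same partition. Starting one dd per file adds
overhead of its own, so treat the result as a rough indication only:

```shell
# small_write_test DIR COUNT: write COUNT files of 4 KiB each into DIR,
# flushing every file to disk, then report the elapsed whole seconds.
small_write_test() {
    dir=$1 count=$2 i=0
    start=$(date +%s)
    while [ "$i" -lt "$count" ]; do
        # conv=fsync makes dd flush the file to disk before it exits
        dd if=/dev/zero of="$dir/small.$i" bs=4k count=1 conv=fsync 2>/dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$count files in $((end - start)) s"
}
```

Run it against an empty scratch directory on the BackupPC partition with
a count in the thousands, since the timer only has one-second
resolution, and delete the small.* files afterwards.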

Here is the same statistic from one of my BPC v3 servers:
dr:/mnt/imagestore# dd if=/dev/zero of=/var/backups/test.img bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 16.135 s, 650 MB/s

This is an LV sitting on a RAID5 array:
md0 : active raid5 sde1[4] sdc1[3] sdd1[2] sdb1[0]
       11720658432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

Which is using these drives:
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD40EFRX-68WT0N0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm

I'm sure you said you had 7200rpm disks, so you should get even better
performance for both random r/w and streaming writes. Which brings me
back to my earlier concern that you are using a VM for backuppc: it is
sharing its performance with other things, which works very poorly when
dealing with spinning disks (even a streaming write like your example
gets mixed with other random I/O, which kills performance).

Please diagnose and resolve the underlying performance issues, then come 
back to BPC and see how it performs.

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
