Re: problems encountered in 2.4.6
I've had rsync hangs when transferring huge filesystems (~80 GB) over the
network, but since I suppressed the -v option from my command line there are
no more hangs. The -v option under 2.4.6 is buggy; add more v's and the hangs
increase too.

    rsync -axWP --rsync-path=/usr/local/bin/rsync --stat --delete source target

David Bolen wrote:
> [...]

--
Remi LAPORTE
TEXAS INSTRUMENTS UNIX SUPPORT
[EMAIL PROTECTED]
Re: problems encountered in 2.4.6
On Tue, May 29, 2001 at 12:02:41PM -0500, Phil Howard wrote:
> Dave Dykstra wrote:
> > On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
> > > ...
> > > Use the -W option to disable the rsync algorithm. We really ought to
> > > make that the default when both the source and destination are local.
> >
> > I went ahead and submitted a change to the rsync CVS to automatically
> > turn on -W when the source and destination are both on the local machine.
>
> So how do I revert that on the command line? I've been trying with -W
> doing my disk to disk backups, and I've had to go back to not using -W.
> Will -c do that?

There's currently no way to revert it. I thought it wouldn't be necessary,
and I'm not sure how to do it cleanly, because there's currently no precedent
in rsync for a general undoing of options that have different defaults
depending on the situation. Another one that comes to mind is --blocking-io.
The latest rsync in CVS is now using the popt package to process options
instead of getopt. Does anybody know if that package has a standard way to
negate options, for example prefixing a "no" (like --no-blocking-io) or
something like that? I took a quick look through the man page and it wasn't
obvious.

> The reason is the load on the machine gets so high, nothing else can run.
> This is not CPU load, but rather, buffering/swapping load. CPU load just
> slows other things down. But buffering/swapping load brings other things
> to a grinding halt. I suspect Linux's tendency to want to keep everything
> that anything writes in RAM, even if that means swapping out all other
> processes, is impacted by this. So I'll need a way to not have the effect
> of -W to use rsync for disk to disk backups.

Wow. Rsync is just going too fast for it I guess. The lack of -W makes it do
a lot of unnecessary disk I/O, which must be enough to throttle its progress.
Sure seems like leaving out -W is the wrong solution. Maybe -W has to turn
off more of rsync's pipelining since it is no longer performing the rsync
algorithm.

> The fact that rsync loads so much into VM probably makes the problem a bit
> worse in this case. I saw 1 process at 35M and 2 processes at 70M (total
> 175M used by rsync, in addition to all the buffered writes). Does -W have
> an impact on that?

I would think that if anything -W would lessen that effect.

> I'm wondering if rsync is even a good choice for disk to disk backup duty.
> Is there some option I missed that disables pre-loading all the file names
> into memory?

Maybe it isn't. There is no such option.

> I also tried the --bwlimit option and it had no effect, not even on the
> usual download synchronizing over a dialup that I do. I could not get it
> to pace the rate below the dialup speed no matter what I would specify.

I haven't used the --bwlimit option and don't really know how it works. I
remember when somebody contributed it that I was skeptical about how well it
could work. I'm especially not surprised that it has no impact on
local-to-local transfers.

- Dave Dykstra
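For illustration only, this is roughly what the negation idea being discussed
would look like on a command line; the --no-whole-file switch shown here is
hypothetical (no such option exists in 2.4.6), and the paths and host are
just examples. The second command shows the real --bwlimit syntax (a rate in
KBytes per second), the option reported above as having no effect:

    # Hypothetical "--no-" negation of an implied default (NOT a real 2.4.6
    # option), undoing the automatic -W for a local disk-to-disk copy:
    rsync -ax --no-whole-file --stats --delete /home/. /mnt/hdb/home/.

    # The real --bwlimit option takes KBytes per second; as reported above
    # it did not appear to throttle this particular setup:
    rsync -az --bwlimit=4 /home/. remotehost:/backup/home/.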
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
> ...
> Use the -W option to disable the rsync algorithm. We really ought to make
> that the default when both the source and destination are local.

I went ahead and submitted a change to the rsync CVS to automatically turn on
-W when the source and destination are both on the local machine.

- Dave Dykstra
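As a concrete illustration of that change (paths and hostname are
placeholders): a purely local invocation now behaves as if -W had been given,
while anything involving a remote host still uses the delta algorithm:

    # Local source and destination: -W is implied, changed files are copied whole.
    rsync -a --delete /home/. /mnt/hdb/home/.

    # Remote destination: the rsync delta algorithm is still used by default.
    rsync -a --delete /home/. backuphost:/backup/home/.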
rsync 3 (was Re: problems encountered in 2.4.6)
> There is a feature I would like, and I notice that even with -c this does
> not happen, but I think it could based on the way rsync works. What I'd
> like to have is when a whole file is moved from one directory to another,
> rsync would detect a new file with the same checksum as an existing
> (potentially to be deleted) file, and copy, move, or link, as appropriate.
> In theory this should apply to anything anywhere in the whole file tree
> being processed.

See the note I posted on May 17th; the title is "Storing updates" and it
includes a Tcl script I run on "rsync -n" output to spot obvious renames and
gzip'ing of files, and take evasive action. It would be excellent if rsync
could do this sort of thing for me. The basic principle is that if you are
using --delete, then when a file is missing a good place to look is in the
list of deletions.

I spoke to Rusty Russell last November when he was visiting Dublin and he
mentioned there had been some thinking about an rsync 3. One feature being
considered was allowing users to supply arbitrary rules for what to do when a
file is missing, based on file suffix etc. Did anyone follow up these ideas?

John
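A rough shell sketch of the same principle (this is not the Tcl script from
the "Storing updates" note; the paths, temp files, and use of md5sum are
assumptions): take a dry run, index the files that --delete would remove by
checksum, and copy any identical one into place before the real run so a
moved file is not retransferred:

    #!/bin/sh
    # Sketch only: filenames containing whitespace are not handled.
    SRC=/home/.            # sender tree (example path)
    DST=/mnt/hdb/home/.    # receiver tree (example path)

    # Dry run: capture what rsync would transfer and what --delete would remove.
    rsync -anv --delete "$SRC" "$DST" > /tmp/dryrun.out

    # Checksum every file that is about to be deleted on the receiver.
    : > /tmp/doomed.md5
    sed -n 's/^deleting //p' /tmp/dryrun.out | while IFS= read -r f; do
        [ -f "$DST/$f" ] && md5sum "$DST/$f" >> /tmp/doomed.md5
    done

    # For each file rsync wants to send, see whether an identical doomed file
    # already exists on the receiver, and copy it into place first.
    grep -v '^deleting ' /tmp/dryrun.out | while IFS= read -r f; do
        [ -f "$SRC/$f" ] || continue
        sum=$(md5sum "$SRC/$f" | awk '{print $1}')
        old=$(awk -v s="$sum" '$1 == s { print $2; exit }' /tmp/doomed.md5)
        if [ -n "$old" ] && [ ! -e "$DST/$f" ]; then
            mkdir -p "$(dirname "$DST/$f")"
            cp -p "$old" "$DST/$f"
        fi
    done

    # Real run: the pre-copied files now match and transfer almost nothing.
    rsync -av --delete "$SRC" "$DST"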
Re: problems encountered in 2.4.6
Dave Dykstra wrote:
> > 2 = When synchronizing a very large number of files, all files in a large
> > partition, rsync frequently hangs. It's about 50% of the time, but seems
> > to be a function of how much work there was to be done. That is, if I run
> > it soon after it just ran, it tends to not hang, but if I run it after
> > quite some time (and lots of stuff to synchronize) it tends to hang. It
> > appears to have completed all the files, but I don't get any stats. There
> > are 3 rsync processes sitting idle with no files open in the source or
> > target trees. At last count there were 368827 files and 8083 symlinks in
> > 21749 directories. df shows:
> >
> >   /dev/hda4   42188460  38303916   3884544  91%  /home
> >   /dev/hdb4   42188460  38301972   3886488  91%  /mnt/hdb/home
> >
> > df -i shows:
> >
> >   /dev/hda4    2662400    398419   2263981  15%  /home
> >   /dev/hdb4    2662400    398462   2263938  15%  /mnt/hdb/home
> >
> > The df numbers are not exact because change is constantly happening on
> > this active server. Drives hda and hdb are identical and are partitioned
> > alike. The command line is echoed from the script that runs it:
> >
> >   rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
>
> Use the -W option to disable the rsync algorithm. We really ought to make
> that the default when both the source and destination are local.

I don't want to copy everything every time. That's why I am using rsync to do
this in the first place. I don't understand why this would be what's hanging.

> > A deadly embrace? It seems possible.
>
> No, the receiving side of an rsync transaction splits itself into two
> processes for the sake of pipelining: one to generate checksums and one to
> accept updates. When you're sending and receiving to the same machine then
> you've got one sender and 2 receivers.

Right. But what I was suggesting was a deadly embrace in that the process
killed was waiting for something, and the parent was waiting for something.
I'm not using the -c option, so why would checksums be generated?

> > I'm also curious why 26704 has no fd 1.
>
> I don't know. When I tried it all 3 processes had an fd 1.

Were you looking at it after it hung? Or is it not hanging for you? I am
curious if the lack of fd 1 is related to the hang. It is being started with
1 and 2 redirected to a log file _and_ the whole thing is being run via the
script command for a big-picture logfile. It was set up this way with the
intent to run it from cron, although I haven't actually added it to crontab
yet, due to the problems.

> > 3 = @ERROR: max connections (16) reached - try again later
> >
> > This occurs after just one connection is active. It behaves as if I had
> > specified max connections = 1. On another server I set it to 40, and it
> > showed:
> >
> >   @ERROR: max connections (40) reached - try again later
> >
> > so it obviously is parsing and keeping the value I configure, but it
> > isn't using it correctly. Also, if I ^C the client, then I get this error
> > every time until I restart the daemon (running in standalone daemon mode,
> > not inetd). So it seems like it counts clients wrong. But I can't get
> > more than 1 right after restarting the server, so it's a little more than
> > that somewhere.
>
> I don't know, I never used max connections. Could indeed be a bug. The code
> looks pretty tricky. It's trying to lock pieces of the file
> /var/run/rsyncd.lock in order for independent processes to coordinate. Are
> you running as root (the lsof above suggests you are)? If not, you probably
> need to specify another file that your daemon has access to in the
> "lock file" option. Otherwise it would probably help for you to run some
> straces.

I would have presumed, since there was a daemon process running (as opposed
to running from inetd), that the daemon itself could simply track the
connection count.

One possibility here is that I do have /var/run symlinked to /ram/run, which
is on a ramdisk. So the lock file is there. The file is there but it is
empty. Should it have data in it? BTW, it was in ramdisk in 2.4.4 and this
max connections problem did not exist, so if there is a ramdisk sensitivity,
it's new since 2.4.4.

--
| Phil Howard - KA9WGN | Dallas     | http://linuxhomepage.com/ |
| [EMAIL PROTECTED]    | Texas, USA | http://phil.ipal.org/     |
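For reference, a minimal rsyncd.conf sketch with the two settings being
discussed (the module name and path are made up for the example). What
matters is whether the daemon can take locks on the configured lock file, not
whether the file has data in it:

    # rsyncd.conf sketch -- module name and path are examples only.
    pid file = /var/run/rsyncd.pid

    [home]
        path = /home
        read only = no
        max connections = 16
        # "max connections" is coordinated through locks on this file,
        # so it must sit on a filesystem where locking works reliably:
        lock file = /var/run/rsyncd.lock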
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 04:33:28PM -0500, Phil Howard wrote:
> Dave Dykstra wrote:
> > > One possibility here is that I do have /var/run symlinked to /ram/run,
> > > which is on a ramdisk. So the lock file is there. The file is there but
> > > it is empty. Should it have data in it? BTW, it was in ramdisk in 2.4.4
> > > and this max connections problem did not exist, so if there is a
> > > ramdisk sensitivity, it's new since 2.4.4.
> >
> > I don't know if it will show up with data in it or not, I've never tried
> > it. You'll probably need to do some straces.
>
> Where is the count of the number of current connections supposed to be
> kept? It's obviously not actually being kept in this file, at least not
> when on a ramdisk. But if it's supposed to be, that's the problem. OTOH, it
> is easy to get the count out of sync this way, too. If a process is killed
> or otherwise just dies, the count is higher than real. When I do
> multi-process servers with controlled process counts, I like to have the
> parent track the number of children running. Of course that precludes using
> inetd.

It locks different ranges of bytes of the file rather than keeping a count in
it. I guess the idea with that is that if a process dies the operating system
will automatically remove the lock.

- Dave Dykstra
RE: problems encountered in 2.4.6
[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

> Dave Dykstra wrote:
> > That's two different kinds of checksums. The -c option runs a whole-file
> > checksum on both sides, but if you don't use -W the rsync rolling
> > checksum will be applied.
>
> So the chunk-by-chunk checksum is always used w/o -W? I guess the docs are
> more confusing than I originally thought.

It might help if you think of it as two phases - discovery of what files need
to be transferred, and then the transfer itself.

The discovery phase will by default just check timestamps and sizes. You can
adjust that with command line options, including the use of -c to include a
full file checksum as part of the comparison, if, for example, files might
change without affecting timestamp or size.

Once rsync knows what it needs to transfer, then it works its way through the
file list, and for each file it performs a transfer. By default, that
transfer is the rsync protocol - which involves the full process of dividing
the file into chunks with both a strong and rolling checksum, and doing the
computations to figure out what parts to send and so on.

Now, normally this process is divided so that the copy of rsync that does the
I/O is local to the file - e.g., for discovery both client and server rsync
identify file timestamps/sizes independently (and optionally compute the
checksums locally) and then exchange that information. For transfer both
rsyncs build up the rolling and chunk checksums and exchange them and then
decide what file data to send.

But when you are copying with a single rsync (and in particular when one of
the files is on the network), then that rsync has to do all the work. That
means that during discovery it either 'stat's all files or optionally
computes checksums. To do the checksum it has to read the file, so both
source and destination get read fully - if either is on the network you will
have already spent the network traffic to pull the complete files back to the
local machine.

Likewise for the transfer - under the rsync protocol, rsync has to compute
the checksums for both source and destination files. Now, it'll only do this
for those that it wants to transfer, but in those cases it effectively pulls
back complete files from the network just to compute the checksums, only to
then start transferring them. Even if the rsync protocol yields a very small
amount of difference, anything beyond that point is already more than the
full file with respect to the network activity that takes place.

That's why the -W option is really the only logical thing to use with a
single rsync and local (on-system or network share/mount) copies. Under such
circumstances, the rsync protocol isn't going to help at all, and will
probably slow things down and take more memory instead. With -W rsync becomes
an intelligent copier (in terms of figuring out what changed), but that's
about it.

--
David Bolen, FitLinxx, Inc.            E-mail: [EMAIL PROTECTED]
860 Canal Street, Stamford, CT 06902   Phone: (203) 708-5192  Fax: (203) 316-5150
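To make the two phases concrete, here are the flag combinations being
described (the paths and daemon module name are placeholders):

    # Default: discovery compares timestamps and sizes; changed files are
    # then sent with the rolling-checksum delta algorithm.
    rsync -a /data/ backuphost::backup/data/

    # -c: discovery also compares a whole-file checksum, so every file is
    # read in full on both ends even if nothing ends up being transferred.
    rsync -ac /data/ backuphost::backup/data/

    # -W: skip the delta algorithm and send changed files whole -- the
    # sensible choice when a single rsync does all the work, i.e. when both
    # paths are local or one is just a network mount.
    rsync -aW /data/ /mnt/backup/data/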
Re: problems encountered in 2.4.6
David Bolen wrote:

> The discovery phase will by default just check timestamps and sizes. You
> can adjust that with command line options, including the use of -c to
> include a full file checksum as part of the comparison, if, for example,
> files might change without affecting timestamp or size.
>
> Once rsync knows what it needs to transfer, then it works its way through
> the file list, and for each file it performs a transfer. By default, that
> transfer is the rsync protocol - which involves the full process of
> dividing the file into chunks with both a strong and rolling checksum, and
> doing the computations to figure out what parts to send and so on.

This is where the docs were a bit confusing. There was no clear distinction
of checksum types related to the -c option. This implied to me that w/o -c
there would be no checksum at all, and what I thought the behaviour would be
was what I now understand it to be with -W.

> That's why the -W option is really the only logical thing to use with a
> single rsync and local (on-system or network share/mount) copies. Under
> such circumstances, the rsync protocol isn't going to help at all, and will
> probably slow things down and take more memory instead. With -W rsync
> becomes an intelligent copier (in terms of figuring out what changed), but
> that's about it.

Actually, the lack of -W isn't helping me at all. The reason is that even for
the stuff I do over the network, 99% of it is compressed with gzip or bzip2.
If the files change, the originals were changed and a new compression is
made, and usually most of the file is different.

It definitely helped for transferring ISO images where the whole image would
be changed if some files changed. I set the chunk size to 2048 for that. Why
it defaults to 700 seems odd to me.

There is a feature I would like, and I notice that even with -c this does not
happen, but I think it could based on the way rsync works. What I'd like to
have is when a whole file is moved from one directory to another, rsync would
detect a new file with the same checksum as an existing (potentially to be
deleted) file, and copy, move, or link, as appropriate. In theory this should
apply to anything anywhere in the whole file tree being processed.

--
| Phil Howard - KA9WGN | Dallas     | http://linuxhomepage.com/ |
| [EMAIL PROTECTED]    | Texas, USA | http://phil.ipal.org/     |
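For the ISO case above, the block size can be set explicitly instead of
letting rsync pick its ~700-byte default (the host and module name here are
made up):

    # --block-size overrides the default block used by the rolling-checksum
    # algorithm; 2048 matches the 2 KB ISO9660 sector size.
    rsync -av --block-size=2048 current.iso mirrorhost::isos/current.iso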
RE: problems encountered in 2.4.6
[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

> Actually, the lack of -W isn't helping me at all. The reason is that even
> for the stuff I do over the network, 99% of it is compressed with gzip or
> bzip2. If the files change, the originals were changed and a new
> compression is made, and usually most of the file is different.

Just to clarify, when you say "over the network" you mean in true
client/server rsync (or across an rsh/ssh stream) and not just using one
rsync with references using network mount points, right? In the latter case,
not having -W is hurting you, never helping.

But yes, any format (e.g., encryption, compression) that effectively
distributes changes randomly over a file is going to be a killer for rsync.
For the case of gzip'd files when a client and server rsync are in use, you
may want to look back through the archives of this list - there was a
reference to a patch for the gzip sources that created rsync-friendly gzip's.
Not as great as the non-gzip'd version, but far better than normal gzip. Ah
yes - here was the URL:

    http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2

At the time when I tried it (1/2001), here were some test results. For
comparison, here's a database file (delta between one day and the next), both
uncompressed and gzip'd (normal and -9). For the uncompressed I also
transferred with a fixed 1K blocksize since I know that's the page size for
the database - the others are default computations (I tried the 1K with the
gzip'd version but it was worse, as expected).

                    Normal    Normal+1K       gzip    gzip -9
    Size          54206464     54206464   21867539   21845091
    Wrote          2902182      1011490    3169864    3214740
    Read             60176       317648      60350      60290
    Total          2962358      1329138    3230214    3275030
    Speedup          18.30        40.78       6.77       6.67
    Compression       1.00         1.00      2.479      2.481
    Normalized       18.30        40.78      16.78      16.54

(Speedup is Size divided by Total; Normalized is Speedup times the
Compression factor, i.e. the speedup measured against the uncompressed data.)

And in terms of size: as Rusty's page comments, they are slightly larger, but
not tremendously so. In my one case:

    Normal gzip:          21627629
    gzip --rsyncable:     21867539
    gzip -9 --rsyncable:  21845091

So about a 1-1.1% hit in compressed size.

Personally, here we end up just leaving the major stuff we transfer
uncompressed - as we're using slow analog lines, the cost recovery was easily
worth the cost in disk space, particularly in cases like our databases where
knowledge of the page size and method of change goes a long way.

> It definitely helped for transferring ISO images where the whole image
> would be changed if some files changed. I set the chunk size to 2048 for
> that. Why it defaults to 700 seems odd to me.

Not sure - perhaps some early empirical work. When I'm moving files that I
know something about I definitely control the block size myself, so for
example, when moving databases with a 1K page size, I always use a multiple
of that (since I know a priori that's how the database dirties the file), and
then I scale that up a bit based on database size, to get a reasonable
tradeoff between block overhead and extra transfer upon a change detection.

--
David Bolen, FitLinxx, Inc.            E-mail: [EMAIL PROTECTED]
860 Canal Street, Stamford, CT 06902   Phone: (203) 708-5192  Fax: (203) 316-5150
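Putting the two suggestions above into command form (the hostnames, module
names, and file names are assumptions for the sketch):

    # Block size chosen as a multiple of the database's 1K page size, scaled
    # up a bit for a large file, instead of the ~700-byte default:
    rsync -av --block-size=8192 /var/db/app.db backuphost::dbs/app.db

    # The patched gzip referenced above adds an --rsyncable switch that
    # periodically resynchronizes its output, so a small change to the input
    # no longer rewrites the whole compressed stream:
    gzip --rsyncable -9 dump.sql
    rsync -av dump.sql.gz backuphost::dumps/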