Re: problems encountered in 2.4.6
I've had rsync hangs when transferring huge filesystems (~80Gb) over the network, but since I suppressed the -v option from my command line there are no more hangs. The -v option under 2.4.6 is buggy; try multiplying the v's and the hangs will increase too.

( rsync -axWP --rsync-path=/usr/local/bin/rsync --stat --delete source target )

David Bolen wrote:

> [EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:
>
> > Actually, the lack of -W isn't helping me at all. The reason is that
> > even for the stuff I do over the network, 99% of it is compressed with
> > gzip or bzip2. If the files change, the originals were changed and a
> > new compression is made, and usually most of the file is different.
>
> Just to clarify, when you say "over the network" you mean in true
> client/server rsync (or across an rsh/ssh stream) and not just using
> one rsync with references using network mount points, right? In the
> latter case, not having -W is hurting you, never helping.
>
> But yes, any format (e.g., encryption, compression) that effectively
> distributes changes randomly over a file is going to be a killer for
> rsync.
>
> For the case of gzip'd files when a client and server rsync are in
> use, you may want to look back through the archives of this list -
> there was a reference to a patch for the gzip sources that created
> rsync-friendly gzip's. Not as great as the non-gzip'd version, but
> far better than normal gzip.
>
> Ah yes - here was the URL:
>
> http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2
>
> At the time when I tried it (1/2001), here were some test results:
>
> For comparison, here's a database file (delta between one day and the
> next), both uncompressed and gzip'd (normal and -9). For the
> uncompressed I also transferred with a fixed 1K blocksize since I know
> that's the page size for the database - the others are default
> computations (I tried the 1K with the gzip'd version but it was
> worse, as expected).
>
>               Normal    Normal+1K  gzip      gzip-9
> Size          54206464  54206464   21867539  21845091
> Wrote         2902182   1011490    3169864   3214740
> Read          60176     317648     60350     60290
> Total         2962358   1329138    3230214   3275030
>
> Speedup       18.30     40.78      6.77      6.67
> Compression   1.00      1.00       2.479     2.481
> Normalized    18.30     40.78      16.78     16.54
>
> And in terms of size:
>
> As Rusty's page comments, they are slightly larger, but not
> tremendously so. In my one case:
>
> Normal gzip:          21627629
> gzip --rsyncable:     21867539
> gzip -9 --rsyncable:  21845091
>
> So about a 1-1.1% hit in compressed size.
>
> Personally, here we end up just leaving the major stuff we transfer
> uncompressed - as we're using slow analog lines, the cost recovery was
> easily worth the cost in disk space, particularly in cases like our
> databases where knowledge of the page size and method of change goes a
> long way.
>
> > It definitely helped for transferring ISO images where the whole image
> > would be changed if some files changed. I set the chunk size to 2048
> > for that. Why it defaults to 700 seems odd to me.
>
> Not sure - perhaps some early empirical work. When I'm moving files
> that I know something about I definitely control the block size
> myself, so for example, when moving databases with a 1K page size, I
> always use a multiple of that (since I know a priori that's how the
> database "dirties" the file), and then I scale that up a bit based on
> database size, to get a reasonable tradeoff between block overhead and
> extra transfer upon a change detection.
>
> -- David

--
@ Remi LAPORTE                   @
@ TEXAS INSTRUMENTS UNIX SUPPORT @
@ [EMAIL PROTECTED]              @
RE: problems encountered in 2.4.6
> I haven't used the --bwlimit option and don't really know how it works.
> I remember when somebody contributed it that I was skeptical about how
> well it could work. I'm especially not surprised that it has no impact
> on local-to-local transfers.

I have used --bwlimit and it works a treat. It is in a specialised situation where I am using 20 simultaneous rsyncs (don't ask...) where each rsync is limited to 128 Kbyte/sec. The 30Mbit/sec link I am sending over sits exactly at 20Mbit/sec, which is what I wanted.

Cheers
Mark
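Mark's figure is right where the arithmetic puts it: 20 streams capped at 128 Kbyte/sec each comes to just under 21 Mbit/sec. A quick sanity-check sketch (this assumes --bwlimit counts kilobytes as 1024 bytes, which matches the observed rate):

```python
# Back-of-the-envelope check of the 20-rsync setup described above.
streams = 20
kbytes_per_sec = 128                       # per-stream --bwlimit cap
bits_per_sec = streams * kbytes_per_sec * 1024 * 8
print(round(bits_per_sec / 1e6, 2))        # -> 20.97 (Mbit/sec)
```

So a link "sitting at 20Mbit/sec" is exactly what 20 such streams should produce once protocol overhead is ignored.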
Re: problems encountered in 2.4.6
On Tue, May 29, 2001 at 12:02:41PM -0500, Phil Howard wrote:
> Dave Dykstra wrote:
>
> > On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
> > ...
> > > Use the -W option to disable the rsync algorithm. We really ought to make
> > > that the default when both the source and destination are local.
> >
> > I went ahead and submitted a change to the rsync CVS to automatically turn
> > on -W when the source and destination are both on the local machine.
>
> So how do I revert that on the command line?
>
> I've been trying with -W doing my disk to disk backups, and I've had
> to go back to not using -W. Will -c do that?

There's currently no way to revert it. I thought it wouldn't be necessary, and I'm not sure how to do it cleanly, because there's currently no precedent in rsync for a general undoing of options that have different defaults depending on the situation. Another one that comes to mind is --block-io.

The latest rsync in CVS is now using the "popt" package to process options instead of getopt. Does anybody know if that package has a standard way to negate options, for example prefixing a "no" (like --no-block-io) or something like that? I took a quick look through the man page and it wasn't obvious.

> The reason is the load
> on the machine gets so high, nothing else can run. This is not CPU
> load, but rather, buffering/swapping load. CPU load just slows other
> things down. But buffering/swapping load brings other things to a
> grinding halt. I suspect Linux's tendency to want to keep everything
> that anything writes in RAM, even if that means swapping out all other
> processes, is impacted by this. So I'll need a way to not have the
> effect of -W to use rsync for disk to disk backups.

Wow. Rsync is just going too fast for it I guess. The -W makes it do a lot of unnecessary disk I/O which must be enough to throttle its progress. Sure seems like leaving out -W is the wrong solution.
Maybe -W has to turn off more of rsync's pipelining since it is no longer performing the rsync algorithm.

> The fact that rsync loads so much into VM probably makes the problem
> a bit worse in this case. I saw 1 process at 35M and 2 processes at
> 70M (total 175M used by rsync, in addition to all the buffered writes).

Does -W have an impact on that? I would think that if anything -W would lessen that effect.

> I'm wondering if rsync is even a good choice for disk to disk backup
> duty. Is there some option I missed that disables pre-loading all
> the file names into memory?

Maybe it isn't. There is no such option.

> I also tried the --bwlimit option and it had no effect, not even on
> the usual download synchronizing over a dialup that I do. I could
> not get it to pace the rate below the dialup speed no matter what
> I would specify.

I haven't used the --bwlimit option and don't really know how it works. I remember when somebody contributed it that I was skeptical about how well it could work. I'm especially not surprised that it has no impact on local-to-local transfers.

- Dave Dykstra
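For what it's worth, the "--no-" negation Dave is asking popt about later became a standard idiom in other option parsers. Purely as an illustration of the concept (this is Python's argparse standing in for popt, not rsync's actual option handling), a flag whose default depends on the situation can be made negatable like so:

```python
import argparse

# A hypothetical --whole-file flag: default None means "decide later
# from context" (e.g. turn it on only for local-to-local transfers),
# and the user can force it off with --no-whole-file.
parser = argparse.ArgumentParser()
parser.add_argument("--whole-file",
                    action=argparse.BooleanOptionalAction,
                    default=None)

print(parser.parse_args(["--whole-file"]).whole_file)     # True
print(parser.parse_args(["--no-whole-file"]).whole_file)  # False
print(parser.parse_args([]).whole_file)                   # None
```

(BooleanOptionalAction requires Python 3.9+.) The three-valued default is the key trick: the parser records only what the user explicitly asked for, and the program applies the situational default afterwards.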
Re: problems encountered in 2.4.6
Dave Dykstra wrote:
> On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
> ...
> > Use the -W option to disable the rsync algorithm. We really ought to make
> > that the default when both the source and destination are local.
>
> I went ahead and submitted a change to the rsync CVS to automatically turn
> on -W when the source and destination are both on the local machine.

So how do I revert that on the command line?

I've been trying with -W doing my disk to disk backups, and I've had to go back to not using -W. Will -c do that?

The reason is the load on the machine gets so high, nothing else can run. This is not CPU load, but rather, buffering/swapping load. CPU load just slows other things down. But buffering/swapping load brings other things to a grinding halt. I suspect Linux's tendency to want to keep everything that anything writes in RAM, even if that means swapping out all other processes, is impacted by this. So I'll need a way to not have the effect of -W to use rsync for disk to disk backups.

The fact that rsync loads so much into VM probably makes the problem a bit worse in this case. I saw 1 process at 35M and 2 processes at 70M (total 175M used by rsync, in addition to all the buffered writes).

I'm wondering if rsync is even a good choice for disk to disk backup duty. Is there some option I missed that disables pre-loading all the file names into memory?

I also tried the --bwlimit option and it had no effect, not even on the usual download synchronizing over a dialup that I do. I could not get it to pace the rate below the dialup speed no matter what I would specify.

--
| Phil Howard - KA9WGN | Dallas     | http://linuxhomepage.com/ |
| [EMAIL PROTECTED]    | Texas, USA | http://phil.ipal.org/     |
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
...
> Use the -W option to disable the rsync algorithm. We really ought to make
> that the default when both the source and destination are local.

I went ahead and submitted a change to the rsync CVS to automatically turn on -W when the source and destination are both on the local machine.

- Dave Dykstra
rsync 3 (was Re: problems encountered in 2.4.6)
> There is a feature I would like, and I notice that even with -c this
> does not happen, but I think it could based on the way rsync works.
> What I'd like to have is when a whole file is moved from one directory
> to another, rsync would detect a new file with the same checksum as an
> existing (potentially to be deleted) file, and copy, move, or link, as
> appropriate. In theory this should apply to anything anywhere in the
> whole file tree being processed.

See the note I posted on May 17th; the title is "Storing updates" and it includes a Tcl script I run on rsync -n output to spot obvious renames and gzip'ings of files, and take evasive action. It would be excellent if rsync could do this sort of thing for me. The basic principle is that if you are using --delete, then when a file is missing a good place to look is in the list of deletions.

I spoke to Rusty Russell last November when he was visiting Dublin and he mentioned there had been some thinking about an "rsync 3". One feature being considered was allowing users to supply arbitrary rules for what to do when a file is missing, based on file suffix etc. Did anyone follow up these ideas?

John
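The deletion-list idea is easy to prototype outside rsync. This is only a sketch of the technique (the helper names are made up, and MD5 here is used purely for content matching): pair each newly appeared file with a same-content file that --delete is about to remove, so a local move or link could replace a re-transfer.

```python
import hashlib

def file_digest(path, bufsize=65536):
    """Whole-file checksum, used only to match identical content."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.digest()

def recover_renames(new_paths, doomed_paths):
    """Map each new file to a to-be-deleted file with identical
    content; each hit is a probable rename."""
    by_digest = {file_digest(p): p for p in doomed_paths}
    renames = {}
    for p in new_paths:
        d = file_digest(p)
        if d in by_digest:
            renames[p] = by_digest[d]
    return renames
```

A wrapper script could feed this from rsync -n output (new files on one side, deletions on the other) and issue mv/ln commands before running the real transfer.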
RE: problems encountered in 2.4.6
[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

> Actually, the lack of -W isn't helping me at all. The reason is that
> even for the stuff I do over the network, 99% of it is compressed with
> gzip or bzip2. If the files change, the originals were changed and a
> new compression is made, and usually most of the file is different.

Just to clarify, when you say "over the network" you mean in true client/server rsync (or across an rsh/ssh stream) and not just using one rsync with references using network mount points, right? In the latter case, not having -W is hurting you, never helping.

But yes, any format (e.g., encryption, compression) that effectively distributes changes randomly over a file is going to be a killer for rsync.

For the case of gzip'd files when a client and server rsync are in use, you may want to look back through the archives of this list - there was a reference to a patch for the gzip sources that created rsync-friendly gzip's. Not as great as the non-gzip'd version, but far better than normal gzip.

Ah yes - here was the URL:

http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2

At the time when I tried it (1/2001), here were some test results:

For comparison, here's a database file (delta between one day and the next), both uncompressed and gzip'd (normal and -9). For the uncompressed I also transferred with a fixed 1K blocksize since I know that's the page size for the database - the others are default computations (I tried the 1K with the gzip'd version but it was worse, as expected).

              Normal    Normal+1K  gzip      gzip-9
Size          54206464  54206464   21867539  21845091
Wrote         2902182   1011490    3169864   3214740
Read          60176     317648     60350     60290
Total         2962358   1329138    3230214   3275030

Speedup       18.30     40.78      6.77      6.67
Compression   1.00      1.00       2.479     2.481
Normalized    18.30     40.78      16.78     16.54

And in terms of size:

As Rusty's page comments, they are slightly larger, but not tremendously so.
In my one case:

Normal gzip:          21627629
gzip --rsyncable:     21867539
gzip -9 --rsyncable:  21845091

So about a 1-1.1% hit in compressed size.

Personally, here we end up just leaving the major stuff we transfer uncompressed - as we're using slow analog lines, the cost recovery was easily worth the cost in disk space, particularly in cases like our databases where knowledge of the page size and method of change goes a long way.

> It definitely helped for transferring ISO images where the whole image
> would be changed if some files changed. I set the chunk size to 2048
> for that. Why it defaults to 700 seems odd to me.

Not sure - perhaps some early empirical work. When I'm moving files that I know something about I definitely control the block size myself, so for example, when moving databases with a 1K page size, I always use a multiple of that (since I know a priori that's how the database "dirties" the file), and then I scale that up a bit based on database size, to get a reasonable tradeoff between block overhead and extra transfer upon a change detection.

-- David

 David Bolen | FitLinxx, Inc. | E-mail: [EMAIL PROTECTED]
 860 Canal Street, Stamford, CT 06902 | Phone: (203) 708-5192 | Fax: (203) 316-5150
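David's rule of thumb can be written down as a small sketch. This is an illustration of his heuristic, not rsync's actual default block-size computation, and target_blocks is an invented tuning knob for the "scale up a bit based on size" step:

```python
def pick_block_size(file_size, page_size=1024, target_blocks=2000):
    # Grow the block with file size so per-block checksum overhead
    # stays bounded (target_blocks is an assumed knob, not from rsync)...
    ideal = max(page_size, file_size // target_blocks)
    # ...then round up to a multiple of the page size the application
    # "dirties", so block boundaries line up with changed regions.
    return ((ideal + page_size - 1) // page_size) * page_size

# For the 54 MB database file from the table above:
print(pick_block_size(54206464))  # -> 27648 (27 * 1024)
```

The rounding step is the important part of the heuristic: if the application rewrites whole 1K pages, a block size that is an exact multiple of 1K keeps one dirty page from invalidating two blocks.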
Re: problems encountered in 2.4.6
David Bolen wrote:
> The discovery phase will by default just check timestamps and sizes.
> You can adjust that with command line options, including the use of -c
> to include a full file checksum as part of the comparison, if for
> example, files might change without affecting timestamp or size.
>
> Once rsync knows what it needs to transfer, then it works its way
> through the file list, and for each file it performs a transfer. By
> default, that transfer is the rsync protocol - which involves the full
> process of dividing the file into chunks with both a strong and
> rolling checksum, and doing the computations to figure out what parts
> to send and so on.

This is where the docs were a bit confusing. There was no clear distinction of checksum types related to the -c option. This implied to me that w/o -c there would be no checksum at all, and what I thought the behaviour would be was what I now understand it to be with -W.

> That's why the -W option is really the only logical thing to use with
> a single rsync and "local" (on-system or network share/mount) copies.
> Under such circumstances, the rsync protocol isn't going to help at
> all, and will probably slow things down and take more memory instead.
> With -W rsync becomes an intelligent copier (in terms of figuring out
> what changed), but that's about it.

Actually, the lack of -W isn't helping me at all. The reason is that even for the stuff I do over the network, 99% of it is compressed with gzip or bzip2. If the files change, the originals were changed and a new compression is made, and usually most of the file is different.

It definitely helped for transferring ISO images where the whole image would be changed if some files changed. I set the chunk size to 2048 for that. Why it defaults to 700 seems odd to me.

There is a feature I would like, and I notice that even with -c this does not happen, but I think it could based on the way rsync works. What I'd like to have is when a whole file is moved from one directory to another, rsync would detect a new file with the same checksum as an existing (potentially to be deleted) file, and copy, move, or link, as appropriate. In theory this should apply to anything anywhere in the whole file tree being processed.

--
| Phil Howard - KA9WGN | Dallas     | http://linuxhomepage.com/ |
| [EMAIL PROTECTED]    | Texas, USA | http://phil.ipal.org/     |
RE: problems encountered in 2.4.6
[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

> Dave Dykstra wrote:
>
> > That's two different kinds of checksums. The -c option runs a whole-file
> > checksum on both sides, but if you don't use -W the rsync rolling checksum
> > will be applied.
>
> So the chunk-by-chunk checksum always is used w/o -W? I guess the docs are
> more confusing than I originally thought.

It might help if you think of it as two phases - discovery of what files need to be transferred, and then the transfer itself.

The discovery phase will by default just check timestamps and sizes. You can adjust that with command line options, including the use of -c to include a full file checksum as part of the comparison, if for example, files might change without affecting timestamp or size.

Once rsync knows what it needs to transfer, then it works its way through the file list, and for each file it performs a transfer. By default, that transfer is the rsync protocol - which involves the full process of dividing the file into chunks with both a strong and rolling checksum, and doing the computations to figure out what parts to send and so on.

Now, normally this process is divided so that the copy of rsync that does the I/O is local to the file - e.g., for discovery both client and server rsync identify file timestamp/sizes independently (and optionally compute the checksums locally) and then exchange that information. For transfer both rsyncs build up the rolling and chunk checksums and exchange them and then decide what file data to send.

But when you are copying with a single rsync (and in particular when one of the files is on the network), then that rsync has to do all the work. That means that during discovery it either 'stat's all files or optionally computes checksums. To do the checksum it has to read the file, so both source and destination get read fully - if either are on the network you will have already spent the network traffic to pull the complete files back to the local machine.
Likewise for the transfer - under the rsync protocol, rsync has to compute the checksums for both source and destination files. Now, it'll only do this for those that it wants to transfer, but in those cases it effectively pulls back complete files from the network just to compute the checksums, only to then start transferring them. Even if the rsync protocol yields a very small amount of difference, anything beyond that point is already more than the full file with respect to the network activity that takes place.

That's why the -W option is really the only logical thing to use with a single rsync and "local" (on-system or network share/mount) copies. Under such circumstances, the rsync protocol isn't going to help at all, and will probably slow things down and take more memory instead. With -W rsync becomes an intelligent copier (in terms of figuring out what changed), but that's about it.

-- David

 David Bolen | FitLinxx, Inc. | E-mail: [EMAIL PROTECTED]
 860 Canal Street, Stamford, CT 06902 | Phone: (203) 708-5192 | Fax: (203) 316-5150
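The "rolling" half of the strong/rolling checksum pair David mentions is what makes the transfer phase cheap: when the comparison window slides one byte, the weak sum is updated in O(1) instead of recomputed over the whole block. A sketch in the style of rsync's weak checksum (the real implementation differs in details such as a character offset):

```python
def weak_sum(block):
    # Adler/Fletcher-style sum: s1 = sum of bytes,
    # s2 = sum of the running sums (weights n, n-1, ..., 1).
    s1 = sum(block) & 0xFFFF
    s2 = sum((len(block) - i) * b for i, b in enumerate(block)) & 0xFFFF
    return (s2 << 16) | s1

def roll(old, out_byte, in_byte, blocksize):
    """Slide the window one byte: drop out_byte, add in_byte, O(1)."""
    s1 = old & 0xFFFF
    s2 = (old >> 16) & 0xFFFF
    s1 = (s1 - out_byte + in_byte) & 0xFFFF
    s2 = (s2 - blocksize * out_byte + s1) & 0xFFFF
    return (s2 << 16) | s1
```

In the protocol, one side sends weak + strong sums per block; the other rolls this cheap sum across its whole file and computes the expensive strong checksum only on weak-sum hits, which is how matches are found at arbitrary byte offsets.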
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 04:33:28PM -0500, Phil Howard wrote:
> Dave Dykstra wrote:
>
> > > One possibility here is that I do have /var/run symlinked to /ram/run
> > > which is on a ramdisk. So the lock file is there. The file is there
> > > but it is empty. Should it have data in it? BTW, it was in ramdisk
> > > in 2.4.4 and this max connections problem did not exist, so if there
> > > is a ramdisk sensitivity, it's new since 2.4.4.
> >
> > I don't know if it will show up with data in it or not, I've never tried it.
> > You'll probably need to do some straces.
>
> Where is the count of number of current connections supposed to be kept?
> It's obviously not actually being kept in this file, at least not when on
> a ramdisk. But if it's supposed to be, that's the problem. OTOH, it is
> easy to get the count out of sync this way, too. If a process is killed
> or otherwise just dies, the count is higher than real. When I do
> multi-process servers with controlled process counts, I like to have the
> parent track the number of children running. Of course that precludes
> using inetd.

It locks different ranges of bytes of the file rather than keeping a count in it. I guess the idea with that is if a process dies the operating system will automatically remove the lock.

- Dave Dykstra
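That byte-range scheme can be sketched with POSIX record locks. This is an illustration of the technique, not rsync's actual code, and the path is made up: each connection tries to take an exclusive lock on one distinct byte of a shared lock file, so a crashed process's slot is released by the kernel, and the file itself can legitimately stay empty (which matches the empty file Phil observed).

```python
import fcntl, os

LOCK_FILE = "/tmp/rsyncd.lock.demo"   # hypothetical path, not rsync's

def claim_slot(fd, max_connections):
    """Try to lock one byte per allowed connection; the byte we win
    is our slot. Kernel-held locks vanish if the process dies."""
    for slot in range(max_connections):
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, slot)
            return slot
        except OSError:
            continue                  # that slot is held; try the next
    return None                       # -> "@ERROR: max connections reached"

fd = os.open(LOCK_FILE, os.O_RDWR | os.O_CREAT, 0o600)
print("slot:", claim_slot(fd, 16))    # first free slot, or None
```

(Unix-only; POSIX locks are per-process, so a real daemon would run this once per connection-handling child.)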
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 03:39:31PM -0500, Phil Howard wrote:
> Dave Dykstra wrote:
>
> > > 2 =
> > > When synchronizing a very large number of files, all files in a large
> > > partition, rsync frequently hangs. It's about 50% of the time, but
> > > seems to be a function of how much work there was to be done. That
> > > is, if I run it soon after it just ran, it tends to not hang, but if
> > > I run it after quite some time (and lots of stuff to synchronize) it
> > > tends to hang. It appears to have completed all the files, but I
> > > don't get any stats. There are 3 rsync processes sitting idle with
> > > no files open in the source or target trees.
> > >
> > > At last count there were 368827 files and 8083 symlinks in 21749
> > > directories.
> > >
> > > df shows:
> > > /dev/hda4  42188460  38303916  3884544  91%  /home
> > > /dev/hdb4  42188460  38301972  3886488  91%  /mnt/hdb/home
> > >
> > > df -i shows:
> > > /dev/hda4  2662400  398419  2263981  15%  /home
> > > /dev/hdb4  2662400  398462  2263938  15%  /mnt/hdb/home
> > >
> > > The df numbers are not exact because change is constantly happening
> > > on this active server. Drives hda and hdb are identical and are
> > > partitioned alike.
> > >
> > > The command line is echoed from the script that runs it:
> > >
> > > rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
> >
> > Use the -W option to disable the rsync algorithm. We really ought to make
> > that the default when both the source and destination are local.
>
> I don't want to copy everything every time. That's why I am using
> rsync to do this in the first place. I don't understand why this
> would be what's hanging.

I'm talking about the per-file changes. Even with -W it will only copy the whole files that changed. However, it will copy whole files rather than pieces of files.
This turns out to be much faster when you're mounting a remote filesystem than trying to go through the per-file rsync algorithm, because that trades off extra "local" disk access to save bandwidth between the two machine endpoints. If you were to rsync to hdb:/home rather than /mnt/hdb/home, that would make a big difference.

> > > A deadly embrace? It seems possible.
> >
> > No, the receiving side of an rsync transaction splits itself into two
> > processes for the sake of pipelining: one to generate checksums and one to
> > accept updates. When you're sending and receiving to the same machine then
> > you've got one sender and 2 receivers.
>
> Right. But what I was suggesting was a deadly embrace in that the
> process killed was waiting for something, and the parent was waiting
> for something.
>
> I'm not using the "c" option, so why would checksums be generated?

That's two different kinds of checksums. The -c option runs a whole-file checksum on both sides, but if you don't use -W the rsync rolling checksum will be applied.

> > > I'm also curious why 26704 has no fd 1.
> >
> > I don't know. When I tried it all 3 processes had an fd 1.
>
> Were you looking at it after it hung? Or is it not hanging for you?

It didn't hang for me; I didn't try it over a remote filesystem mount.

> I am curious if the lack of fd 1 is related to the hang. It is being
> started with 1> and 2> redirected to a log file _and_ the whole thing
> is being run via the "script" command for a "big picture" logfile.
> It was set up this way with the intent to run it from cron, although
> I haven't actually added it to crontab, yet, due to the problems.

I doubt it.

> > > 3 =
> > > @ERROR: max connections (16) reached - try again later
> > >
> > > This occurs after just one connection is active. It behaves as if
> > > I had specified "max connections = 1".
> > > On another server I set it
> > > to 40, and it showed:
> > >
> > > @ERROR: max connections (40) reached - try again later
> > >
> > > so it obviously is parsing and keeping the value I configure, but it
> > > isn't using it correctly.
> > >
> > > Also, if I ^C the client, then I get this error every time until I
> > > restart the daemon (running in standalone daemon mode, not inetd).
> > > So it seems like it counts clients wrong. But I can't get more
> > > than 1 right after restarting the server, so it's a little more
> > > than that somewhere.
> >
> > I don't know, I never used max connections. Could indeed be a bug.
> > The code looks pretty tricky. It's trying to lock pieces of the file
> > /var/run/rsyncd.lock in order for independent processes to coordinate.
> > Are you running as root (the lsof above suggests you are)? If not, you
> > probably need to specify another file that your daemon has access to in the
> > "lock file" option. Otherwise it would probably help for you to run some
Re: problems encountered in 2.4.6
Dave Dykstra wrote:
> > 2 =
> > When synchronizing a very large number of files, all files in a large
> > partition, rsync frequently hangs. It's about 50% of the time, but
> > seems to be a function of how much work there was to be done. That
> > is, if I run it soon after it just ran, it tends to not hang, but if
> > I run it after quite some time (and lots of stuff to synchronize) it
> > tends to hang. It appears to have completed all the files, but I
> > don't get any stats. There are 3 rsync processes sitting idle with
> > no files open in the source or target trees.
> >
> > At last count there were 368827 files and 8083 symlinks in 21749
> > directories.
> >
> > df shows:
> > /dev/hda4  42188460  38303916  3884544  91%  /home
> > /dev/hdb4  42188460  38301972  3886488  91%  /mnt/hdb/home
> >
> > df -i shows:
> > /dev/hda4  2662400  398419  2263981  15%  /home
> > /dev/hdb4  2662400  398462  2263938  15%  /mnt/hdb/home
> >
> > The df numbers are not exact because change is constantly happening
> > on this active server. Drives hda and hdb are identical and are
> > partitioned alike.
> >
> > The command line is echoed from the script that runs it:
> >
> > rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
>
> Use the -W option to disable the rsync algorithm. We really ought to make
> that the default when both the source and destination are local.

I don't want to copy everything every time. That's why I am using rsync to do this in the first place. I don't understand why this would be what's hanging.

> > A deadly embrace? It seems possible.
>
> No, the receiving side of an rsync transaction splits itself into two
> processes for the sake of pipelining: one to generate checksums and one to
> accept updates. When you're sending and receiving to the same machine then
> you've got one sender and 2 receivers.

Right.
But what I was suggesting was a deadly embrace in that the process killed was waiting for something, and the parent was waiting for something.

I'm not using the "c" option, so why would checksums be generated?

> > I'm also curious why 26704 has no fd 1.
>
> I don't know. When I tried it all 3 processes had an fd 1.

Were you looking at it after it hung? Or is it not hanging for you?

I am curious if the lack of fd 1 is related to the hang. It is being started with 1> and 2> redirected to a log file _and_ the whole thing is being run via the "script" command for a "big picture" logfile. It was set up this way with the intent to run it from cron, although I haven't actually added it to crontab, yet, due to the problems.

> > 3 =
> > @ERROR: max connections (16) reached - try again later
> >
> > This occurs after just one connection is active. It behaves as if
> > I had specified "max connections = 1". On another server I set it
> > to 40, and it showed:
> >
> > @ERROR: max connections (40) reached - try again later
> >
> > so it obviously is parsing and keeping the value I configure, but it
> > isn't using it correctly.
> >
> > Also, if I ^C the client, then I get this error every time until I
> > restart the daemon (running in standalone daemon mode, not inetd).
> > So it seems like it counts clients wrong. But I can't get more
> > than 1 right after restarting the server, so it's a little more
> > than that somewhere.
>
> I don't know, I never used max connections. Could indeed be a bug.
> The code looks pretty tricky. It's trying to lock pieces of the file
> /var/run/rsyncd.lock in order for independent processes to coordinate.
> Are you running as root (the lsof above suggests you are)? If not, you
> probably need to specify another file that your daemon has access to in the
> "lock file" option. Otherwise it would probably help for you to run some
> straces.
I would have presumed since there was a daemon process running (as opposed to running from inetd) that the daemon itself could simply track the connection count.

One possibility here is that I do have /var/run symlinked to /ram/run which is on a ramdisk. So the lock file is there. The file is there but it is empty. Should it have data in it? BTW, it was in ramdisk in 2.4.4 and this max connections problem did not exist, so if there is a ramdisk sensitivity, it's new since 2.4.4.

--
| Phil Howard - KA9WGN | Dallas     | http://linuxhomepage.com/ |
| [EMAIL PROTECTED]    | Texas, USA | http://phil.ipal.org/     |
Re: problems encountered in 2.4.6
On Fri, May 25, 2001 at 12:14:17PM -0500, Phil Howard wrote:
> I switched to 2.4.6 a while back, but have only been making heavy
> use of rsync the past couple of months, and have been running into
> a few problems that may be bugs. I looked at the bug tracker, but
> it was too cumbersome to use effectively. I don't know if these
> are real bugs or just configuration mistakes. Maybe you can tell
> me.
>
> The host OS is Linux 2.4.X (X is 0 on some and 2 on others) and
> Slackware 7.1 in all cases.
>
> Here are the things I'm running into that I did not have in 2.4.4:
>
> 1 =
> Write failed: Cannot allocate memory
> unexpected EOF in read_timeout
> unexpected EOF in read_timeout
>
> I've seen this happen when there was over 256 meg available space
> between ram and swap, so it should not have failed as a result of
> not being able to get a reasonable amount from the system. This
> also occurs randomly; if I wipe the files back out and run it all
> again, it often does not happen the next time, or if it does, it
> is not at the same point. Also, the number of files being copied
> is smallish (not more than 100), and it happens even when no file
> is larger than about 4 meg and the total transfer is no larger than
> 20 meg (so even if it pre-loaded every file into ram, there should
> be enough space). This happens even if the target directory starts
> empty.
>
> This is done through ssh and I do not recall it happening when using
> an rsync daemon.

This must be coming from ssh, not rsync. There is no string "Write failed" in the rsync source code, but there is in both ssh 1.2.27 and openssh 2.9p1.

> 2 =
> When synchronizing a very large number of files, all files in a large
> partition, rsync frequently hangs. It's about 50% of the time, but
> seems to be a function of how much work there was to be done. That
> is, if I run it soon after it just ran, it tends to not hang, but if
> I run it after quite some time (and lots of stuff to synchronize) it
> tends to hang.
It appears to have completed all the files, but I > don't get any stats. There are 3 rsync processes sitting idle with > no files open in the source or target trees. > > At last count there were 368827 files and 8083 symlinks in 21749 > directories. > > df shows: > /dev/hda4 42188460 38303916 3884544 91% /home > /dev/hdb4 42188460 38301972 3886488 91% /mnt/hdb/home > > df -i shows: > /dev/hda42662400 398419 2263981 15% /home > /dev/hdb42662400 398462 2263938 15% /mnt/hdb/home > > The df numbers are not exact because change is constantly happening > on this active server. Drives hda and hdb are identical and are > partitioned alike. > > The command line is echoed from the script that runs it: > > rsync -axv --stats --delete /home/. /mnt/hdb/home/. >1>'/home/root/backup-hda-to-hdb/home.log' 2>&1 Use the -W option to disable the rsync algorithm. We really ought to make that the default when both the source and destination are local. > The log file shows a file list gone all the way to the last file > and lsof done after the hang shows: > > rsync 26651root cwdDIR3,2 4096 24 /root > rsync 26651root rtdDIR3,2 4096 2 / > rsync 26651root txtREG 3,10 187443 8758 >/usr/local/bin/rsync > rsync 26651root memREG3,279276 4239 /lib/ld-2.1.3.so > rsync 26651root memREG3,2 1013224 4249 /lib/libc-2.1.3.so > rsync 26651root memREG3,240360 4274 >/lib/libnss_compat-2.1.3.so > rsync 26651root memREG3,275500 4272 >/lib/libnsl-2.1.3.so > rsync 26651root0u CHR 136,14 16 /dev/pts/14 > rsync 26651root1w REG3,4568981778435 >/home/root/backup-hda-to-hdb/home.log > rsync 26651root2w REG3,4568981778435 >/home/root/backup-hda-to-hdb/home.log > rsync 26651root4u unix 0xcb0f4040 135813770 socket > rsync 26651root5u unix 0xcb0f4cc0 135813771 socket > rsync 26652root cwdDIR 3,68 4096 2 /mnt/hdb/home > rsync 26652root rtdDIR3,2 4096 2 / > rsync 26652root txtREG 3,10 187443 8758 >/usr/local/bin/rsync > rsync 26652root memREG3,279276 4239 /lib/ld-2.1.3.so > rsync 26652root memREG3,2 1013224 4249 
/lib/libc-2.1.3.so > rsync 26652root memREG3,240360 4274 >/lib/libnss_compat-2.1.3.so > rsync 26652root memREG3,275500 4272 >/lib/libnsl-2.1.3.so > rsync 26652root1u unix
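Applied to the poster's own command line, the -W (--whole-file) suggestion above would look something like the sketch below. This is only an illustration under the assumption that both trees really are local mounts; the paths are the poster's, and -W simply copies whole files rather than computing block checksums, which saves CPU when there is no network bandwidth to economize on.

```shell
# Local disk-to-disk backup: -W disables the delta-transfer algorithm,
# which only pays off over a slow link. Paths are the original poster's.
rsync -axvW --stats --delete /home/. /mnt/hdb/home/. \
    1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
```

Whether this also avoids the hang is a separate question, but it removes one of the three cooperating rsync processes' heaviest workloads.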
problems encountered in 2.4.6
I switched to 2.4.6 a while back, but have only been making heavy use of rsync the past couple of months, and have been running into a few problems that may be bugs. I looked at the bug tracker, but it was too cumbersome to use effectively. I don't know if these are real bugs or just configuration mistakes. Maybe you can tell me.

The host OS is Linux 2.4.X (X is 0 on some and 2 on others) and Slackware 7.1 in all cases.

Here are the things I'm running into that I did not have in 2.4.4:

1 =

Write failed: Cannot allocate memory
unexpected EOF in read_timeout
unexpected EOF in read_timeout

I've seen this happen when there was over 256 meg of space available between RAM and swap, so it should not have failed as a result of not being able to get a reasonable amount from the system. It also occurs randomly; if I wipe the files back out and run it all again, it often does not happen the next time, or if it does, it is not at the same point. Also, the number of files being copied is smallish (not more than 100), and it happens even when no file is larger than about 4 meg and the total transfer is no larger than 20 meg (so even if it pre-loaded every file into RAM, there should be enough space). This happens even if the target directory starts empty.

This is done through ssh and I do not recall it happening when using an rsync daemon.

2 =

When synchronizing a very large number of files, all files in a large partition, rsync frequently hangs. It happens about 50% of the time, but seems to be a function of how much work there was to be done. That is, if I run it soon after it just ran, it tends not to hang, but if I run it after quite some time (with lots of stuff to synchronize) it tends to hang. It appears to have completed all the files, but I don't get any stats. There are 3 rsync processes sitting idle with no files open in the source or target trees.

At last count there were 368827 files and 8083 symlinks in 21749 directories.
df shows:

/dev/hda4  42188460  38303916  3884544  91%  /home
/dev/hdb4  42188460  38301972  3886488  91%  /mnt/hdb/home

df -i shows:

/dev/hda4  2662400  398419  2263981  15%  /home
/dev/hdb4  2662400  398462  2263938  15%  /mnt/hdb/home

The df numbers are not exact because change is constantly happening on this active server. Drives hda and hdb are identical and are partitioned alike.

The command line is echoed from the script that runs it:

rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1

The log file shows a file list gone all the way to the last file and lsof done after the hang shows:

rsync  26651  root  cwd  DIR  3,2    4096     24    /root
rsync  26651  root  rtd  DIR  3,2    4096     2     /
rsync  26651  root  txt  REG  3,10   187443   8758  /usr/local/bin/rsync
rsync  26651  root  mem  REG  3,2    79276    4239  /lib/ld-2.1.3.so
rsync  26651  root  mem  REG  3,2    1013224  4249  /lib/libc-2.1.3.so
rsync  26651  root  mem  REG  3,2    40360    4274  /lib/libnss_compat-2.1.3.so
rsync  26651  root  mem  REG  3,2    75500    4272  /lib/libnsl-2.1.3.so
rsync  26651  root  0u   CHR  136,14  16  /dev/pts/14
rsync  26651  root  1w   REG  3,4  568981778435  /home/root/backup-hda-to-hdb/home.log
rsync  26651  root  2w   REG  3,4  568981778435  /home/root/backup-hda-to-hdb/home.log
rsync  26651  root  4u   unix  0xcb0f4040  135813770  socket
rsync  26651  root  5u   unix  0xcb0f4cc0  135813771  socket
rsync  26652  root  cwd  DIR  3,68   4096     2     /mnt/hdb/home
rsync  26652  root  rtd  DIR  3,2    4096     2     /
rsync  26652  root  txt  REG  3,10   187443   8758  /usr/local/bin/rsync
rsync  26652  root  mem  REG  3,2    79276    4239  /lib/ld-2.1.3.so
rsync  26652  root  mem  REG  3,2    1013224  4249  /lib/libc-2.1.3.so
rsync  26652  root  mem  REG  3,2    40360    4274  /lib/libnss_compat-2.1.3.so
rsync  26652  root  mem  REG  3,2    75500    4272  /lib/libnsl-2.1.3.so
rsync  26652  root  1u   unix  0xcb93b9a0  135813772  socket
rsync  26652  root  2w   REG  3,4  568981778435  /home/root/backup-hda-to-hdb/home.log
rsync  26652  root  3u   unix  0xca8edcc0  135814161  socket
rsync  26652  root  5u   unix  0xcc9969a0  135814163  socket
rsync  26704  root  cwd  DIR  3,68   4096     2     /mnt/hdb/home
rsync  26704  root  rtd  DIR  3,2    4096     2     /
rsync  26704  root  txt  REG  3,10   187443   8758  /usr/local/bin/rsync
rsync
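The reply above attributes the "Write failed" message to ssh rather than rsync on the grounds that the string does not appear in the rsync sources. That kind of attribution is easy to re-check by grepping the unpacked source trees. The sketch below uses stand-in directories and file contents purely to demonstrate the technique; a real check would grep the actual rsync-2.4.6 and openssh-2.9p1 source directories.

```shell
# Build two stand-in "source trees" (hypothetical contents, for
# demonstration only -- not the real rsync or openssh sources).
demo=$(mktemp -d)
mkdir -p "$demo/rsync-2.4.6" "$demo/openssh-2.9p1"
printf 'error("Write failed: %%.100s", strerror(errno));\n' \
    > "$demo/openssh-2.9p1/clientloop.c"
printf '/* no such message here */\n' > "$demo/rsync-2.4.6/main.c"

# -r: recurse into directories, -l: print only the names of matching
# files. Only the openssh tree should match.
grep -rl 'Write failed' "$demo"
```

Whichever tree the file names come back from is the program printing the error; here only the openssh stand-in matches.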